Permanent Chaos Engineering Jobs in the UK

14 of 14 Permanent Chaos Engineering Jobs in the UK

Global IT Quality Engineer Senior Director & CoE Lead

London, United Kingdom
Boston Consulting Group
of our DNA. To meet the needs of BCG's global, mobile, fast-growing and increasingly diverse business, we are looking for a Global IT Senior Director for Quality Engineering to lead and expand our central QA Center of Excellence (CoE) into an end-to-end QA Team. To execute this transformation, we need people who can translate … and expertise development for Quality Assurance and Performance Engineering. Among your responsibilities, you will: Lead End-to-End Quality Assurance: Lead the development and expansion of a centralized Quality Engineering (QE) Centre of Excellence (COE), ensuring that quality and performance standards are maintained across all platforms and products, including end-user environments. Implement best practices in quality metrics, reviews, and … end-to-end testing and manage structured QA cycles for security updates, patches, and system upgrades, ensuring comprehensive testing across third-party and custom-built applications. Establish Advanced Performance Engineering: Establish a robust performance engineering strategy, integrating advanced tools for application performance monitoring (APM), observability, and telemetry. Focus on early identification of performance bottlenecks and quality assurance measures
Employment Type: Permanent
Salary: GBP Annual

Site Reliability Engineer

London, United Kingdom
Hybrid / WFH Options
ZILO
let's talk. About the Role We're looking for a Senior Site Reliability Engineer to join our SRE team. This is a hybrid role that blends deep platform engineering with application-level troubleshooting. You'll be responsible for the stability, performance, and resilience of our cloud-native infrastructure while also being on the front line when issues … strategies for microservices and core platforms Continuously monitor and improve system performance, cost-efficiency, and observability (LGTM stack/Datadog) Partner with security teams on compliance and vulnerability remediation Chaos Engineering & Resilience Design and execute Chaos Engineering experiments. Develop and track SLOs, SLIs, and error budgets for critical systems Conduct resilience reviews and game days to … to backend service disruptions Investigate issues across infrastructure, Kubernetes, logs, traces, and service code Resolve incidents and support root causes (Java and GoLang services) Contribute to postmortems and reliability engineering initiatives Who You Are Essential Experience 5+ years in an SRE, DevOps, or infrastructure role Deep hands-on experience with AWS, EKS/Kubernetes, and Terraform Working knowledge of
Employment Type: Permanent
Salary: GBP Annual

Site Reliability Engineer

London, South East, England, United Kingdom
Hybrid / WFH Options
Rise Technical Recruitment Limited
a strong culture rooted in integrity, creativity, and technical excellence, they've become a trusted partner across global industries. In this role you'll take ownership of platform reliability, resilience engineering, and incident management across cutting-edge cloud infrastructure. You'll play a key role in ensuring uptime, performance, and continuous improvement of core systems. The ideal candidate will be an … experienced Site Reliability Engineer with a deep background in AWS, Kubernetes (EKS), Terraform, and monitoring/eventing tools. You'll have a strong grasp of application-level troubleshooting, chaos engineering, and performance tuning. This is a fantastic opportunity to work in a modern DevOps environment where innovation is encouraged, personal development is supported, and technical impact is real. The … Role: *Manage and optimise AWS and Kubernetes (EKS) infrastructure *Implement resilience strategies and conduct chaos engineering experiments *Monitor and maintain Kafka clusters for performance and reliability *Respond to and resolve application-level production incidents The Person: *5+ years in SRE, DevOps, or infrastructure engineering *Strong experience with AWS, EKS/Kubernetes, and Terraform *Familiar with Kafka and
Employment Type: Full-Time
Salary: £80,000 - £90,000 per annum, Inc benefits

Site Reliability Engineer

London, United Kingdom
Hybrid / WFH Options
Rise Technical Recruitment Limited
strong culture rooted in integrity, creativity, and technical excellence, they've become a trusted partner across global industries. In this role you'll take ownership of platform reliability, resilience engineering, and incident management across cutting-edge cloud infrastructure. You'll play a key role in ensuring uptime, performance, and continuous improvement of core systems. The ideal candidate will be … an experienced Site Reliability Engineer with a deep background in AWS, Kubernetes (EKS), Terraform, and monitoring/eventing tools. You'll have a strong grasp of application-level troubleshooting, chaos engineering, and performance tuning. This is a fantastic opportunity to work in a modern DevOps environment where innovation is encouraged, personal development is supported, and technical impact is … real. The Role: Manage and optimise AWS and Kubernetes (EKS) infrastructure Implement resilience strategies and conduct chaos engineering experiments Monitor and maintain Kafka clusters for performance and reliability Respond to and resolve application-level production incidents The Person: 5+ years in SRE, DevOps, or infrastructure engineering Strong experience with AWS, EKS/Kubernetes, and Terraform Familiar with
Employment Type: Permanent
Salary: GBP Annual

Site Reliability Engineer

City of London, London, United Kingdom
Hybrid / WFH Options
Rise Technical Recruitment
strong culture rooted in integrity, creativity, and technical excellence, they've become a trusted partner across global industries. In this role you'll take ownership of platform reliability, resilience engineering, and incident management across cutting-edge cloud infrastructure. You'll play a key role in ensuring uptime, performance, and continuous improvement of core systems. The ideal candidate will be … an experienced Site Reliability Engineer with a deep background in AWS, Kubernetes (EKS), Terraform, and monitoring/eventing tools. You'll have a strong grasp of application-level troubleshooting, chaos engineering, and performance tuning. This is a fantastic opportunity to work in a modern DevOps environment where innovation is encouraged, personal development is supported, and technical impact is … real. The Role: *Manage and optimise AWS and Kubernetes (EKS) infrastructure *Implement resilience strategies and conduct chaos engineering experiments *Monitor and maintain Kafka clusters for performance and reliability *Respond to and resolve application-level production incidents The Person: *5+ years in SRE, DevOps, or infrastructure engineering *Strong experience with AWS, EKS/Kubernetes, and Terraform *Familiar with
Employment Type: Permanent
Salary: £80,000 - £90,000 per annum, 38 days holiday, healthcare, pension

Engineering Lead - Public Cloud Engineering Practices - SVP

London, United Kingdom
Hybrid / WFH Options
Citigroup Inc
About the Opportunity Are you a seasoned technology leader with a passion for building cutting-edge enterprise products and a hands-on approach to engineering? Join Citi's Cloud Technology Services (CTS) team and be part of our commitment to transform Citi technology leveraging game-changing Cloud capabilities to drive agility, efficiency, and innovation. We're providing our businesses … with a competitive edge by leveraging public cloud scale and enabling new infrastructure economics. As the Public Cloud Engineering Practices Lead, you will play a pivotal role in shaping and executing our public cloud strategy. You will be part of a team that continues to deliver big! From building a cloud-based High Performance Compute (HPC) platform to run huge … GenAI at scale, all the way to enabling payments solutions, this team is at the forefront of innovation. What You'll Do: Lead the Charge: Own the public cloud engineering practices strategy and its execution, enabling Citi's secure and enterprise-scale adoption of public cloud. You will provide technical authority for all engineering practices across all public
Employment Type: Permanent
Salary: GBP Annual

Sr. Cloud Operations Delivery Manager (CODM), Enterprise Support - UKI

London, United Kingdom
Amazon
ability to make high-judgment technical decisions in complex environments - Experience leading cross-functional teams with a mix of technical, business, and operational roles PREFERRED QUALIFICATIONS - Experience with resilience engineering, chaos engineering, and observability practices in AWS - Understanding of enterprise IT operational capabilities - examples include Change, Incident Management, infrastructure management or applications management - Knowledge of the AWS
Employment Type: Permanent
Salary: GBP Annual

Senior DevOps Engineer

London, United Kingdom
Hybrid / WFH Options
Elliptic Enterprises Ltd
Senior DevOps Engineer Department: Engineering Employment Type: Full Time Location: London, UK Description The impact you will have: You will have a transformative impact across Elliptic by evangelising DevOps, security, and reliability principles and fostering a culture of efficiency and autonomy. You will join a growing team of experienced and passionate engineers who are not afraid to fail and … enjoy tackling difficult problems head-on. Openness is one of our core values at Elliptic, and nowhere is this more evident than in our engineering teams. We strongly encourage engineers to challenge convention and find unique and innovative solutions to our customers' problems. Key Responsibilities What you will do: Provide senior DevOps expertise and leadership across Engineering at … all layers of the stack Evangelise DevOps, security and reliability engineering across the Engineering team at large Provision resilient infrastructure across multiple regions and AZs Build compliant, reliable and featureful developer platforms centered on container orchestration. Enable Continuous Delivery and Deployment capabilities using CI/CD pipelines and GitOps tooling Enable shifting left on security and testing, and facilitate progressive
Employment Type: Permanent
Salary: GBP Annual

Lead Site Reliability Engineer

London, United Kingdom
Lloyds Banking Group
ll focus on security, compliance, and continuous improvement to deliver resilient and high-performing systems. What you'll do The team focuses on Site Reliability, Platform, DevOps, and Systems engineering, to build and run large-scale, distributed, fault-tolerant systems on public cloud. This is a hybrid role, first leading a team, mentoring, coaching, and developing peers across the … debugging all services that run within the K8s ecosystem, including Istio service mesh SRE mentality (SLI, SLO & SLA) using Observability, Logging, Monitoring & Alerting (Dynatrace) Ideally coming from a software engineering or exceptional scripting skill background and have moved into SRE/DevOps while gaining a wider understanding of application ecosystems. Experience programming in at least two (but not all … following languages: Java, Groovy, Scala, Python, Go, C++, JavaScript, .Net, PowerShell or Bash/Shell. Knowledge of GCP and Azure cloud platforms. Strong expertise in DevOps tools Experience with Chaos Engineering, Day-2 Ops, Resiliency and Disaster Recovery Planning and execution Technical architecture and Microservice design principles. About working for us Our ambition is to be the leading
Employment Type: Permanent
Salary: GBP Annual

Principal Engineer Gateway Services (Global)

London, United Kingdom
Us Bank
generation omni-commerce Gateway. We are currently hiring a Principal/Distinguished Engineer to support teams within this domain. In this role you will lead highly technical and strategic engineering initiatives on mission-critical platforms across our team, enabling every engineer to do their best work. Your role will be tasked with solving the most complex, challenging technical problems across … this team to meet our demanding needs. You will play an influential role in partnership with the engineering leadership group and other cross-divisional VPs of Engineering, owning technical vision and direction as well as Developer Experience. In order to excel in this role you will possess: Great communication skills. Ability to influence across teams and with senior stakeholders. … to speed on the latest and greatest happenings within technology. Strong appreciation of Event Storming and DDD having applied these methodologies in shaping microservices architectures. Experience in creating/engineering Cloud Native Architectures. Additional Experience (nice to have). Some experience with Model Context Protocol/AI having had some experience in how this can shape the future of
Employment Type: Permanent
Salary: GBP Annual

Sr. Software Development Engineer in Test, Blink

London, United Kingdom
Amazon
who will shape the future of our AI-powered automation platform, with a particular focus on modernizing our application testing and deployment pipelines. The ideal candidate will combine deep engineering expertise with strategic thinking to create intelligent, scalable solutions that transform how we approach automation, dramatically reducing the time and complexity of application validation and delivery. This role requires … LLM-based approaches to test script generation, automated debugging, and intelligent test maintenance across our distributed systems Pioneer innovative quality practices that leverage AI for automated performance analysis, intelligent chaos engineering scenarios, and predictive system reliability testing Design self-healing test systems that use machine learning to adapt to application changes, automatically maintain test suites, and provide AI … focused on building solutions that scale across teams, accelerate our testing cycles, and ultimately enable us to deliver higher quality products faster than ever before. About the team Our Engineering Environment At Blink, you'll work within a fully integrated engineering ecosystem where you can test across multiple layers - from algorithms and ASICs to hardware, firmware, AWS infrastructure
Employment Type: Permanent
Salary: GBP Annual

Software Development Engineer Test III, Prime Video Commerce

London, United Kingdom
Amazon
Software Development Engineer Test (SDET) to join our journey of technical excellence and customer obsession. As an SDET on our team, you'll blend your software development expertise with quality engineering to create robust test automation frameworks that ensure our subscription services remain reliable, scalable, and performant. You'll work in a collaborative environment where quality is everyone's responsibility … and your contributions will directly impact the viewing experience of customers around the globe. We're at an exciting inflection point in our quality engineering evolution. With numerous opportunities to innovate in our testing approach, you'll have the chance to architect new automation solutions, implement comprehensive test strategies, and drive efficiency improvements that help us deliver features with … Key job responsibilities - Lead the design and evolution of our automated test framework, ensuring it can handle the complex requirements of our expanding product portfolio - Collaborate closely with software engineering and quality assurance teams to identify opportunities to improve test coverage and optimize testing workflows - Develop advanced test automation tools and integrations to accelerate software development velocity and enhance
Employment Type: Permanent
Salary: GBP Annual

Technology Resilience Manager

London, United Kingdom
Innovation Group
in technology operations, who is looking to broaden their skillset. After developing your specialist skills you are now looking for opportunities to grow and learn more about wider resilience, chaos engineering and cloud services - we will support, provide guidance and mentor you. Nevertheless, we are open to other experiences as we are creating a new diverse and dynamic
Employment Type: Permanent
Salary: GBP Annual

Site Reliability Engineer

London, United Kingdom
Global Processing Services
shape our SRE strategy, establish best practices, and set the standard for service reliability and performance. What You'll Do Define strategies for Application Performance Monitoring, Unit Cost, and Chaos Engineering. Continuously optimize production environments to enhance reliability and efficiency. Implement and apply MTTR, SLO, and SLI principles to ensure high service standards. Respond to incidents, analyze root causes … layers that drive our platform's success. What You Need Proven experience implementing SRE principles at scale, including deep knowledge of SLI/SLO/SLA differences. A product engineering background with strong coding skills in Python, C#, or similar. Experience with incident management frameworks and evolving them for efficiency. Expertise in cloud platforms (AWS preferred) and container orchestration
Employment Type: Permanent
Salary: GBP Annual

Chaos Engineering
10th Percentile: £84,250
25th Percentile: £103,750
Median: £107,500
75th Percentile: £140,000
90th Percentile: £153,500