10 of 10 Permanent Chaos Engineering Jobs in the UK

Principal/Senior Site Reliability Engineer

Hiring Organisation: Jobleads-UK
Location: Welwyn, England, United Kingdom

Design for resilience, building disaster recovery and failover plans with auto‐scaling and load balancing to keep critical systems available worldwide. Strengthen reliability through chaos engineering experiments that validate systems and surface weaknesses before incidents. Build deep observability with monitoring, logging, and alerting frameworks such as Prometheus, Grafana … equivalent experience in software and site reliability engineering. Preferred experience with distributed ML frameworks such as Horovod or TensorFlow Distributed, familiarity with data engineering pipelines such as Apache Airflow or Spark, and knowledge of chaos engineering tools and compliance frameworks such as GDPR ...

Principal/Senior Site Reliability Engineer

Hiring Organisation: Jobleads-UK
Location: City of Westminster, England, United Kingdom

organisation to design resilient, cloud-based systems for MLOps and HPC workloads at global scale. This is a role for someone who wants their engineering craft to have real impact on science and patients. The Opportunity: You architect Infrastructure as Code using Terraform, Pulumi, or CloudFormation to provision … resilience, building disaster recovery and failover plans with auto-scaling and load balancing to keep critical systems available worldwide. You strengthen reliability through chaos engineering, running experiments that validate systems and surface weaknesses before they become incidents. You build deep observability with monitoring, logging, and alerting frameworks such ...

Staff Software Engineer

Hiring Organisation: Jobleads-UK
Location: Belfast, Northern Ireland, United Kingdom

developer experience. The Harness Software Delivery Platform includes modules for CI, CD, Cloud Cost Management, Feature Flags, Service Reliability Management, Security Testing Orchestration, Chaos Engineering, and Software Engineering Insights, and continues to expand at an incredibly fast pace. About The Role: Design, develop and maintain critical software … equivalent professional experience. Joining the SCS team means you’ll work on a high‐impact product at the heart of modern DevOps, solving complex engineering challenges at scale. You’ll be surrounded by passionate technologists, move fast, and see your work directly improve the lives of developers worldwide ...

Architect & Delivery Lead (68018)

Hiring Organisation: Jobleads-UK
Location: Greater London, England, United Kingdom

Make architectural decisions across IAM, Cloud, SRE, Network, Data, and Security, ensuring coherence, reusability, and alignment with business objectives. Establish and chair the Architecture & Engineering Governance board, providing technical assurance across all workstreams. Own the programme roadmap, resource plan, and financial model, tracking cost savings, team reduction trajectory … vendor and tool selection, ensuring standardisation across the programme and eliminating redundant tooling. Build and lead high‐performing distributed teams, fostering a culture of engineering excellence, accountability, and continuous improvement. Define the continuous improvement factory model, ensuring the transformation sustains beyond the initial programme. Technical Skills & Expertise: Broad ...

Advisory and Solution Architecture - Executive Director

Hiring Organisation: Jobleads-UK
Location: Greater London, England, United Kingdom

defining and governing end-to-end solution architectures for critical platforms and transformation programs. You will partner closely with CTOs, Chief Architects, and senior engineering leaders to deliver business and technology strategy. Your role blends deep architectural expertise with executive presence, enabling you to influence both business and technology … senior stakeholders and provide hands‐on guidance to engineering teams as needed. You will help accelerate delivery, embed security and resiliency, and foster a high-performing architecture community. Job responsibilities: Translates business strategy into pragmatic solution roadmaps and reference architectures. Drives architecture decisions balancing resilience, scalability, latency, cost, risk ...

Sr. Software Engineer - Data Platform (London, Hybrid)

Hiring Organisation: Jobleads-UK
Location: Greater London, England, United Kingdom

worked with all these technologies – we value strong distributed systems fundamentals, the ability to write production-quality code, and the passion for solving complex engineering challenges. We'll support your growth in streaming technologies and expect you'll be comfortable collaborating with teams distributed across various geographies and time … centers. Develop RESTful APIs and SDKs that enable other teams to easily integrate with streaming platform capabilities. Work closely with platform consumers, SREs, and engineering teams across the organization to understand requirements and deliver scalable solutions. Challenge the status quo by continuously improving platform performance, reliability, scalability, and developer ...

Lead DevOps Engineer

Hiring Organisation: Jobleads-UK
Location: Greater London, England, United Kingdom

Registries for traffic and event routing between microservices and autonomous agents via CNCF Gateway APIs. Champion DevSecOps maturity by embedding SAST/DAST, chaos engineering, and error budget monitoring. Collaborate with Security, Data, and AI teams to shape DevOps and AI platform architectures with regulatory compliance. Stay ahead … several of the areas below, we encourage you to apply even if you don’t meet every detail. Core Expertise: Experience leading or mentoring engineering teams with the ability to set direction and contribute hands‐on. Strong Kubernetes knowledge, including cluster lifecycle management, API extensions, Operators, Helm ...

DevOps Engineer

Hiring Organisation: Oscar Associates (UK) Limited
Location: Manchester, North West, United Kingdom
Employment Type: Permanent
Salary: £70,000

company's AWS platform, ensuring it's secure, scalable, reliable and cost-efficient as it moves into full production. Working closely with engineering teams, you'll drive automation, improve deployment pipelines, strengthen observability and ensure the platform performs under high-volume, real-time workloads. This is a hands … Infrastructure as Code; CI/CD automation; High-availability production environments; Monitoring and observability; Cloud cost optimisation and governance. Desirable: Multi-region AWS environments; Chaos Engineering or resilience testing; Cloud security tooling and posture management. Why Apply? Modern AWS platform with no legacy infrastructure; No Kubernetes administration; High ...

Staff Software Engineer, AI Reliability Engineering

Hiring Organisation: Jobleads-UK
Location: England, United Kingdom

Role Claude has your back. AIRE has Claude's. Help us keep Claude reliable for everyone who depends on it. AIRE (AI Reliability Engineering) partners with teams across Anthropic to improve reliability across our most critical serving paths -- every hop from the SDK through our network, API layers, serving … TPUs, Trainium). Understand ML-specific networking optimizations like RDMA and InfiniBand. Have expertise in AI-specific observability tools and frameworks. Have experience with chaos engineering and systematic resilience testing. Have contributed to open-source infrastructure or ML tooling. The annual compensation range for this role is listed ...

Strategic Initiatives Program Lead – Senior Vice President

Hiring Organisation: Jobleads-UK
Location: Belfast City District, Northern Ireland, United Kingdom

Deep understanding of application architecture, distributed systems, and cloud technologies. Knowledge of disaster recovery, business continuity, and operational resilience frameworks. Familiarity with SRE principles, chaos engineering, and automated recovery practices. Understanding of regulatory requirements for operational resilience in financial services. Experience with enterprise platforms, APIs, and system integration … strategies. Experience managing program budgets and resources. BS degree in Computer Science, Engineering, or equivalent field required. Leadership Competencies: Strategic thinking and ability to translate business objectives into technology strategies. Strong executive presence with ability to influence and drive change across organizational boundaries. Exceptional communication skills with ability ...