8 of 8 Chaos Engineering Jobs in the UK

Site Reliability Engineer

Hiring Organisation: Jobleads-UK
Location: Greater London, England, United Kingdom

Hybrid. Mandatory primary skills: Datadog/Dynatrace tools and SLO management (AWS cloud skills are secondary). Primary Responsibilities: • Work closely with the Product Engineering team and implement strategies for modernizing IT operations, enhancing observability, and reducing toil. • Architect and deploy observability platforms to monitor system health, performance … scalable, resilient, and maintainable. • Drive incident management and root cause analysis processes through automation, ensuring continuous improvement to enable autonomous operations. • Partner with engineering, architecture, and product teams to enable shift-left engineering practices ensuring reliability. • Mentor and guide teams on adopting SRE principles and tools. • Advocate ...

Sr Service Reliability Engineer – Kings Cross, London

Hiring Organisation: Jobleads-UK
Location: Greater London, England, United Kingdom

operations. This is a senior technical role that requires a strategic mindset and deep-seated expertise in System Reliability Engineering. By blending a software engineering mindset with operational expertise, you will engineer solutions that improve system reliability, automate complex processes, and reduce manual toil. You will not only resolve … call rotation to troubleshoot and mitigate production incidents. - Lead post-incident reviews and root cause analyses to implement lasting solutions. - Partner with engineering and IT stakeholders to embed SRE best practices (SLOs, error budgets) into the design and development lifecycle. - Act as the Final Escalation Point for SRE operations ...

Senior Software Engineer / SRE - Electronic Trading

Hiring Organisation: Jobleads-UK
Location: Greater London, England, United Kingdom

Senior Software Engineer/SRE - Electronic Trading. Location: London. Business Area: Engineering and CTO. Ref #: 10050148. Description & Requirements. About Observability Engineering: Senior Software Engineers - SRE in Electronic Trading (ET) ensure our global enterprise products spanning fixed income, equities, and derivatives are resilient and observable. This role focuses … specialize in proactive anomaly detection, providing advanced performance insights and best practice guidance. Our team collaborates with application developers to define meaningful SLOs, implement chaos engineering, and build diagnostic tools that mitigate architectural risks as our platforms scale. What’s in it for you? You will have ...

Product Owner - Operational Resilience

Hiring Organisation: TEKsystems
Location: Sheffield, Yorkshire, United Kingdom
Employment Type: Contract
Contract Rate: GBP Annual

backlog Resilience-by-design - Embed resilience enhancements into SDLC and change processes (non-functional requirements, release readiness, operational acceptance). - Champion practices such as chaos engineering, game days, fault injection, capacity and performance testing, and DR readiness. Observability & insights - Partner with monitoring/observability teams to improve telemetry … identify systemic risks, recurring failure modes, and top offenders across services. Automation & operational excellence - Prioritise automation for detection, triage, and remediation. Stakeholder management - Align engineering, operations, architecture, risk, and business stakeholders on resilience priorities. - Communicate progress and risk clearly to senior leadership; manage dependencies and delivery risks. Governance & controls ...

Staff Software Engineer, AI Reliability Engineering

Hiring Organisation: Jobleads-UK
Location: England, United Kingdom

Role Claude has your back. AIRE has Claude's. Help us keep Claude reliable for everyone who depends on it. AIRE (AI Reliability Engineering) partners with teams across Anthropic to improve reliability across our most critical serving paths -- every hop from the SDK through our network, API layers, serving … TPUs, Trainium). Understand ML-specific networking optimizations like RDMA and InfiniBand. Have expertise in AI-specific observability tools and frameworks. Have experience with chaos engineering and systematic resilience testing. Have contributed to open-source infrastructure or ML tooling. The annual compensation range for this role is listed ...

Strategic Initiatives Program Lead – Senior Vice President

Hiring Organisation: Jobleads-UK
Location: Belfast, Northern Ireland, United Kingdom

Deep understanding of application architecture, distributed systems, and cloud technologies. Knowledge of disaster recovery, business continuity, and operational resilience frameworks. Familiarity with SRE principles, chaos engineering, and automated recovery practices. Understanding of regulatory requirements for operational resilience in financial services. Experience with enterprise platforms, APIs, and system integration … strategies. Experience managing program budgets and resources. BS degree in Computer Science, Engineering, or equivalent field required. Leadership Competencies Strategic thinking and ability to translate business objectives into technology strategies. Strong executive presence with ability to influence and drive change across organizational boundaries. Exceptional communication skills with ability ...

Manager – Site Reliability Engineering

Hiring Organisation: Jobleads-UK
Location: Greater London, England, United Kingdom

place where everyone can grow, develop and fulfil your potential with meaningful careers. Role profile: We are looking for a Manager – Site Reliability Engineering to strengthen the Production Management leadership team of Clearing Technology Service. This role demands a proactive and hands-on leader with deep technical expertise … capacity. Deep technical expertise in Oracle database - troubleshooting, scalability, performance tuning and optimization. Demonstrated experience implementing SRE frameworks – including SLOs, SLIs, incident management, and chaos engineering. Experience leading teams supporting systems deployed across mixed infrastructure (Cloud and On-Premise, AWS preferred). Solid understanding of change management, risk posture ...

Principal Engineer - CPTO, BPL

Hiring Organisation: Jobleads-UK
Location: Greater London, England, United Kingdom

Principal Software Engineer, Platform, BPL. As an Engineering Lead in our Chief Product and Technology Office (CPTO), you’ll shape the architecture of tomorrow and mentor talented engineers, crafting elegant answers to complex problems to create solutions that millions depend on. In this role … Driven Excellence - Apply domain modelling principles to create clean, maintainable codebases that accurately represent complex business logic. Your work will set the standard for engineering excellence. Production-Grade Quality - Champion comprehensive testing strategies, from unit tests to performance testing to chaos engineering. You'll ensure every service ...