9 of 9 Permanent Chaos Engineering Jobs in the UK

Site Reliability Engineer

Hiring Organisation
Jobleads-UK
Location
Greater London, England, United Kingdom
Hybrid. Mandatory primary skills in Datadog/Dynatrace tools and SLO management (AWS cloud skills are secondary). Primary Responsibilities: • Work closely with the Product Engineering team and implement strategies for modernizing IT operations, enhancing observability, and reducing toil. • Architect and deploy observability platforms to monitor system health, performance … scalable, resilient, and maintainable. • Drive incident management and root cause analysis processes through automation, ensuring continuous improvement to enable autonomous operations. • Partner with engineering, architecture, and product teams to enable shift-left engineering practices ensuring reliability. • Mentor and guide teams on adopting SRE principles and tools. • Advocate ...

Sr Service Reliability Engineer – Kings Cross, London

Hiring Organisation
Jobleads-UK
Location
Greater London, England, United Kingdom
operations. This is a senior technical role that requires a strategic mindset and deep-seated expertise in System Reliability Engineering. By blending a software engineering mindset with operational expertise, you will engineer solutions that improve system reliability, automate complex processes, and reduce manual toil. You will not only resolve … call rotation to troubleshoot and mitigate production incidents. • Lead post-incident reviews and root cause analyses to implement lasting solutions. • Partner with engineering and IT stakeholders to embed SRE best practices (SLOs, error budgets) into the design and development lifecycle. • Act as the Final Escalation Point for SRE operations ...

Strategic Initiatives Program Manager – Vice President

Hiring Organisation
Jobleads-UK
Location
Belfast, Northern Ireland, United Kingdom
Leverage cloud-native services and features to enhance application resiliency. This includes services for auto‐scaling, load balancing, and disaster recovery. Explore and implement chaos engineering practices to proactively identify and address system weaknesses under stress. Partner with IO owners and platform teams to expand OTR capabilities across … development of resiliency dashboards and self‐service reporting capabilities to provide transparency into program progress and application resiliency posture. Key Qualifications Experience in software engineering, site reliability engineering (SRE), or technology risk and controls. Experience in a program or project management role, delivering complex, cross‐functional technology initiatives. ...

Strategic Initiatives Program Manager – Vice President

Hiring Organisation
Jobleads-UK
Location
Belfast, Northern Ireland, United Kingdom
Leverage cloud-native services and features to enhance application resiliency. This includes services for auto-scaling, load balancing, and disaster recovery. Explore and implement chaos engineering practices to proactively identify and address system weaknesses under stress. Partner with IO owners and platform teams to expand OTR capabilities across … development of resiliency dashboards and self-service reporting capabilities to provide transparency into program progress and application resiliency posture. Key Qualifications Experience in software engineering, site reliability engineering (SRE), or technology risk and controls. Experience in a program or project management role, delivering complex, cross-functional technology initiatives. ...

Strategic Initiatives Program Manager – Vice President

Hiring Organisation
Jobleads-UK
Location
Belfast, Northern Ireland, United Kingdom
Leverage cloud-native services and features to enhance application resiliency. This includes services for auto-scaling, load balancing, and disaster recovery. • Explore and implement chaos engineering practices to proactively identify and address system weaknesses under stress. • Partner with IO owners and platform teams to expand OTR capabilities across … development of resiliency dashboards and self-service reporting capabilities to provide transparency into program progress and application resiliency posture. Key Qualifications: • Experience in software engineering, site reliability engineering (SRE), or technology risk and controls. • Experience in a program or project management role, delivering complex, cross-functional technology initiatives. ...

Senior Software Engineer / SRE - Electronic Trading

Hiring Organisation
Jobleads-UK
Location
Greater London, England, United Kingdom
Senior Software Engineer/SRE - Electronic Trading. Location: London. Business Area: Engineering and CTO. Ref #: 10050148. Description & Requirements. About Observability Engineering: Senior Software Engineers - SRE in Electronic Trading (ET) ensure our global enterprise products spanning fixed income, equities, and derivatives are resilient and observable. This role focuses … specialize in proactive anomaly detection, providing advanced performance insights and best practice guidance. Our team collaborates with application developers to define meaningful SLOs, implement chaos engineering, and build diagnostic tools that mitigate architectural risks as our platforms scale. What’s in it for you? You will have ...

Staff Software Engineer, AI Reliability Engineering

Hiring Organisation
Jobleads-UK
Location
England, United Kingdom
Role Claude has your back. AIRE has Claude's. Help us keep Claude reliable for everyone who depends on it. AIRE (AI Reliability Engineering) partners with teams across Anthropic to improve reliability across our most critical serving paths -- every hop from the SDK through our network, API layers, serving … TPUs, Trainium). Understand ML-specific networking optimizations like RDMA and InfiniBand. Have expertise in AI-specific observability tools and frameworks. Have experience with chaos engineering and systematic resilience testing. Have contributed to open-source infrastructure or ML tooling. The annual compensation range for this role is listed ...

Software Development Engineer in Test

Hiring Organisation
scrumconnect ltd
Location
Swansea, West Glamorgan, United Kingdom
Employment Type
Permanent
Salary
GBP 40,000 - 45,000 Annual
squad of civil servants and supplier staff. In this role, you will design, build, maintain, and continuously improve the automated test frameworks and quality-engineering practices that underpin high-volume, citizen-facing digital services handling billions of interactions annually. You will own the full testing life cycle from strategic … Level Test Automation Engineer certification. Advanced Testing Principles: Experience with contract testing (e.g., Pact) for microservices and API ecosystems, as well as familiarity with chaos engineering and site reliability testing principles. Tooling Proficiency: Advanced skills in test management tooling such as Jira, Xray, or Zephyr. Government Assessments: Prior ...

Manager – Site Reliability Engineering

Hiring Organisation
Jobleads-UK
Location
Greater London, England, United Kingdom
place where everyone can grow, develop and fulfil your potential with meaningful careers. Role profile: We are looking for a Manager – Site Reliability Engineering to strengthen the Production Management leadership team of Clearing Technology Service. This role demands a proactive and hands-on leader with deep technical expertise … capacity. • Deep technical expertise in Oracle database - troubleshooting, scalability, performance tuning and optimization. • Demonstrated experience implementing SRE frameworks – including SLOs, SLIs, incident management, and chaos engineering. • Experience leading teams supporting systems deployed across mixed infrastructure (Cloud and On-Premise, AWS preferred). • Solid understanding of change management, risk posture ...