4 of 4 Permanent Chaos Engineering Jobs in London

Site Reliability Engineer

Hiring Organisation
Jobleads-UK
Location
Greater London, England, United Kingdom
Hybrid. Mandatory primary skills: Datadog/Dynatrace tools and SLO management (AWS cloud skills are secondary). Primary Responsibilities: • Work closely with the Product Engineering team to implement strategies for modernizing IT operations, enhancing observability, and reducing toil. • Architect and deploy observability platforms to monitor system health, performance … scalable, resilient, and maintainable. • Drive incident management and root cause analysis processes through automation, ensuring continuous improvement to enable autonomous operations. • Partner with engineering, architecture, and product teams to enable shift-left engineering practices that ensure reliability. • Mentor and guide teams on adopting SRE principles and tools. • Advocate ...

Sr Service Reliability Engineer – Kings Cross, London

Hiring Organisation
Jobleads-UK
Location
Greater London, England, United Kingdom
operations. This is a senior technical role that requires a strategic mindset and deep-seated expertise in System Reliability Engineering. By blending a software engineering mindset with operational expertise, you will engineer solutions that improve system reliability, automate complex processes, and reduce manual toil. You will not only resolve … call rotation to troubleshoot and mitigate production incidents. • Lead post-incident reviews and root cause analyses to implement lasting solutions. • Partner with engineering and IT stakeholders to embed SRE best practices (SLOs, error budgets) into the design and development lifecycle. • Act as the Final Escalation Point for SRE operations ...

Senior Software Engineer / SRE - Electronic Trading

Hiring Organisation
Jobleads-UK
Location
Greater London, England, United Kingdom
Senior Software Engineer/SRE - Electronic Trading. Location: London. Business Area: Engineering and CTO. Ref #: 10050148. Description & Requirements. About Observability Engineering: Senior Software Engineers - SRE in Electronic Trading (ET) ensure our global enterprise products spanning fixed income, equities, and derivatives are resilient and observable. This role focuses … specialize in proactive anomaly detection, providing advanced performance insights and best practice guidance. Our team collaborates with application developers to define meaningful SLOs, implement chaos engineering, and build diagnostic tools that mitigate architectural risks as our platforms scale. What’s in it for you? You will have ...

Manager – Site Reliability Engineering

Hiring Organisation
Jobleads-UK
Location
Greater London, England, United Kingdom
place where everyone can grow, develop and fulfil their potential with meaningful careers. Role profile: We are looking for a Manager – Site Reliability Engineering to strengthen the Production Management leadership team of Clearing Technology Service. This role demands a proactive and hands-on leader with deep technical expertise … capacity. • Deep technical expertise in Oracle database troubleshooting, scalability, performance tuning and optimization. • Demonstrated experience implementing SRE frameworks, including SLOs, SLIs, incident management, and chaos engineering. • Experience leading teams supporting systems deployed across mixed infrastructure (Cloud and On-Premise, AWS preferred). • Solid understanding of change management, risk posture ...