Site Reliability Engineering Jobs in East London

2 of 2 Site Reliability Engineering Jobs in East London

Senior Director - Operations and Reliability Engineering

Canary Wharf, Greater London, UK
Boston Consulting Group
… thrive.

What You'll Do

The Senior Director – Operations and Reliability Engineering is responsible for blending Site Reliability Engineering (SRE), DevOps, and traditional operations models to build a next-generation Reliability Engineering function. This role ensures end-to-end automation at scale, 24x7 … ensuring compliance with standardized frameworks and operational excellence.

Key Responsibilities:

Strategic Leadership & Transformation:
* Define and execute a modern Reliability Engineering strategy, integrating SRE, DevOps, and automation-first operational models.
* Drive end-to-end automation to eliminate toil, improve efficiency, and enhance operational resilience.
* Lead the transition from traditional …

Operational Excellence:
* Mandate and assure the adoption of IT Service Management (ITSM) processes across all teams, ensuring standardized, efficient, and effective service delivery.
* Establish SRE-based operational metrics, including SLOs, SLIs, and error budgets.
* Oversee incident response, problem resolution, and root cause analysis with AI-driven remediation.
* Ensure high availability …
Employment Type: Full-time

Site Reliability Engineer – FinTech / Global Payments – London HQ / Remote First

East London, London, United Kingdom
Hybrid / WFH Options
Future Talent Group
Salary: £80,000/£85,000 + Bonus

Location: This UK-based team offers a fully remote working option, with headquarters in Central London.

In this role, you will be joining a leading SaaS FinTech … market. The business aims to scale its platform significantly over the next few years to support a growing international client base.

Responsibilities:
* Champion core SRE practices: define SLIs/SLOs/SLAs, reduce toil through automation, and plan for Disaster Recovery.
* Refine KPIs to support data-driven decisions around reliability … teams to build resilient, observable, and maintainable features.
* Promote DevOps culture by leading knowledge-sharing sessions and supporting issue resolution.

Skills:
* Strong grounding in SRE principles and operational best practices.
* Proficient with observability tools (Prometheus, Grafana, OTEL, CloudWatch) and telemetry pipelines.
* Solid programming skills in Python and/or Go …