Site Reliability Engineer

Lead Site Reliability Engineer

Job Type: Contract

Location: London, UK

Key Responsibilities:

The successful candidate will act as the Technical Authority (SME) for both Azure and Terraform, guide teams on SRE practices, and approve production changes.
This role is platform-focused, not application-specific, and requires deep expertise in SRE principles, Azure Landing Zones (Hub-and-Spoke), Terraform, DevOps enablement, monitoring/observability, and incident management.
The candidate will also address long-term reliability and operational risks while building and mentoring SRE teams.
Design, implement, and operate Azure Hub-and-Spoke Landing Zone architectures.
Reduce operational toil through automation and platform improvements.
Own and evangelize SRE principles including availability, reliability, scalability, resilience, and operational maturity.
Define Terraform best practices, state management, drift detection, and CI/CD integration.
Build and maintain CI/CD foundations using GitHub Actions.
Design and standardize monitoring and observability across the Azure platform.
Lead and participate in major incident management following ITIL processes.
Partner with security teams to implement least-privilege access and secure-by-default architectures.
Enforce governance using Azure Policy and standardized platform controls.
Lead, mentor, and grow a high-performing SRE/platform engineering team.
Drive SRE culture across the organization and set technical direction, standards, and operational maturity goals.
Clearly explain and apply SRE concepts (SLIs, SLOs, error budgets, toil reduction, blameless postmortems).
Define and track platform-level SLIs/SLOs and ensure alignment with business objectives.
Strong hands-on knowledge of Azure RBAC and IAM, Managed Identities, and Azure Key Vault for secrets, keys, and certificates.
