Lead SRE

Role: Lead Site Reliability Engineer (Technical Lead)

Salary: £80,000 – £100,000

Location: London (Hybrid – 2 days per month)

We are working with a mission-led technology organisation that is continuing to scale a fully cloud-native platform as part of a major initiative. As they move away from traditional data centres, they are investing heavily in building a highly reliable, scalable and observable cloud platform.

As a Lead SRE, you will act as a technical leader within the reliability function, setting direction and driving best practices across engineering teams. The role remains hands-on, with added ownership of shaping strategy, influencing architecture and mentoring engineers.

You will be solving complex engineering problems across distributed systems, while helping define how reliability is embedded across the wider platform as it continues to scale.

Key Responsibilities

Leading the design and evolution of monitoring and observability systems
Defining and driving SLOs, SLIs and error budgets across teams
Owning incident management processes, post-mortems and continuous improvement
Partnering with engineering teams to design resilient, fault-tolerant systems
Driving automation across infrastructure, deployments and operational workflows
Contributing to capacity planning, performance optimisation and cost efficiency
Providing technical leadership through design reviews and architectural decisions
Mentoring engineers and influencing reliability best practices across the organisation

Tech Environment

GCP and AWS
Kubernetes and containerised workloads
Terraform and Infrastructure as Code
Prometheus, Grafana, Datadog and modern observability tooling
CI/CD pipelines and automation tooling
Python, Go or similar programming languages
Distributed systems at scale

About You

Strong background in SRE, DevOps or Platform Engineering at a senior or lead level
Proven experience leading technical direction or mentoring engineers
Experience running and supporting production systems at scale
Strong understanding of observability, monitoring and reliability principles
Hands-on experience with cloud infrastructure and Kubernetes
Experience with Infrastructure as Code (Terraform or similar)
Comfortable debugging complex systems across infrastructure and application layers
Passionate about automation, reliability and improving engineering standards

This is a great opportunity to step into a technical leadership role, combining hands-on engineering with the chance to shape how reliability is delivered across a modern, cloud-native platform with real-world impact.
