Lead SRE

Role: Lead Site Reliability Engineer (Technical Lead)

Salary: £80,000 – £100,000

Location: London (Hybrid – 2 days per month)

We are working with a mission-led technology organisation that is continuing to scale a fully cloud-native platform as part of a major initiative. As they move away from traditional data centres, they are investing heavily in building a highly reliable, scalable and observable cloud platform.

As a Lead SRE, you will act as a technical leader within the reliability function, setting direction and driving best practices across engineering teams. The role remains hands-on, with added ownership of shaping strategy, influencing architecture and mentoring engineers.

You will be solving complex engineering problems across distributed systems, while helping define how reliability is embedded across the wider platform as it continues to scale.

Key Responsibilities

  • Leading the design and evolution of monitoring and observability systems
  • Defining and driving SLOs, SLIs and error budgets across teams
  • Owning incident management processes, post-mortems and continuous improvement
  • Partnering with engineering teams to design resilient, fault-tolerant systems
  • Driving automation across infrastructure, deployments and operational workflows
  • Contributing to capacity planning, performance optimisation and cost efficiency
  • Providing technical leadership through design reviews and architectural decisions
  • Mentoring engineers and influencing reliability best practices across the organisation

Tech Environment

  • GCP and AWS
  • Kubernetes and containerised workloads
  • Terraform and Infrastructure as Code
  • Prometheus, Grafana, Datadog and modern observability tooling
  • CI/CD pipelines and automation tooling
  • Python, Go or similar programming languages
  • Distributed systems at scale

About You

  • Strong background in SRE, DevOps or Platform Engineering at a senior or lead level
  • Proven experience leading technical direction or mentoring engineers
  • Experience running and supporting production systems at scale
  • Strong understanding of observability, monitoring and reliability principles
  • Hands-on experience with cloud infrastructure and Kubernetes
  • Experience with Infrastructure as Code (Terraform or similar)
  • Comfortable debugging complex systems across infrastructure and application layers
  • Passionate about automation, reliability and improving engineering standards

This is a great opportunity to step into a technical leadership role, combining hands-on engineering with the chance to shape how reliability is delivered across a modern, cloud-native platform with real-world impact.

Job Details

Company: Pulse Recruit

Location: London Area, United Kingdom (Hybrid / Remote Options)