Site Reliability Engineering Manager

Site Reliability Engineering Manager | London (2 days Hybrid)

We're partnering with one of the UK's most recognised and high-traffic consumer tech platforms to find an Engineering Manager to lead their Site Reliability function.

The Role

This is a blended people leadership and technical role, responsible for operational excellence, observability, and reliability at scale across a platform that serves millions of users. You'll own incident management processes, drive reliability engineering standards, and ensure the business meets its exceptionally high availability targets.

Key Responsibilities

  • Own the monitoring, alerting and observability strategy, so that product teams can trust the reliability of their services and detect and resolve incidents quickly
  • Lead and standardise incident management processes, maintaining a culture of accountability, transparency and continuous learning
  • Define reliability patterns and standards to reduce cascading failures across distributed systems
  • Own and manage the reliability roadmap, OKR delivery and alignment with wider business goals
  • Lead, develop and grow a team of engineers: set objectives and growth plans, and foster a psychologically safe, inclusive environment

What You'll Need

  • Proven experience in SRE management across production environments, covering observability, monitoring and service delivery
  • Strong understanding of reliability in distributed microservices and cloud-based architectures
  • Experience with modern SRE tooling, incident management workflows and SLO/SLI frameworks
  • Familiarity with platform engineering concepts and reducing friction for product teams
  • Strong leadership, communication and stakeholder management skills

Job Details

Company
Gravitas Recruitment Group (Global) Ltd
Location
City of London, London, United Kingdom
Posted