Senior SRE Lead
Company: Albany Beck
Location: London (Hybrid)
About Albany Beck
Albany Beck is a management consultancy that provides specialist talent and transformative solutions to Financial Services clients. We combine subject matter expertise with innovative delivery models that help clients scale efficiently, while offering meaningful, long-term career opportunities to our people. At Albany Beck, you’ll be joining an organisation that is passionate about building capability, technical excellence, and delivering meaningful change within complex enterprise environments.
Role Overview
Albany Beck is seeking a Senior SRE Lead / Observability SME to lead the establishment of a new enterprise Site Reliability Engineering (SRE) capability, with a primary focus on designing and implementing a modern observability suite and operational resilience framework.
This is a foundational build role, responsible for defining how reliability engineering and observability are structured, measured, and embedded across a complex global technology estate. The successful candidate will play a key role in shifting the organisation from reactive operational support to a metrics-driven, engineering-led reliability model.
You will work across infrastructure, platform, and application teams to define standards, implement tooling, and establish operational practices that improve service stability, incident response maturity, and end-to-end visibility across systems.
This role is best suited to someone who has helped design or scale SRE and observability capabilities in large, distributed, and regulated environments.
Key Responsibilities
- Lead the design, build, and rollout of an enterprise-wide observability capability
- Define observability standards, including metrics, logging, tracing, and alerting frameworks
- Establish a Site Reliability Engineering (SRE) operating model and engineering practices
- Develop and embed operational resilience and service reliability measurement frameworks
- Design a requirements-based architecture for observability and reliability tooling
- Improve incident, problem, and outage management maturity across technology teams
- Partner with infrastructure, platform, and application support teams to embed SRE principles
- Drive the transition from reactive operational support to proactive, metrics-driven engineering
- Define and implement service level indicators (SLIs) and service level objectives (SLOs)
- Support tooling selection, integration, and optimisation across observability platforms
- Contribute to improving overall operational resilience within a globally distributed environment
Key Skills & Experience
- Proven experience as a Senior SRE Lead, Principal Engineer, or Observability SME in enterprise environments
- Strong background in designing and implementing observability platforms (metrics, logs, tracing, monitoring)
- Experience building or scaling SRE capabilities within large, complex organisations
- Strong understanding of operational resilience frameworks and reliability engineering principles
- Experience working in private cloud or hybrid enterprise infrastructure environments
- Strong knowledge of incident management, problem management, and operational maturity models
- Ability to define and implement SLIs, SLOs, and error budgets
- Experience working across distributed global technology estates
- Strong stakeholder management skills with the ability to influence engineering and infrastructure teams
- Experience in transitioning organisations from reactive support models to proactive engineering-led operations
- Strong architectural mindset with experience in requirements-based design for observability solutions
Environment
- Enterprise SRE capability buildout (greenfield / early maturity stage)
- Observability suite implementation across multiple platforms and teams
- Private cloud environment with a globally distributed infrastructure footprint
- High complexity, multi-team engineering landscape
- Focus on operational resilience and service reliability uplift