Senior SRE Engineer
Senior SRE Engineer | Azure, Observability & Reliability Engineering | Platform Transformation in Financial Services
- Location: London (Hybrid, typically 3 days onsite)
- Permanent, Full-time
- Salary: £80k–£90k + bonus + benefits
- Visa sponsorship: Not available
The Role
You’ll join as the first dedicated SRE hire, with responsibility for establishing SRE practices across a live Azure-based platform and a new strategic platform being brought into service.
The role is focused on reliability, observability, incident management, resilience, and automation. You’ll help define how services are measured and operated, introducing practical improvements around SLIs, SLOs, error budgets, monitoring, and service ownership.
This is a hands-on role for someone who has done this before and can bring structure, prioritise well, and build an SRE capability in a pragmatic way.
Non-Negotiables
- Site Reliability Engineering in production environments
- Azure cloud environments in enterprise-scale businesses
- SLO / SLI / error budget design and implementation
- Observability tooling (Prometheus, Grafana, OpenTelemetry or similar)
- Incident leadership for Sev1 / Sev2 incidents
- Disaster recovery, resilience testing, RTO / RPO
- Terraform infrastructure as code
- CI/CD pipelines and engineering enablement
- Strong scripting with PowerShell, Bash or Python
- Experience improving reliability in hybrid estates (cloud + IaaS)
- Ability to introduce new ways of working and build an SRE practice from scratch
A strong Azure background is important, but the priority is proven SRE capability and the ability to apply it effectively.
What You’ll Work With
- Azure platform engineering
- Azure Container Apps / cloud-native services
- Terraform infrastructure as code
- Prometheus monitoring
- Grafana dashboards
- OpenTelemetry tracing
- Azure DevOps pipelines
- GitHub Actions CI/CD
- Windows Server and Linux estates
- Service Bus, Event Hubs and Kafka
- Incident management, runbooks, failover and resilience testing
Nice to Haves
- Financial services or regulated environment experience
- FCA / PRA operational resilience exposure
- Payments or FX platform experience
- Chaos engineering
- FinOps or cloud cost awareness
- Kubernetes exposure
Kubernetes knowledge is useful, but not essential.
Why Join / Projects
- Establish the SRE capability from the ground up
- Define and implement SLIs, SLOs and error budgets
- Improve observability across platforms and services
- Lead incident response and post-incident improvements
- Drive resilience, failover and automation initiatives
- Support the move toward a modern, reliability-first platform
You’ll play a key role in shaping how reliability is engineered across both the current platform and a new strategic platform being brought into production.
Employee Benefits
- Pension
- Private healthcare
- Training and certification support