Senior SRE Engineer

Senior SRE Engineer | Azure, Observability & Reliability Engineering | Platform Transformation in Financial Services

  • Location: London (Hybrid, typically 3 days onsite)
  • Contract: Permanent, full-time
  • Salary: £80k–£90k + bonus + benefits
  • Visa sponsorship: Not available

The Role

You’ll join as the first dedicated SRE hire, with responsibility for establishing SRE practices across a live Azure-based platform and a new strategic platform being brought into service.

The role is focused on reliability, observability, incident management, resilience, and automation. You’ll help define how services are measured and operated, introducing practical improvements around SLIs, SLOs, error budgets, monitoring, and service ownership.

This is a hands-on role for someone who has built an SRE practice before and can bring structure, prioritise well, and grow the capability pragmatically.

Non-Negotiables

  • Site Reliability Engineering in production environments
  • Azure cloud environments in enterprise-scale businesses
  • SLO / SLI / error budget design and implementation
  • Observability tooling (Prometheus, Grafana, OpenTelemetry or similar)
  • Incident leadership for Sev1 / Sev2 incidents
  • Disaster recovery, resilience testing, RTO / RPO
  • Terraform infrastructure as code
  • CI/CD pipelines and engineering enablement
  • Strong scripting with PowerShell, Bash or Python
  • Experience improving reliability in hybrid estates (cloud + IaaS)
  • Ability to introduce new ways of working and build an SRE practice from scratch

The client is looking for someone with a strong Azure background, but the priority is proven SRE capability and the ability to apply it effectively.

What You’ll Work With

  • Azure platform engineering
  • Azure Container Apps / cloud-native services
  • Terraform infrastructure as code
  • Prometheus monitoring
  • Grafana dashboards
  • OpenTelemetry tracing
  • Azure DevOps pipelines
  • GitHub Actions CI/CD
  • Windows Server and Linux estates
  • Service Bus, Event Hubs and Kafka
  • Incident management, runbooks, failover and resilience testing

Nice to Haves

  • Financial services or regulated environment experience
  • FCA / PRA operational resilience exposure
  • Payments or FX platform experience
  • Chaos engineering
  • FinOps or cloud cost awareness
  • Kubernetes exposure

Kubernetes knowledge is useful, but not essential.

Why Join / Projects

  • Establish the SRE capability from the ground up
  • Define and implement SLIs, SLOs and error budgets
  • Improve observability across platforms and services
  • Lead incident response and post-incident improvements
  • Drive resilience, failover and automation initiatives
  • Support the move toward a modern, reliability-first platform

You’ll play a key role in shaping how reliability is engineered across both the current platform and a new strategic platform being brought into production.

Employee Benefits

  • Pension
  • Private healthcare
  • Training and certification support

Job Details

  • Company: Prism Digital
  • Location: City of London, London, United Kingdom (Hybrid / Remote Options)