SRE Lead (Banking/Financial)

Job Description:

  • Our client is transforming their production support function into a full Site Reliability Engineering (SRE) model. We are looking for a hands-on SRE Lead to establish and lead the SRE function, ensuring operational excellence across production systems.

Key Responsibilities:

  • Lead the SRE function across the engineering organisation and drive operational excellence across production systems.
  • Define and implement the observability and monitoring strategy, including dashboards, alerting, SLOs, SLAs, and error budgets.
  • Establish comprehensive monitoring coverage to ensure visibility into system health, infrastructure, and business-critical workflows.
  • Drive adoption of AI-driven tools and automation for proactive system troubleshooting, incident triage, and root cause analysis.
  • Lead and mentor a team of Site Reliability Engineers embedded within engineering teams.
  • Manage incident response processes, including on-call management and post-incident reviews.
  • Collaborate with product and engineering teams to build reliability and observability into new systems.
  • Monitor UI behaviour and end-to-end system performance, not just infrastructure metrics.

Essential Skills & Experience:

  • Proven experience as an SRE Lead or Senior SRE in large-scale, high-availability production environments.
  • Strong experience with observability and monitoring tools such as Datadog, Grafana, Prometheus, PagerDuty, or similar.
  • Experience managing incident response, on-call processes, and post-incident reviews.
  • Strong understanding of operational tooling for data ingestion and calculation pipelines, with the ability to detect anomalies in system behaviour.
  • Ability to provide technical leadership and influence engineering stakeholders.

Nice to Have:

  • Experience with financial data pipelines, index calculation, or capital markets systems.
  • Exposure to AI/ML-based tools for anomaly detection and automated troubleshooting.
  • Experience monitoring application-layer and UI behaviour, beyond infrastructure metrics.
  • Experience building SRE practices in a greenfield or transformation environment.

Job Details:

Company: Ascendion
Location: London Area, United Kingdom