SRE Lead (Banking/Financial)

Job Description:

Our client is transforming their production support function into a full Site Reliability Engineering (SRE) model. We are looking for a hands-on SRE Lead to establish and lead the SRE function, ensuring operational excellence across production systems.

Key Responsibilities:

Lead the SRE function across the engineering organisation and drive operational excellence across production systems.
Define and implement the observability and monitoring strategy, including dashboards, alerting, SLOs, SLAs, and error budgets.
Establish comprehensive monitoring coverage to ensure visibility into system health, infrastructure, and business-critical workflows.
Drive adoption of AI-driven tools and automation for proactive system troubleshooting, incident triage, and root cause analysis.
Lead and mentor a team of Site Reliability Engineers embedded within engineering teams.
Manage incident response processes, including on-call management and post-incident reviews.
Collaborate with product and engineering teams to build reliability and observability into new systems.
Monitor UI behaviour and end-to-end system performance, not just infrastructure metrics.

Essential Skills & Experience:

Proven experience as an SRE Lead or Senior SRE in large-scale, high-availability production environments.
Strong experience with observability and monitoring tools such as Datadog, Grafana, Prometheus, PagerDuty, or similar.
Experience managing incident response, on-call processes, and post-incident reviews.
Strong understanding of operational tooling for data ingestion and calculation pipelines, with the ability to detect anomalies in system behaviour.
Ability to provide technical leadership and influence engineering stakeholders.

Nice to Have:

Experience with financial data pipelines, index calculation, or capital markets systems.
Exposure to AI/ML-based tools for anomaly detection and automated troubleshooting.
Experience monitoring application-layer and UI behaviour, beyond infrastructure metrics.
Experience building SRE practices in a greenfield or transformation environment.
