Principal SRE Engineer
PRINCIPAL SITE RELIABILITY ENGINEER
Salary: £110,000 - £125,000
Location: London
A leading global financial institution is seeking a Principal Site Reliability Engineer to provide essential support for their Foreign Exchange (FX) desk, focusing on trading and risk applications, including an advanced algorithmic ultra-low latency stack. This is a unique opportunity to play a pivotal role in ensuring the reliability, performance, and scalability of a real-time trading environment by applying best-in-class SRE principles. You will work directly with senior traders and developers on the trading floor, optimising workflows, troubleshooting complex issues, and driving ongoing improvements across both processes and technology.
What you'll do:
As a Principal Site Reliability Engineer, you will immerse yourself in the fast-paced world of FX trading technology. Your day-to-day responsibilities will centre around maintaining the stability of mission-critical trading applications while proactively identifying areas for improvement. You will be expected to respond swiftly to incidents, leveraging your analytical skills to resolve issues efficiently while minimising disruption. By enhancing monitoring capabilities and automating routine tasks, you will help build a resilient infrastructure that supports seamless trading operations. Working closely with both technical teams and business stakeholders on the trading floor, you will translate business needs into reliable technical solutions. Your mentorship will foster growth among junior engineers as you promote best practices in site reliability engineering. Through your commitment to governance standards and process optimisation, you will contribute significantly to the overall success of the FX desk's technology platform.
* Respond rapidly to production incidents using data-driven decision making to minimise downtime and financial impact while leading root cause analysis and conducting blameless post-mortems.
* Enhance application health monitoring by implementing robust observability solutions and automating manual processes to improve system resilience.
* Drive cost optimisation initiatives and manage capacity resources to ensure efficient and scalable operations across all FX trading platforms.
* Collaborate with development teams to design and deploy fault-tolerant, scalable solutions that align with evolving business goals.
* Enforce adherence to change management, incident management, and problem management policies, as well as the specific non-financial risk frameworks required by the organisation.
* Mentor junior team members by sharing knowledge and promoting a culture of engineering excellence and continuous improvement throughout the team.
* Engage directly with key stakeholders, including senior traders and lead developers, to optimise trading workflows and troubleshoot complex technical issues.
* Champion automation efforts by building tools that reduce manual effort and operational risk within the production environment.
* Support governance activities by ensuring compliance with regulatory requirements relevant to trading systems operations.
* Contribute actively to process reviews aimed at identifying opportunities for further efficiency gains or risk reduction.
What you bring:
* Proven experience in a production support, site reliability engineering or DevOps role within a trading or financial services environment is essential for success in this position.
* Deep technical expertise in Linux/Unix systems administration combined with strong SQL skills and proficiency in scripting languages such as Python or Java.
* Demonstrated experience with monitoring and observability tools including Prometheus, Grafana, Splunk, Geneos, OpenTelemetry or Corvil is highly desirable.
* Familiarity with cloud platforms as well as containerisation technologies like Kubernetes or Docker alongside CI/CD pipeline management is important for this role.
* Comprehensive understanding of trade lifecycle processes along with fundamental knowledge of trading systems; prior exposure to FX products or algorithmic trading is advantageous but not mandatory.
* Excellent communication skills enabling you to explain complex technical concepts clearly to both technical colleagues and non-technical stakeholders alike.
* A passion for automation demonstrated through hands-on experience building tools that reduce manual intervention while mitigating operational risks within large-scale environments.
* Experience enforcing governance frameworks related to change management, incident response or problem resolution within regulated industries adds significant value.
* A collaborative approach that fosters teamwork across diverse groups including developers, traders and other business partners is vital for thriving in this environment.
Robert Walters Operations Limited is an employment business and employment agency and welcomes applications from all candidates.
- Company
- Robert Walters
- Location
- London, South East, England, United Kingdom
- Employment Type
- Full-Time
- Salary
- £110,000 - £125,000 per annum
- Posted