City of London, London, United Kingdom Hybrid / WFH Options
Explore Group
Site Reliability Engineer (Hybrid – London) | RegTech Innovator | AWS, Terraform, Kubernetes Location: London (Hybrid – 2-3 days in office) Are you passionate about scalable infrastructure and modern DevOps practices … Want to make a tangible impact in a fast-growing RegTech company that’s transforming how businesses navigate regulatory compliance? Join us as a Site Reliability Engineer (SRE) and help build and operate the infrastructure that powers cutting-edge compliance solutions used by global financial institutions. What You'll Do Maintain and improve our AWS-based infrastructure using … Docker, Kubernetes (EKS) CI/CD: GitHub Actions, Argo CD, Helm Monitoring: Prometheus, Grafana, CloudWatch, OpenTelemetry Languages: Python, Bash, Go (bonus) What We're Looking For Strong experience in SRE, DevOps, or Production Engineering roles Proven hands-on skills with AWS, Terraform, and Kubernetes Experience with production support, incident management, and RCA practices Comfortable working in a fast-paced …
live and transferrable DV Clearance Are you passionate about reliability, automation, and supporting mission-critical systems? Join this global defence organisation as a Site Reliability Engineer (SRE) and help shape the future of one of the UK's most vital national security platforms. You'll be joining a growing SRE team at the heart of the customer … s mission, focused on ensuring performance, availability, and scalability, while driving continuous improvement and innovation. About the Role As an SRE, you'll combine your operational expertise with software engineering skills to minimise manual effort and drive automation across complex systems. This role is perfect for someone who thrives on solving hard problems, automating the mundane, and building intelligent … overtime. Proactively enhance system availability, performance, and resilience. Develop tools and solutions to automate repetitive tasks and reduce operational toil. Collaborate with development teams to embed best practices and SRE principles. Deploy and manage monitoring systems to provide intelligent observability. Engage with the wider DevOps/SRE community within the organisation. Ideal Skills & Experience We're more interested in your …
maintain CI/CD Pipelines: Jenkins, GitHub Containers & Orchestration: Docker, Kubernetes Messaging & APIs: REST Databases: PostgreSQL, MongoDB Languages: Python or Bash Your Experience 5+ years in a DevOps/Site Reliability Engineering role supporting production environments. Strong understanding of network architecture, security best practices, and service segregation. Skilled in containerisation, cloud platforms, and modern CI/CD … pipelines. Comfortable with on-call responsibilities and incident response. Collaborative and proactive in a fast-moving engineering environment. If you're passionate about modern infrastructure, security, and contributing to the future of digital finance, this is an ideal opportunity to make a lasting impact in a high-growth environment. Senior DevOps Engineer – FinTech/Blockchain, Hybrid London, up to …
City of London, London, United Kingdom Hybrid / WFH Options
Vertus Partners
trading and risk systems – including a highly complex, ultra-low latency algorithmic platform. This newly created role will focus on driving automation, reducing manual operational overhead, and maturing the SRE culture across the FX technology estate. You'll play a key part in ensuring the reliability, performance, and scalability of a real-time trading environment used by both internal … fixes to reduce future risk and downtime Design and implement scalable infrastructure solutions in alignment with business objectives and regulatory requirements Guide junior team members and help shape the SRE strategy within FX globally Required Skills: Hands-on SRE/DevOps experience in a trading or financial services environment Strong Linux/Unix administration skills with solid scripting experience (Python … environments is highly desirable Experience working with databases (Oracle, SQL) is a plus This is a unique opportunity to join an innovative organisation to define and implement your own SRE ideas across a high-impact area of the bank. If the opportunity to build a more resilient and automated FX production estate with support from leadership interests you, this could …
City of London, London, United Kingdom Hybrid / WFH Options
Annapurna
Site Reliability Engineer Location … London Hybrid (3 days WFH) Salary Range: Up to £140,000 Annapurna is working on behalf of a pioneering technology company to recruit a Site Reliability Engineer (SRE). This is a unique opportunity to play a vital role in developing cutting-edge AI systems that power autonomous vehicle technology. What to Expect: The SRE will be instrumental … in ensuring the stability, resilience, and efficiency of complex autonomous systems. This is a role for someone who thrives on innovation, loves solving infrastructure and reliability challenges, and wants to play a significant role in shaping the future of AI-driven mobility. Key responsibilities include: Ensuring smooth and continuous operation of autonomous vehicle systems in real-world environments. Developing …
Site Reliability Engineer, City of London Location: City of London, United Kingdom Job Category: Information Technology EU work permit required: Yes Job Reference: BBBH64028_1750084692 Job Views: 6 Posted: 16.06.2025 Expiry Date: 31.07.2025 Job Description: Site Reliability Engineer Whitehall … Resources require a Site Reliability Engineer to work with a key client on a 6-month initial contract. *This role will involve on-site work in London 3 days per week. *Inside IR35. *This role will require some on-call work. Site Reliability Engineer The Role As a Site Reliability/DevOps Engineer … you will play a critical role in managing cloud infrastructure, ensuring the reliability of production systems, and improving end-to-end deployment pipelines. This role combines deep operational responsibilities with a strong focus on automation, observability, and continuous improvement. You will be responsible for maintaining high system availability, enabling rapid delivery through CI/CD, and supporting development teams …
solutions, enabling our team to innovate at an unprecedented pace. 🛠️ Skills and Experience 5+ years of experience in a Cloud Infrastructure, DevOps, or Site Reliability Engineering (SRE) role. Strong proficiency with Python for scripting and automation. Extensive hands-on experience with Google Cloud Platform (GCP) and its core services (Cloud Run, GKE, IAM, Cloud Storage). Expertise …
risk frameworks Mentorship: Support the growth of junior team members and promote a culture of engineering excellence and continuous improvement Key Requirements Proven experience in a production support, SRE, or DevOps role within a trading or financial services environment Strong technical skills in Linux/Unix systems, SQL, and scripting Strong experience with a programming language such as Python …
generating results that allow our clients to thrive. What You'll Do The Senior Director – Operations and Reliability Engineering is responsible for blending Site Reliability Engineering (SRE), DevOps, and traditional operations models to build a next-generation Reliability Engineering function. This role ensures end-to-end automation at scale, 24x7 operational excellence, and high availability across all of BCG, including BCG Core … agility and operational resilience. Establish workforce development programs for AI-driven operations, automation, and modern reliability practices. What You'll Bring Required Qualifications: 15+ years of experience in IT operations, SRE, DevOps, or platform engineering. 5+ years in a senior leadership role, managing large-scale IT environments. Deep technical expertise in cloud computing (AWS, Azure, GCP), on-prem infrastructure, and hybrid environments. Proven … remediation. Strong understanding of zero-trust security, regulatory compliance, and risk management. Excellent leadership, communication, and stakeholder management skills. Preferred Qualifications: Certifications: ITIL, AWS/Azure/GCP Solutions Architect, SRE Foundation, CISSP, or equivalent. Experience with Kubernetes, Terraform, Ansible, and AI-powered operations tools. Strong problem-solving abilities, with a data-driven approach to operational excellence. The Senior Director – Operations Platform Lead is …
City of London, London, United Kingdom Hybrid / WFH Options
Stealth AI Startup
and audit trails ready for GDPR and SOC 2. Partner with AI and product engineers to shape runtime environments and data pipelines for large-scale model serving. Continuously tune reliability and cost through chaos testing, capacity planning and proactive incident response. What we think “great” looks like: Deep cloud mastery with AWS, including networking, IAM, storage and compute. Strong … policy as code and reusable patterns. Advocacy for CI/CD, building pipelines in GitHub Actions, GitLab CI or CircleCI with automated tests and security gates. An observability and SRE mindset, using tools such as Prometheus, Grafana, Loki or ELK and OpenTelemetry. A security-first but pragmatic approach, covering secrets management, image provenance and zero-trust networking. Proficiency in at … pre Series A. Hybrid flexibility, with collaboration in person for the important work and remote working the rest of the time. Private healthcare, pension scheme and an annual off-site to disconnect and recharge. Visa sponsorship available for exceptional talent already eligible to work in the UK. An opportunity to define the engineering culture and leave your mark …
to ensuring that our state-of-the-art models are accessible, reliable, and performant for our global user base. 🛠️ Skills and Experience 5+ years of experience in a DevOps, SRE, or MLOps role. Strong proficiency in Python and extensive experience with ML frameworks like PyTorch. Proven experience building and managing CI/CD pipelines for machine learning systems. Deep expertise …
regulated solutions to institutional finance, this firm is redefining how digital assets are secured and managed. As part of their expansion, they’re looking to find a hands-on Site Reliability Engineer to join their mission-critical engineering team. The Role: Our client is building a next-generation digital assets platform using Java Spring microservices on Azure … and annual discretionary bonus. Pension contributions, in addition to Health Insurance, Life Assurance. 25 Annual Leave. What You’ll Be Doing It's not just DevOps, it's true SRE: SLAs, SLOs, SLIs, error budgets, and incident tooling are at the centre of the role. Own and evolve observability frameworks, define resilience strategies, and contribute directly to the Java backend … Insights, Terraform). You’ll be the bridge between software engineering and operations, directly influencing architectural decisions. What You’ll Bring 8+ years in production engineering or SRE roles. Deep Java/Spring experience. Expertise in monitoring, alerting, and incident tooling (Prometheus, Grafana, OpenTelemetry, ELK, etc.). Experience with Azure, Kubernetes, and scalable systems in high-uptime environments …
looking for a Site Reliability Engineer to join their highly skilled, innovative team. Essential skills: Strong proficiency in Python for infrastructure and automation Hands-on experience in SRE, DevOps or production engineering roles Deep understanding of monitoring, incident response workflows, and system architecture Productive approach to improving systems and reducing technical debt Strong collaboration and communication skills … working closely with developers, quants, and platform engineers Experience designing and delivering scalable, reliable production systems Proficiency with Linux/Unix systems Bachelor’s degree in CS, Engineering or a related field Familiarity with Kubernetes, Docker, or container orchestration technologies Experience with automation tools such as Terraform or Ansible Background in Go, Bash or other system-level languages Exposure … design and implement automation for operations, deployments, monitoring and incident management, as well as owning the observability stack (metrics, logs, traces and alerting). You will also: apply core SRE principles (SLIs, SLOs, error budgets) to enhance system reliability; build, document, and improve high-performance system designs; lead incident response and implement improvements; collaborate closely with quant developers/ …
Essential: Has been involved in cloud initiatives, contributed to SRE or Platform Engineering groups and helped deliver key infrastructure for core initiatives. Expert working knowledge and understanding of on-prem, cloud platforms, particularly AWS, and the ability to implement, and optimize cloud architectures. Familiarity with container orchestration platforms, specifically k8s and experience running large scale clusters at enterprise level. … best practices, including the well architected framework. Experience with GitOps and using either ArgoCD or Flux to improve developer experience. Creation of Terraform/OpenTofu modules and enabling product engineering teams to autonomously deploy applications whilst maintaining high standards. Ability to automate provisioning, scaling, and maintenance of on-prem resources using tools like Ansible, TF, packer, etc Experience with … enhance developer experience, alongside developing secure and cost-effective CI/CD pipelines. Good experience with monitoring tools and providing the right level of observability and monitoring for product engineering teams. Demonstrate ability to be cost aware and experience on how to optimize cloud costs. Ability to collaborate and work effectively as a team, providing mentoring to junior members …
Kong Mesh (based on Kuma) for managing microservices communication, security, and observability at scale. You’ll play a crucial role in defining service-to-service architecture and ensuring platform reliability, scalability, and security. Key Responsibilities: • Lead the design and deployment of Kong Mesh across our environments (on-prem and cloud). • Define and enforce best practices for service mesh … evolving business and technical needs. Requirements: Must-Have: • Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field. • 6+ years of experience in DevOps, SRE, or Platform Engineering roles. • 4+ years of hands-on experience implementing Kong Mesh/Kuma, or similar service mesh solutions (Istio, Linkerd). • Strong knowledge of Kubernetes, Envoy proxy … policies. • Experience with CI/CD tools, logging (e.g., Fluentd, Loki), and monitoring (e.g., Prometheus, Grafana). About us Ascendion is a global, leading provider of AI-first software engineering services, delivering transformative solutions across North America, APAC, and Europe. We are headquartered in New Jersey. We combine technology and talent to deliver tech debt relief, improve engineering …
City of London, London, United Kingdom Hybrid / WFH Options
Enigma
APIs that enable internal and external users to access data and model outputs. Implement secure authentication and authorization systems for platform users. Maintain and improve our cloud platform’s reliability, security, and compliance (e.g., GDPR, HIPAA readiness). Automate testing, training, and deployment of models through … robust CI/CD pipelines. Monitor and troubleshoot performance issues across data and inference workflows in production. What We’re Looking For 5+ years of experience in DevOps, MLOps, SRE, or Data Engineering roles. Strong proficiency with public cloud platforms (e.g., GCP, AWS, or Azure), with preference for GCP. Expertise in Terraform and infrastructure-as-code practices. Solid experience …
across Landing Zones, Cloud Management, Automation Platforms, Kubernetes, and a dedicated 'Migration Factory,' all of which collectively offer ongoing operational support to internal customers, notably the product and product engineering teams at client. Your role will enable you to work closely with the product and product engineering teams to deliver experience improvements and drive autonomous delivery, focusing … of our customers, all while adhering to the requisite security and compliance standards. Collaboration with your peers is paramount in this position. Working closely with your team lead and engineering manager, you will ensure a continuous cycle of improvement and evolution, ensuring that our services consistently deliver expected outcomes and provide maximum value to the … organization. Your guidance and hands-on approach will be instrumental in maintaining a cutting-edge and efficient cloud platform operation. Essential: Has been involved in cloud initiatives, contributed to SRE or Platform Engineering groups and helped deliver key infrastructure for core initiatives. Expert working knowledge and understanding of on-prem, cloud platforms, particularly AWS, and the ability to implement …
City of London, London, United Kingdom Hybrid / WFH Options
Hunter Bond
are keen to find an Engineer with strong experience around Platform automation, particularly around Terraform/Ansible. The role sits between Platform Engineering, DevOps, Linux Systems Administration and SRE, and incorporates elements of all of these positions. You will build, design and architect automated solutions for scalable deployment, in private and public cloud infrastructure. You will work in both …
What You'll Need: Strong coding skills in Go, Python or Java. Deep experience with GitHub Enterprise, GitHub APIs (REST & GraphQL), Actions, and Apps. Background in Ops, Infrastructure or SRE roles with a focus on automation. Familiarity with modern SDLC practices, security tooling, version control and testing frameworks. Experience using Infrastructure as Code tools (e.g. Terraform). Strong problem-solving …
Working for an industry-leading, high-growth SaaS business with some of the biggest brand names in the world as clients, the Senior Site Reliability Engineer (SRE) will join the global SRE team, working closely with software engineers to build, maintain, and scale resilient systems and provide first line operational support. You’ll be part of a hybrid … in close collaboration with DevOps team Maintaining and enhancing Engineering Operational Documentation for supported products Providing expertise to build and maintain products operational documentation and setting up product SRE practices Working in close collaboration with SRE team members and Engineering teams based around the world Helping build a strong culture of reliability and performance in their … services. The Senior Site Reliability Engineer will have: Strong experience in SRE, DevOps Engineer or production engineer Experience in Infrastructure as code (IaC) using Terraform Experience in building continuous integration declarative pipelines in Jenkins or CircleCI Experience with platforms like Kubernetes, Containers and public clouds (GCP or AWS) Experience with deployment and monitoring of highly scalable products This …
Lead Site Reliability Engineer Central London (Hybrid) Up to £95k + Car Allowance & Bonus TRIA are working with a leading hospitality client for a Lead SRE, where they are investing heavily in the performance, stability, and reliability of its digital platforms. This is a hands-on leadership role - you won’t just guide others, you’ll be … uptime The stack includes Kubernetes, Terraform, AWS, Python, and modern CI/CD tools, and it's evolving. If you're confident in a crisis, understand what a good SRE practice looks like, and want to leave systems in a better place than you found them, please apply to be considered and learn more! What you’ll bring: Experience in … high-traffic digital or eCommerce platforms 5+ years in SRE/DevOps roles; strong background in incident response Observability, automation, and infrastructure as code expertise Leadership skills - mentoring others or leading from the front …
FX Site Reliability Engineer – Front Office, Java, Python, Monitoring, Trade Lifecycle, Docker, Kubernetes, Linux, Unix – London – Paying up to £125,000 A Production Engineer is currently being sought by a leading investment bank to join their Foreign Exchange team on a permanent basis in London. You will be responsible for providing end to end support across … technical abilities to ensure pipelines are built and code is production ready. In addition, you will offer coaching and mentoring to support the growth of junior team members promoting engineering best practice. To be successful in this role, you will need the following: Hands-on experience using programming languages. Python or Java is preferred. Full understanding of the end …
analysts, and support staff. Overview: We are looking for a highly skilled and visionary leader to join our team as the Head of Site Reliability Engineering (SRE) with a strong focus on AWS cloud infrastructure. The ideal candidate will have a deep understanding of cloud architectures, extensive experience in SRE practices, and the ability to lead and … scale SRE teams to ensure the availability, performance, and security of our systems. Key Responsibilities: Leadership and Team Management: Lead and manage the SRE team to ensure high availability, scalability, and performance of our AWS-based infrastructure. Provide mentorship and guidance to junior and senior engineers, fostering a culture of operational excellence and continuous improvement. Cloud Infrastructure Management: Oversee the … design, implementation, and maintenance of cloud infrastructure in AWS, ensuring the systems are secure, reliable, and highly available. Use best practices for AWS services, automation, and monitoring. SRE Practices Implementation: Establish and lead the implementation of SRE principles, such as Service Level Objectives (SLOs), Service Level Indicators (SLIs), and Error Budgets, to drive the team's focus on reliability. Incident …
is redefining how digital assets are secured and managed. As part of their expansion, they’re looking to bring on a Senior Production Engineer to lead the charge on reliability, resilience, and operational excellence within a complex, high-uptime platform environment. What You’ll Get A superb opportunity to join an institutionally backed, cutting edge Crypto Fintech at the … salary and annual discretionary bonus. Pension contributions, in addition to Health Insurance, Life Assurance. 25 Annual Leave. What You’ll Be Doing This is a hands-on and strategic engineering role where you’ll be responsible for ensuring production stability across a highly dynamic microservices architecture hosted in Azure. You’ll have end-to-end ownership over reliability … and monitoring across distributed systems. Collaborating with cross-functional teams to align platform strategy and reliability goals. What You’ll Bring: 5+ years in software engineering or SRE/production infrastructure roles. Strong experience with Java (Spring) and cloud platforms (ideally Azure). Proven track record in building and maintaining mission-critical systems. Deep understanding of Kubernetes, observability …
Job Title: Site Reliability Engineer (SRE) – High-Frequency Trading Infrastructure Location: Onsite – New York City, London, or Singapore £140,000 - £240,000 (Depending on Level of experience and interview performance) Our Client, a leading high-frequency trading firm, is seeking a Site Reliability Engineer (SRE) to architect and build next-generation production tools and infrastructure for … critical role focused on reliability, scalability, and performance in one of the most competitive and technologically advanced industries. About the Role This opportunity is ideal for an experienced SRE who thrives in production-critical environments. The successful candidate will join a high-caliber team of engineers and work on automating, scaling, and securing systems that drive global trading operations. … Key Responsibilities Design and develop scalable production tools for deployment, monitoring, and infrastructure automation. Ensure the reliability and efficiency of trading systems through proactive automation and tooling. Collaborate with developers and traders to support the live trading environment. Manage and optimize configuration and deployment pipelines across AWS and on-premise infrastructure. Implement observability and monitoring systems to enable rapid …