Site Reliability Engineer London (Blackfriars) – 7 months
Certain Advantage are recruiting on behalf of our prestigious Financial Services client for an SRE Engineer in their AWS DB team who support numerous native DBs like RDS/Aurora/Neptune plus CockroachDB. This is a contract position for 7 months working inside IR35. Lead Site Reliability Engineers (SRE … support is required outside of working hours Participate in enhancing product observability and telemetry, support modernization. Brainstorm ideas to simplify and streamline infrastructure by closely working with infrastructure and SRE teams. Required qualifications, capabilities and skills Knowledge of Python/Unix Shell scripting & SQL. Good understanding of development tools: source code control software, automated build, automated testing and JIRA. Understanding … of IaC infrastructure as a code concept is desirable. Experience with build automation, test driven development, continuous integration and delivery Experience with Relational and non Relational Databases Previous SRE experience including knowledge about SLO/SLA/SLI and error budgets, is advantageous Experience working or familiarity with one public cloud (AWS, Google or Azure) Preferred skills – what’ll get More ❯
our customer's systems are built and maintained. This role blends operational product support with software engineering to create applications to understand the overall health of our systems. The SRE team sits within a wider programme at the core of the customer mission. The role holder: As an SRE, fundamentally you will be doing work that has historically been done … engineering expertise to substitute automation for human labour, with the objective of limiting traditional manual operations work (incident tickets, on-call etc.) to no more than half of the SRE team's time (and aiming for considerably less). You will have an enthusiasm to learn and experiment, to develop tools to understand application health and improve their reliability … enable them to be scalable and resilient to failure, and how to get the best out of the infrastructure they are deployed to. Participating in the wider DevOps/SRE community within the organisation. Competencies It is desirable for you to have experience in the areas below. However more valued for this role is that you have excitement and enthusiasm More ❯
Location: London, England, United Kingdom Join Axon and be a Force for Good. As an SRE contributor in Axon's Real Time Operations organization, you are passionate about delivering solutions to the real-time problems our mission-critical cloud native services encounter. You are also obsessed with achieving the high quality and reliability our customers demand. You will work … You'll Do Location: London UK Build robust, easy-to-use foundational platforms and tools that enable engineering teams to provision services rapidly, consistently, and securely. Exemplify cloud-native site reliability best practices. Write code that is performant, maintainable, clear, and concise. Employ strong problem-solving skills, with the ability to debug problems in cloud-native distributed systems. More ❯
Site Reliability Engineer (DV Security Clearance) Position Description CGI was recognised in the Sunday Times Best Places to Work List 2025 and has been named one of the 'World's Best Employers' by Forbes magazine. We offer a competitive salary, excellent pension, private healthcare, plus a share scheme (3.5% + 3.5% matching) which makes you a member … agencies' most challenging problems. Our teams work alongside our clients to help them understand how to exploit technologies to maintain competitive advantage. Our systems are engineered for performance, security, reliability and scalability; built with modern CI and CD tooling and techniques. We are currently looking for an experienced cloud infrastructure engineer to join our team - being able to More ❯
Are you an experienced Senior DevOps/Site Reliability Engineer looking for your next contract role? Join one of the world's leading IT services, consulting, and business solutions organizations. Founded in 1968, the company consistently ranks among the top global IT service providers. With a presence in over 50 countries, the company has built a reputation … across industries including banking, healthcare, telecommunications, and retail. The leading consultancy firm has partnered with a global technology leader and they are currently seeking an experienced Senior DevOps/Site Reliability Engineer to join the team. Additionally, this role provides a hybrid working arrangement based in London. Ready to make a move? Get in touch and apply More ❯
London, South East England, United Kingdom Hybrid / WFH Options
Io Me
designed to help developers implement necessary business policies, such as meeting regulatory requirements. What The Role Involves As an experienced and visionary Head of Site Reliability Engineering (SRE), you will be responsible for leading the infrastructure and reliability strategy for Midnight, a regulatory-friendly blockchain focused on data protection, privacy, and freedom of expression. In this senior … you will own the reliability, scalability, and performance of the Midnight platform. You will be responsible for building and leading a high-performing team of SREs, driving the SRE roadmap, and partnering closely with engineering, security, and product teams to deliver robust production systems. You will be instrumental in setting the foundations of our infrastructure, designing systems that scale … while embracing the unique challenges of a blockchain-based architecture. This is a hands-on leadership role combining technical depth, architectural vision, operational rigor, and people leadership. Lead the SRE team, sharing expertise and best practices. Coach, mentor and develop the SRE team. Demonstrate leadership in driving initiatives that enhance service reliability, scalability, and overall performance. Lead the entire lifecycle More ❯
Job Title: Cloud Engineer/SRE - Golang & GitHub Location: Remote - UK, London Salary/Rate: Up to £604 a day Inside IR35 Start Date: July 2025 Job Type: 12-Month Contract Company Introduction: We are seeking a highly skilled Cloud Engineer/SRE with Development experience in Go and GitHub to join our client in the Global Analytical … o GitHub Actions (designing complex workflows, custom actions) o GitHub Enterprise, Organisation and Repository settings. Operations/Infrastructure Background: Proven experience in an operations, site reliability engineering (SRE), or infrastructure engineering role, with a strong appreciation for automation and stability. Modern SDLC Practices: Familiarity with: o Dependency management. o Security remediation processes and secure coding practices. o Testing More ❯
Hybrid position with on-calls We are seeking a highly motivated and skilled Site Reliability Engineer (SRE) to ensure the reliability, performance, and scalability of the client's critical Data Platform solutions. In this role, you will provide dedicated support and maintain the health of the data infrastructure. This position involves on-call responsibilities to address More ❯
consultants, analysts, and support staff. Overview: We are looking for a highly skilled and visionary leader to join our team as the Head of Site Reliability Engineering (SRE) with a strong focus on AWS cloud infrastructure. The ideal candidate will have a deep understanding of cloud architectures, extensive experience in SRE practices, and the ability to lead and … scale SRE teams to ensure the availability, performance, and security of our systems. Key Responsibilities: Leadership and Team Management: Lead and manage the SRE team to ensure high availability, scalability, and performance of our AWS-based infrastructure. Provide mentorship and guidance to junior and senior engineers, fostering a culture of operational excellence and continuous improvement. Cloud Infrastructure Management: Oversee the … design, implementation, and maintenance of cloud infrastructure in AWS, ensuring the systems are secure, reliable, and highly available. Use best practices for AWS services, automation, and monitoring. SRE Practices Implementation: Establish and lead the implementation of SRE principles, such as Service Level Objectives (SLOs), Service Level Indicators (SLIs), and Error Budgets, to drive the team's focus on reliability. Incident More ❯
and future states of the organisation and make faster, more informed decisions. The company is headquartered in London, with offices in Philadelphia, The Hague, Toronto, and Sydney. Role: Principal Site Reliability Engineer You will be a senior technical leader focused on scaling and hardening our AWS- and Kubernetes-based infrastructure. You will collaborate across product, platform, and … expertise, excellent communication skills, and a collaborative spirit. Responsibilities: Define and enforce SLOs, SLIs, and error budgets across critical services Develop and implement cloud infrastructure and tooling strategies Enhance SRE practices across the organization Implement robust observability metrics, logs, and traces using our observability tools Guide the team in building automated, self-healing systems Own and evolve incident response processes … security, DevOps, and software teams to ensure compliance and operational excellence Evaluate and adopt tools and practices to improve platform performance and reliability Desired Skills & Experience: Experience leading SRE transformations Hands-on expertise with Kubernetes (EKS preferred) in production Strong experience with AWS core services (EC2, EKS, RDS, S3, ALB/NLB, IAM, CloudWatch, etc.) Proficiency in Infrastructure as More ❯
sector, our technology is truly flexible and designed to transform any business at scale. We've created a unified platform that adapts to diverse needs, offering the scalability and reliability legacy systems simply can't match. At ZILO, our DNA is built on Character, Creativity, and Craftsmanship. We face every challenge with integrity, explore new ideas with a curious … If you're ready to shape the future, let's talk. About the Role We're looking for a Senior Site Reliability Engineer to join our SRE team. This is a hybrid role that blends deep platform engineering with application-level troubleshooting. You'll be responsible for the stability, performance, and resilience of our cloud-native … service code Resolve incidents and support root causes (Java and GoLang services) Contribute to postmortems and reliability engineering initiatives Who You Are Essential Experience 5+ years in an SRE, DevOps, or infrastructure role Deep hands-on experience with AWS, EKS/Kubernetes, and Terraform Working knowledge of Kafka tuning, monitoring, and operational troubleshooting Strong familiarity to be able to More ❯
developer experience to go with it. The tools used on the team include Elixir, Phoenix, Kubernetes and Google Cloud Platform. Site Reliability Engineering at Duffel As an SRE at Duffel, you'll be part of a small team within engineering that is responsible for the reliability, performance, and resilience of our infrastructure and applications. You will be … silently drop spans. - An enthusiasm for both software development and systems engineering. - A high bar for code and configuration quality and readability. - A good understanding of current observability and reliability practices. - Experienced and comfortable in running incident response. - Big picture thinking - you can make trade-offs on technical work streams against business impact. - Fantastic communication skills. You're able … We manage a data pipeline using Pub/Sub, Airbyte, and dbt. Our Current Focus We're currently driving a big shift in how we think about and monitor reliability across the engineering organisation, with a focus on early detection of customer-impacting issues. We're extending and standardising our use of OpenTelemetry, and introducing Honeycomb as the single More ❯
team of passionate thinkers, innovators, and dreamers - and help us connect people and build communities to create economic opportunity for all. About the team and the role: As a Site Reliability Engineer at eBay, you'll play a key role in managing major incidents and the overall health of our services, making sure they are both resilient … and high-performing. You'll create strategies for availability and reliability, enhance domain ecosystem observability, and support a shift toward a more engineering-focused culture. Your contributions will ensure that eBay's technology remains cutting-edge and reliable for our global community. What you will accomplish: Proactive Monitoring: Continuously monitor the health of eBay's critical services to identify … and address potential issues before they escalate. Solution Development: Collaborate with Architecture, Engineering, and Operations teams to develop solutions that ensure high site availability, reliability and performance. Collaborative Problem Solving: Work closely with partner teams to resolve recurring technical issues, onboard new alerts, and develop high-quality Standard Operating Procedures (SOPs). Automation and Process Enhancement: Identify and More ❯
London, South East, England, United Kingdom Hybrid / WFH Options
BOSS Professional Services LTD
SRE Engineer Full-time UK - Remote/Hybrid My client is a high growth ecommerce business which runs its technology stack on AWS. Due to the nature of the business the SRE Engineer will need to support sudden peaks in traffic by scaling smoothly. They also host other ecommerce platforms for other brands which also need supporting. As an … SRE Engineer you will maintain a scalable and reliable production environment for running software services while helping grow the customer base and product offering. For the SRE Engineer role we are seeking: Technology stack: Kubernetes, MySQL, PostgreSQL, PHP, Python, Docker, AWS Lambda, AWS, Redis, ELK, monitoring: Prometheus, Grafana or Loki You have previous experience of working within SRE … Assist and support the DevOps engineers: setting up the infrastructure for microservices Work closely with the rest of the DevOps and QA team to load test applications Responsibilities for the SRE Engineer include: Create sustainable systems and services through automation and uplifts Partner with development teams to improve services Gather and analyse metrics from both operating systems and applications Participate More ❯
and known for consistent success and impressive profitability. With continued growth across the firm, they are now looking to expand their world-class engineering team by hiring an experienced Site Reliability Engineer to help design, optimise and maintain their global trading infrastructure. (FYI: the base salary advertised does not include cash bonuses, paid bi-annually. Your total More ❯
Global Site Reliability Engineer Location: London About Us Founded in 2013, GSR is a leading market maker and programmatic trading firm in the fast-evolving world of cryptocurrency trading. With over 200 employees across seven countries, we provide billions of dollars in liquidity daily to cryptocurrency protocols and exchanges. We build long-term relationships with crypto communities … GSR is an opportunity to be deeply embedded in every major sector of the cryptocurrency ecosystem. About the Role We are seeking a Site Reliability Engineer (SRE) to design, optimize, and support highly available systems across our global trading infrastructure. As part of GSR's SRE team, you will manage a multi-regional cloud environment while integrating … work across all layers of infrastructure, including: Networking & Exchange Connectivity Linux Systems & Kubernetes Administration Microservice Orchestration & Observability Disaster Recovery & Security Optimization Your mission is to improve latency, scalability, and reliability, ensuring GSR remains a best-in-class market maker. We value engineers who drive automation, reduce friction, and enhance developer velocity through better tooling, CI/CD, and infrastructure More ❯
impact. We value continuous learning, personal growth, and providing our team with resources to succeed. Ready to shape the future? Let's talk. We're looking for a seasoned SRE with a front-end focus, expert in React applications, to join our SRE team. In this role, you'll ensure the reliability, performance, and operability of our React-based … invalidation, HTTP caching headers) to reduce latency and origin load. Collaborate with UX teams to balance feature richness with performance targets. Collaboration & Knowledge Sharing Serve as the React/SRE subject-matter expert: mentor engineers on best practices for building resilient front-ends. Produce and maintain runbooks, debugging guides, and incident-playbooks specific to client-side failures. Partner closely with … wider backend SRE, DevOps, and product teams to ensure end-to-end reliability. Enhanced leave - 38 days inclusive of 8 UK Public Holidays. Private Health Care including family cover. Life Assurance - 5x salary. Flexible working - work from home and/or in our London Office. Employee Assistance Program. Company Pension (Salary Sacrifice options available). Access to training and development. More ❯
who have attracted talent from rival hedge funds and big tech firms alike, due to their sophisticated tech infrastructure and great work life balance. They're looking for a Site Reliability Engineer (SRE) to come and join the Infra team and act as a SME within Cloud, Automation and DevOps. The role would entail helping to streamline … of trading/research applications into production. Stack: Python, AWS, Kubernetes, Linux The company is open to people outside of finance; the emphasis is on an expert and passionate SRE who can bring fresh perspectives on automation and scalability to the firm. If you're keen to find out more, please do apply More ❯
most exciting products in Microsoft Azure, passionate about exceeding customer expectations and advancing Microsoft's cloud first strategy? Azure Customer Experience (CXP) team is searching for a customer obsessed Site Reliability Engineer to work on an HPC environment, who can drive reliability engineering excellence and embody our culture of inclusiveness, growth-mindset, and unwavering dedication to … with access to cutting edge technology surrounded by world-class engineers. Qualifications In-depth technical experience in software engineering, network engineering, or systems administration Operational experience in improving Service Reliability, Availability and Performance Ability to deal with the ambiguity associated with working in a fast-paced environment Systematic problem-solving approach, coupled with effective communication skills and a sense … pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter. UK Baseline Personnel Security Standards; UK Security Clearance Responsibilities Collaborating closely with the existing SRE teams on building and enhancing tooling and automation solutions for faster resolution of issues impacting SLOs and averting incidents altogether when possible. Collaborating with the customers to understand their More ❯
application performance - identifying and implementing improvements to application performance and stability. Collaborate on the design and implementation of the desired pipelines and process for deployment to the production environment. The SRE will work closely with Platform and Software domains to ensure continuous improvement of performance and stability whilst adhering to standards. Undertake ad-hoc projects and other activities as required. Key … Accountabilities and Activities Contribute to the SRE function including: Drive evolution of the DevOps/GitOps toolchain, promoting improvements to streamline the software delivery process and showing improvements through metrics. Accountable for halting or stopping a project/product if the solution is not technically acceptable. Responsible for producing and maintaining documentation relating to application design, integration processes, testing procedures … to create operational run and playbooks. Integration with Domains including: Collaborating with Domains to plan, design, test and maintain the application. Design patterns for any component or structure under SRE responsibility. Implementation of components such as Monitoring and Logging. Manage the runbook preparations of Domains. Liaise and support other teams on work items including: Developing, refining, and tuning integrations between More ❯
Has anyone actually ever given you a good description of what SRE is? Recently I've met dozens of companies implementing an SRE function. Half are just rebranding an ops team (because Ops ain't cool), some don't want to call the additional silo they have created 'DevOps' (because apparently that's the wrong thing to do) so they … re calling it SRE and the rest actually don't really know how to describe what they're doing. And if you can't describe it simply, you don't know what it is, chief (because Google do it, isn't the right answer). That was until today, when I met a company who actually whiteboarded their vision … process rather than the build. We discussed Kubernetes, Prometheus and API Gateways. Most importantly, they spoke like they knew what the hell they were on about. Not just about SRE, but on the whole Engineering process. This is a company at the top of their game, who are about to introduce a brand new monetisation model to a web More ❯
flexible remote working locations within UK/Europe) Employment type: Permanent Working Hours: Full time (9-6 UK) Salary: Up to £110K + Shares + Benefits TransFICC is hiring a Site Reliability Engineer to provide high-performance services to our customers. We develop an integration service … product that enables our clients to have a flexible, hosted service without requiring their internal resources to respond to connectivity challenges across trading venues. You will be joining our SRE team and contributing to TransFICC's automation culture. We are a multi-disciplinary team covering everything from desktop and laptop support to data centre provisioning of servers and vendor network … automated, so having experience with a software automation tool like Ansible and coding ability is a must. We are looking for someone experienced as a sys admin or network engineer; however, you must have a reasonable understanding of both. Constructive, open-minded and self-motivated. A belief in lifelong learning, and an awareness of how much there still is More ❯
mission, and comprehensive benefits. Your Mission Provide self-service cloud-native products for delivery teams while matching business requirements such as security, compliance, cost and reliability. As a Senior SRE, you will: Take part in the design, development, deployment and management of infrastructure products Evangelize the best practices around observability, reliability, security and performance Help the company grow faster More ❯