Birmingham, West Midlands, United Kingdom Hybrid / WFH Options
Inspire People
team at the heart of the global economy! The Department for Business and Trade ('DBT') and Inspire People are partnering to bring you an exciting opportunity for Senior Site Reliability Engineers to join a team that ensures DBT's digital services work as users expect, working with development teams giving them the tools for their job, including … service-level objectives. - Participate in an on-call rota (with allowance), helping to keep DBT services resilient and reliable. - Mentor junior engineers and contribute to the growth of the SRE function. Technologies you will work with include AWS, Azure, Terraform/CloudFormation, Docker, ECS, ECR, ElasticSearch, Python/Django, PostgreSQL (RDS), Redis, and more. Essential Criteria - Cloud experience with AWS … will be assessed against these requirements before being progressed to DBT. Shortlisted candidates will then be invited to interview and technical exercise. If you are a DevOps Engineer, SRE, or Systems Administrator looking to make a real impact across government digital services, apply today or contact Keesha Paulsen at Inspire People in confidence for more information. More ❯
cambridge, east anglia, united kingdom Hybrid / WFH Options
Speechmatics
The Role Speechmatics are seeking a Site Reliability Engineer (SRE) whose focus will be improving the reliability of our products, systems and infrastructure. You will work across teams to improve availability, scalability, performance and efficiency of our real-time AI inference APIs. You will get to work with high-scale GPU deployments spread across the world. … latency responses, making this a really interesting problem space to learn about. What you'll be doing: Working with a diverse group of engineers across Speechmatics to improve reliability of our products and systems, from design through to operation in production. Taking part in incident response, postmortems and ensuring the same incident doesn't happen twice. Managing and More ❯
in San Francisco, our investors include Benchmark, General Catalyst, Peter Thiel, Adam D'Angelo, Larry Summers, and Jack Dorsey. Role Overview Position: Site Reliability Engineer (SRE) – Full-Time, San Francisco Commitment: 40 hours per week As an SRE at Mercor, you'll build and automate systems to keep our platform reliable, scalable, and fast. You will … work across every layer of the stack to drive measurable reliability improvements. Responsibilities Mentor engineers on best practices for observability, alert management, and instrumentation. Lead incident response from triage through post-mortem and remediation. Own and improve load-testing, disaster-recovery, and chaos-engineering programs. Automate reliability checks, capacity … planning, and service-level monitoring. Partner with product and platform teams to design for reliability and scalability from the start. Requirements/Qualifications Must-Have Qualifications Background in SRE Proficiency in Terraform, Python, Go Experience working with AWS Preferred Qualifications Experience with RDBMS (MySQL) Experience with document storage systems (MongoDB) Experience with caching systems (Redis) Exposure to data warehousing More ❯
of finance – and this is what #LifeAtBNY is all about. Join us and be part of something extraordinary. We’re seeking a future team member for the role of SRE/Site Reliability Engineer to join our Technology team. This role is located in Jersey City, NJ. In this role, you’ll make an impact in the … of services from inception through sustainment. Assist in creating and maintaining automation to improve reliability and velocity in addressing issues during regular maintenance tasks. Mentor engineers and champion SRE best practices, embedding a reliability-first culture and ensuring technical excellence across engineering teams. To be successful in this role, we’re seeking the following: Bachelor’s degree in … response and on-call support. Solid programming and scripting skills in languages like Python, Go, or Java, with a focus on automation, tooling, and system integration. Deep understanding of SRE principles, including SLAs, SLOs, error budgets, postmortems, and reliability-focused system design. Familiarity with automated testing, DevSecOps practices, CI/CD methods, performance engineering, and security controls. Strong collaboration More ❯
london (city of london), south east england, united kingdom
Bonhill Partners
Production Support Engineer | Site Reliability Engineer – Market Making London | £200,000 – £300,000 (base + performance bonus) | 4 days in office We’re partnering with a leading market making firm seeking a highly skilled Production Support Engineer to join their London team. This is a hands-on role at the heart of a fast-paced … SQL Bring trading systems online and provide Tier 1 and Tier 2 operational support across trading sessions Streamline, automate, and enhance operational workflows and scripts to improve efficiency and reliability Collaborate closely with traders, quants, and developers to design new tools and refine existing systems Required Skills & Experience Strong Python programming skills, including data analysis with Pandas Advanced SQL More ❯
london (city of london), south east england, united kingdom
Hamilton Barnes 🌳
markets interests you, this could be the perfect opportunity to take your career to the next level! About the role: You will play a crucial role in ensuring the reliability, performance, and efficiency of the company's trading platforms. This is not your average DevOps role - this position focuses on site reliability, where you'll be troubleshooting, supporting traders … support new trading systems, continuously improving the infrastructure. • Drive automation and operational excellence by leveraging your Linux expertise, Kubernetes, and Python scripting skills. • Monitor and ensure high availability and reliability of trading applications while being on top of system alerts and incidents. Key Requirements: • 1-5 years working experience • Background working in the financial services sector, ideally supporting traders … Solid experience with Linux Systems administration and troubleshooting. • Hands-on experience with Kubernetes for container orchestration. • Proficient in Python scripting for automation and system management. • A mindset focused on site reliability and performance. • Strong troubleshooting skills and a proactive approach to problem-solving. Salary: Up to £90,000 base salary Lucrative bonus scheme Company perks/benefits Location More ❯
london (city of london), south east england, united kingdom
WALT Labs
enabling large corporations to manage complex infrastructure projects, we provide exceptional service while staying at the forefront of cloud technology advancements. Role Description This is a full-time on-site role 3 days a week minimum in Kings Cross, London. We are seeking a skilled Site Reliability Engineer with a strong focus on Google Cloud Platform … and respond to cloud incidents using incident.io, ensuring timely resolution. Use JIRA to log, track, and prioritize support tickets and workflow tasks. Monitor and maintain cloud infrastructure for performance, reliability, and security. Collaborate with teams to identify and implement solutions to technical challenges. Assist in deploying, configuring, and optimising GCP resources. Create and maintain documentation for troubleshooting processes and More ❯
About the opportunity We are seeking a Senior Site Reliability Engineer to join the Platform Engineering Domain in the AI Platform Team. The mission of Platform Engineering is to provide trusted, performant, self-service platforms that empower product teams to build 'the bank the world loves to use.' The AI Platform team contributes to this mission by More ❯
City of London, London, United Kingdom Hybrid / WFH Options
Oliver Bernard
Lead Site Reliability Engineer – EdTech – AWS, Kubernetes, Terraform Oliver Bernard are currently working with an established EdTech, based in London, looking to expand their SRE function with a Lead level engineer. The incoming profile will have the chance to work on a variety of greenfield projects, and be able to help grow and scale their SRE practices whilst … competes heavily with their sizeable competitors. To be considered for this opening you’ll need at least 7-8 years’ experience, encompassing the following: Recent experience in a Lead SRE capacity, coaching/mentoring other engineers Hands-On Cloud experience with AWS and AWS Services Expert knowledge of Containerisation with Docker and Kubernetes Strong Infrastructure as Code experience with Terraform … Engineers, able to offer £80-90K, and operates a remote first model (with only quarterly visits required). Please apply here to register interest in this opportunity. Lead Site Reliability Engineer – EdTech – AWS, Kubernetes, Terraform More ❯
CD pipelines to automate deployments and improve workflow efficiency. Manage cloud infrastructure (AWS, Google Cloud Platform, Azure) using Infrastructure-as-Code (IaC) tools like Terraform and Ansible. Ensure system reliability by monitoring, troubleshooting, and improving application performance. Implement and track Service-Level Indicators (SLIs), Service-Level Objectives (SLOs), and Service-Level Agreements (SLAs). Automate infrastructure management using scripting languages … bottlenecks. Ensure the security and compliance of cloud environments and deployments. Continuously improve automation, reliability, and performance of systems. REQUIRED SKILL SET: Hands-on experience in a DevOps, SRE, or similar role. Strong knowledge of cloud platforms (AWS, Google Cloud Platform, Azure) and containerization (Docker, Kubernetes). Experience with Infrastructure-as-Code (Terraform, CloudFormation, Ansible). Expertise in CI/CD tools (Jenkins, GitLab … etc.) and version control (Git). Solid understanding of monitoring and logging tools (Prometheus, Grafana, ELK Stack). Proficiency in scripting (Python, Bash) and automation. Strong problem-solving skills with a focus on system reliability and performance. Knowledge of microservices architecture and distributed systems is a plus. Cloud certifications and experience with Agile methodologies are preferred. We are an equal opportunity employer. All aspects of employment More ❯
response. Preferred qualifications: Master's degree or PhD in Computer Science, or a related technical field. Experience as a cloud customer. About the job Site Reliability Engineering (SRE) combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. SRE ensures that Google Cloud's services, both our internally critical and our externally … an ever-watchful eye on our systems capacity and performance. Much of our software development focuses on optimizing existing systems, building infrastructure and eliminating work through automation. On the SRE team, you'll have the opportunity to manage the complex challenges of scale which are unique to Google Cloud, while using your expertise in coding, algorithms, complexity analysis and large … scale system design. SRE's culture of intellectual curiosity, problem solving and openness is key to its success. Our organization brings together people with a wide variety of backgrounds, experiences and perspectives. We encourage them to collaborate, think big and take risks in a blame-free environment. We promote self-direction to work on meaningful projects, while we also strive More ❯
SaaS platform provides businesses of all sizes with powerful, easy-to-use marketing automation tools, making first-party data accessible and actionable like never before. About the Team The SRE & Security Engineering team is the bedrock of trust for the 169,000+ businesses that rely on Klaviyo to power their growth. This team is responsible for the secure reliability, scalability, and performance of our entire platform. How You'll Make a Difference Architect and Lead: Define the vision, strategy, and roadmap for a unified SRE and Security Engineering organization. Champion Secure Reliability: Drive a 'secure and reliable by design' philosophy across all of engineering. Own Platform Integrity: Take ownership of the availability, latency, performance, efficiency, change management … done' mindset and seeing opportunity in every challenge. What You'll Need 10+ years of experience in software engineering, with at least 5+ years in a leadership role managing SRE, DevOps, or Security Engineering teams. Proven experience building and scaling engineering teams, ideally with experience establishing a new team or office. Deep, hands-on experience with cloud-native production systems More ❯
iO Associates are partnered with a growing SME in the Defence industry, currently looking for an experienced SRE to start with them as soon as possible. Rate: £550 per day (Outside IR35) Duration: Initial 3 months Location: Hereford - 3 days per week on site Clearance: MOD DV or an active SC with willingness to go through the DV process More ❯
My client, a successful quantitative investment manager, is looking for a Senior DevOps engineer to join their ML ops team and to implement testing, development, automation tools, and IT infrastructure for the ML platform team and its users. They are looking for a senior with 10+ years of experience and proficient in AWS and Terraform. Key Responsibilities: Implement testing More ❯