1 to 25 of 42 Permanent Site Reliability Engineering Jobs in Cheltenham
SS&C Technologies | Cheltenham, South West England, United Kingdom
insurance companies, retirement providers, and wealth management platforms. Job Overview: As the Head of Production Engineering and Site Reliability Engineering (SRE) for the GIDS organisation, you will lead a team responsible for the scalability, resilience, performance, and reliability of the cloud and hybrid infrastructure powering some … that proactively address issues before they impact clients. Key Responsibilities – Leadership & Strategy: Define and execute the vision and roadmap for Production Engineering and SRE within GIDS. Build and lead globally distributed, high-performance teams with a focus on talent development, SRE culture, and operational excellence. Collaborate cross-functionally with … including tooling, automation, and shift rotation planning. Qualifications Required: 10+ years of experience in engineering, with 5+ years in a leadership role in SRE, DevOps, or Production Engineering. Proven track record managing reliable, scalable systems in a high-compliance environment (e.g., FinTech, HealthTech). Strong understanding of modern software …
Maxwell Bond | Cheltenham, South West England, United Kingdom | Hybrid / WFH Options
ensuring system reliability, scalability, and performance across both AWS and Azure environments. This is your opportunity to lead cloud-native transformation and embed SRE best practices into engineering at scale. What you’ll be doing as their Site Reliability Engineer: You’ll be the go-to … reduce toil and accelerate deployment frequency. Build observability into everything: own monitoring, alerting, and incident response to minimize MTTR and improve system health. Champion SRE culture and reliability-focused engineering, helping to shape sustainable engineering practices, SLAs, SLOs, and error budgets. Contribute across the stack with flexibility in … leave + bank holidays, R&D and personal training budgets, and much more. This is an incredibly rare chance for a seasoned, high-performing SRE to leave their mark on high-impact transformation projects in a business that’s truly committed to doing things the right way. Trust me. You …
Ranger Technical Resources | Cheltenham, South West England, United Kingdom
Site Reliability Engineer #2494. Position Summary: Our partner, an innovative PaaS company specializing in remote monitoring and network management solutions, is looking for a Site Reliability Engineer to help ensure the reliability, scalability, and performance of its critical infrastructure and applications. In this role, you’ll build … Bachelor's or higher degree in Computer Science, Information Systems, Information Technology, or a related technical field (or equivalent experience). 7+ years of experience in Site Reliability Engineering, DevOps, Infrastructure, or related roles. Deep understanding of AWS and its services. Strong background in Linux administration …
Ubique Systems | Cheltenham, South West England, United Kingdom
Greetings from Ubique Systems! We are looking for a Site Reliability Engineer for one of our customers. Primary (mandatory) skills: 5+ years of experience in Java development with a … good understanding of backend systems. Proven experience with cloud technologies (AWS, GCP, or Azure). Strong understanding of Site Reliability Engineering (SRE) practices and principles. Experience with observability and monitoring tools such as Prometheus, Grafana, ELK, Splunk, or Datadog. Familiarity with containerization (Docker, Kubernetes) and infrastructure as …
TECEZE | Cheltenham, South West England, United Kingdom | Hybrid / WFH Options
Job Title: Site Reliability Engineer. Location: Hybrid remote – London EC2M. Contract (12 months), outside IR35 … About the Role: We are partnering with one of the top companies in the mobile industry to hire a Site Reliability Engineer (SRE). In this role, you will collaborate with cross-functional teams to drive the design, development, and delivery of high-performing, scalable, and reliable infrastructure … Deep understanding of Linux internals, standard networking protocols, and distributed systems architecture. Hands-on experience with automation and performance optimisation. Strong knowledge of SRE principles and methodologies. Experience with observability tools and telemetry systems. Exposure to Google Cloud Platform (GCP). Familiarity with hybrid or multi-cloud architecture. Experience …
Cipher7 | Cheltenham, South West England, United Kingdom
Job Title: Senior Site Reliability Engineer (SRE). Location: London, UK – onsite (5 days/week). Employment Type: Permanent. Salary: Up to £80,000 per annum (gross). About the Role: We are seeking a highly skilled and motivated Site Reliability Engineer (SRE) to join our London-based … team. This role is ideal for someone passionate about service reliability, scalability, and performance. As an SRE, you will collaborate with development and operations teams to automate infrastructure, enhance observability, and reduce manual, repetitive work (toil) to improve overall system health. Key Responsibilities: Design, build, and maintain scalable, resilient systems … Bachelor’s degree in Computer Science, Engineering, or a related technical field (or equivalent practical experience). 8+ years of relevant experience in SRE, DevOps, or Infrastructure Engineering roles …
Stealth iT Consulting | Cheltenham, South West England, United Kingdom | Hybrid / WFH Options
Site Reliability Engineer (SRE), Global Digital Consultancy. Salary: Up to £55k + benefits. Visa sponsorship is not available for this opportunity. Hybrid remote, with occasional travel to Manchester, London, or Glasgow. A leading global consultancy with ambitious plans to grow its Digital teams throughout 2025 is seeking a Site Reliability Engineer (SRE) to support multiple new and ongoing projects. Important: This role may require occasional out-of-hours support or on-call duty based on client needs. Please apply only if you are comfortable with and available for this. Desired Skills and Experience: Active SC Clearance or eligibility for … SC Clearance. Strong understanding of the SRE mindset and principles, including the creation and management of Service Level Indicators (SLIs) and Service Level Objectives (SLOs) to ensure reliability and performance. An understanding of microservices and container orchestration. Strong observability and monitoring experience (preferably with tools such as Dynatrace, Prometheus, or OpenTelemetry). Experience delivering …
Uniting Cloud | Cheltenham, South West England, United Kingdom | Hybrid / WFH Options
Site Reliability Engineer (SRE). Remote (UK). £85,000 – £105,000 (DoE). We’re a growing FinTech scale-up, and we’re on the lookout for an experienced Site Reliability Engineer to join our remote-first engineering team. Things are moving fast here, and as we … continue to grow, reliability, automation, and scalability have never been more important to us. You will be our first SRE, so a strong background in implementing SRE best practices would be ideal. You will know what good looks like and strive to continuously improve automation, availability, and resilience. This … tooling using AWS, Terraform, Docker, and CI/CD pipelines. Supporting and evolving our container-based architecture (we use ECS and Fargate). Driving SRE best practices: SLIs/SLOs, error budgets, reducing toil, and improving observability. Using (and hopefully enjoying!) tools like Datadog, Prometheus, Grafana, and Nix to support …
Halian | Cheltenham, South West England, United Kingdom | Hybrid / WFH Options
Halian Technology is seeking an experienced Site Reliability Engineer for a full-time opportunity within our client’s Platform Engineering team, based remotely in the U.S. We’re looking for a technically skilled, automation-driven individual with strong experience in cloud infrastructure and observability tools to … help scale our client’s services to millions of endpoints globally. This is an exciting opportunity to work at the core of platform reliability and infrastructure automation within a fast-growing SaaS company. Key Responsibilities: Diagnose and resolve complex application and infrastructure issues across distributed systems. Participate in 24x7 … using tools like New Relic, Datadog, or Splunk. Influence design decisions to ensure scalable, secure architecture and high availability. Key Requirements: 5+ years in Site Reliability Engineering and/or DevOps roles. Strong Linux administration and scripting skills. Hands-on experience with AWS core services (EC2, ECS …
DailyPay Inc | Cheltenham, South West England, United Kingdom
Press Center. The Role: DailyPay is looking for a talented and motivated engineer with 4+ years of experience as a professional software engineer or site reliability engineer. You will be a senior member and a technical leader of our Site Reliability Engineering team. You will … and be an advocate for operational and engineering excellence. How You Will Make an Impact: You will be a key contributor to our SRE team. You will tackle a wide variety of technical problems, providing solutions and tooling that enable product development teams to monitor and improve their … systems. You will provide advice and support to product development teams, being an advocate for operational and engineering excellence. You will mentor and guide junior SREs as they develop their skills. What You Bring to the Team: 4+ years of experience designing, developing, and scaling complex services. Proficient coding skills …
Spectrum IT Recruitment | Cheltenham, South West England, United Kingdom | Hybrid / WFH Options
Site Reliability Engineer. Southampton HQ, two days a week in the office. Cloud, SaaS, AWS. Please be advised that security clearance is required for this position. We are working alongside one of our longstanding clients to help them recruit a Site Reliability Engineer. The company delivers cutting-edge … credentials. Do You Have What It Takes? 3–6 years of hands-on experience in a similar role, with a strong emphasis on systems engineering, automation, and service reliability. Proficient in at least one programming language such as Python, Go, Java, or C#, along with scripting skills in … or PowerShell. Solid grasp of cloud platforms like AWS, including an understanding of how core services like EC2, ECS, Lambda, and DynamoDB operate under reliability constraints. Practical experience using infrastructure-as-code tools like CloudFormation or Terraform. In-depth knowledge of CI/CD principles and hands-on experience …
Ocho | Cheltenham, South West England, United Kingdom | Hybrid / WFH Options
We are seeking a Site Reliability Engineer (SRE) to join an innovative and fast-growing company in Belfast. This role focuses on ensuring the reliability, scalability, and performance of critical infrastructure and services while working with cutting-edge cloud-native technologies. You'll collaborate with engineering … resilient, high-performance infrastructure. Develop and maintain CI/CD pipelines to enhance deployment efficiency. Implement cloud-native and open-source solutions to improve reliability and scalability. Proactively monitor and troubleshoot production systems, ensuring uptime and performance. Automate infrastructure provisioning and configuration using Infrastructure as Code (IaC). Drive … experience (DevEx) and operational efficiency. Participate in incident response, root cause analysis, and post-mortem reviews. What You'll Need: 3+ years in an SRE, DevOps, or Infrastructure Engineering role in a high-scale environment. Strong experience with Kubernetes and container orchestration. Deep knowledge of cloud platforms and distributed …
Harrington Starr | Cheltenham, South West England, United Kingdom | Hybrid / WFH Options
Site Reliability Engineer – Fintech. Up to £85,000 | Fully Remote (UK only). We’re working with a forward-thinking technology company that’s helping to transform how global financial transactions are monitored and managed. Their platform is used by some of the world’s leading financial institutions to … streamline international payments and ensure compliance at scale, all through smart automation and modern cloud-native infrastructure. They’re looking to bring on a Site Reliability Engineer with deep experience in observability. If you’ve worked with tools like Prometheus in AWS, supported development teams with tracing … and reducing operational noise. Working with AWS (EKS, EC2, Lambda, RDS), Terraform, and CI/CD tools. What They’re Looking For: Experience in SRE or DevOps roles in a production environment. Strong knowledge of observability tools, especially Prometheus in AWS. Experience with tracing, metrics, and logs to support development …
Durlston Partners | Cheltenham, South West England, United Kingdom | Hybrid / WFH Options
Senior Site Reliability Engineer | Remote (EU/UK) | High-Performance Trading. A leading trading firm operating at scale in the digital asset space is hiring a Senior Site Reliability Engineer to help scale, secure, and optimise its global trading infrastructure. This is a remote-first role …
TP ICAP Group Services Ltd | Cheltenham, South West England, United Kingdom
hands-on support experience within a financial institution (buy-side, sell-side, venue/platform provider). Experience with Site Reliability Engineering (SRE) practices, including monitoring, incident response, and post-mortem analysis. Hands-on experience with containerization technologies such as Docker and Kubernetes. Proven experience managing cloud-based …
talego | Cheltenham, South West England, United Kingdom | Hybrid / WFH Options
million, 3-year technology and digital programme, and this is definitely the time to be joining the journey. About the role: As the new DevOps Engineering Manager, you'll be responsible for building a high-performing team of Platform Engineers from scratch. Your mission? To orchestrate and evolve the core … Define SLIs and SLOs across latency, availability, and throughput, aligning internal goals with platform performance. Promote and embed Site Reliability Engineering (SRE) practices to improve stability, monitoring, and response. Manage a growing toolset for orchestration, observability, and automation. Partner closely with Engineering, Delivery, and Architecture teams … relationships, advocating for platform goals and aligning with business objectives. What we’re looking for: Strong knowledge of modern platform management practices (DevOps, Agile, SRE, ITIL). Experience with Azure cloud services, including resource management, networking, and compute. Proficiency in Azure DevOps, including CI/CD pipelines, Azure Boards, and …
Stealth iT Consulting | Cheltenham, South West England, United Kingdom
Profile: We’re looking for individuals with the following background and skills: Experience in lean process design and IT/business automation across product, engineering, and operations teams. Confidence in client-facing and advisory roles, with a strong foundation in agile transformation. Ability to define and measure OKRs/… processes for speed and quality, and drive product-focused operating models. Proficiency in DevOps, CI/CD, DevSecOps, Site Reliability Engineering (SRE), developer experience, observability, and hybrid/multi-cloud environments. Bonus: Familiarity with observability platforms and practical experience working with development teams to enhance monitoring and …
MCS Group | Cheltenham, South West England, United Kingdom | Hybrid / WFH Options
is proud to be working with a multinational software development organisation as they seek to expand their team with the addition of a Site Reliability Engineer on a permanent, remote basis (fully remote/Belfast). The Role: Design, implement, and optimise AWS cloud infrastructure to … e.g. Python, Bash. Strong knowledge of Linux/Windows. Commercial experience with Docker/Kubernetes and containerisation. Elasticsearch experience is a plus! Understanding of SRE, DevOps, and DevSecOps methodologies. Strong problem-solving skills, attention to detail, and the ability to work autonomously. Full right to work in the UK. The … have others that are. Please visit MCS Group to view a wide selection of our current and exclusive roles. Skills: AWS, Linux, Unix, DevOps, SRE. Benefits: work from home, pension, health, parking …
Trust In SODA | Cheltenham, South West England, United Kingdom
Love solving gnarly problems in AI infrastructure? Our client is building the AI Native GPU Cloud, and they need a senior HPC Site Reliability Engineer to keep it humming. You’ll own the reliability and performance of their cutting-edge Nvidia-based HPC systems. Think DGX clusters … the chance to shape the infrastructure from the ground up. Expect high-impact work, loads of autonomy, and collaboration with smart folks across architecture, engineering, and ops. You’ll: Set up and optimize HPC clusters and networks (think DGX, HGX, GPU Direct). Debug low-level networking issues with Cisco … This role is perfect if you: have 6+ years in HPC or networking-heavy roles; know BGP, EVPN, VXLAN, and RDMA inside and out; have SRE experience in high-stakes environments; love solving infra puzzles at scale. Bonus points for CCIE/JNCIS, InfiniBand, or cloud/HPC interconnect experience. Sound …
Propel | Cheltenham, South West England, United Kingdom | Hybrid / WFH Options
dynamic, VC-backed startup that is revolutionising risk and compliance management in electronic communications using AI and ML! We're looking for a Senior Site Reliability Engineer who is deeply passionate about technology and wants to play a pivotal role in ensuring the availability, security, and efficiency of …
Lorien | Cheltenham, South West England, United Kingdom
Visualisation skills with Power BI; other automation and metrics knowledge is handy. Proficiency with tools like Jira, Confluence, Excel, and SharePoint. Familiarity with Agile, DevOps, and Site Reliability Engineering. Excellent communication and stakeholder management skills …
Digital Waffle | Cheltenham, South West England, United Kingdom | Hybrid / WFH Options
years. What you’ll do: Implement, test, and deploy Azure Data Factory (ADF) pipeline definitions held in version control to customer environments. Work with our Site Reliability Engineering team to ensure your solutions are observable, reliable, and performant. Work with our software implementation consultants (SICs) to define and … verify specification documents for ETL processes. Work with customer IT to test customer data source endpoints to ensure they meet specification. Work with our Engineering teams to ensure end-to-end capability for integrated data. Support cutover to production systems (this may be outside normal working hours). Identify improvements …
TechShack | Cheltenham, South West England, United Kingdom
SRE Opportunity – Kubernetes | AWS or Azure | Remote | 100–115k | UK-based only. We're hiring for a scaling tech business that currently has a project in both Azure and AWS, so candidates from either background are welcome. In this role, you'll be: Planning and securely deploying infrastructure into new … influencing tooling decisions. The Stack: AWS or Azure, Kubernetes, Docker, Terraform, Python, security-focused tooling. You should apply if you have: 4+ years in SRE/DevOps or relevant engineering roles; hands-on experience with AWS or Azure; a track record of deploying into new regions; commercial experience managing … Kubernetes clusters; strong communication skills.
Atarus | Cheltenham, South West England, United Kingdom
team, working on some of the most performance-critical cloud systems in the industry. This is a unique opportunity to combine enterprise-grade cloud engineering with the fast-paced demands of live global events, supporting an AWS and Kubernetes platform across multiple sites. 🧩 What You’ll Be Doing: Lead … on key infrastructure projects. Collaborate with InfoSec teams to ensure cloud compliance and governance. Support infrastructure during global events, both remotely and occasionally on-site. Partner with DevOps, development, and platform teams to deliver resilient, scalable systems. Be available for occasional work outside normal business hours (planned and unplanned … Solid grasp of hybrid networking: Direct Connect, Transit Gateway, VPC Peering. Experience working as a 3rd-line engineer or Site Reliability Engineer (SRE). Strong communication skills and the ability to remain calm under pressure. Understanding of low-level compute (CPU, memory, storage). UK driving licence required (for travel to …
DNSINFOLTD | Cheltenham, South West England, United Kingdom
Job Description: As an SRE, you'll collaborate closely with Application Development and Operations teams to build and maintain scalable systems. Your core focus will be to automate processes and ensure the highest levels of service reliability, specifically by reducing manual effort (toil). You'll bring a strong passion for continually improving the reliability, availability, and performance of our services. Primary skill: Experience with cloud platforms (e.g., AWS, GCP, Azure), primarily AWS, and with containerisation and orchestration (e.g., Docker, Kubernetes). Proficiency in monitoring and logging tools: Datadog, Splunk, Dynatrace, AppDynamics, Prometheus, Grafana, ELK Stack (Elasticsearch, Logstash …
Salary Guide: Site Reliability Engineering, Cheltenham
- 25th Percentile: £101,250
- Median: £107,500
- 75th Percentile: £113,750