Altrincham, Greater Manchester, United Kingdom Hybrid / WFH Options
Future Talent Group
Site Reliability Engineer – FinTech/Global Payments – London HQ/Remote First. Salary: £80,000/£85,000 + Bonus. Location: This UK-based team offers a fully remote working option, with a headquarters in Central London. In this role, you will be joining a leading SaaS … market. The business aims to scale its platform significantly over the next few years to support a growing international client base. Responsibilities: Champion core SRE practices: define SLIs/SLOs/SLAs, reduce toil through automation, and plan for Disaster Recovery. Refine KPIs to support data-driven decisions around reliability … teams to build resilient, observable, and maintainable features. Promote DevOps culture by leading knowledge-sharing sessions and supporting issue resolution. Skills: Strong grounding in SRE principles and operational best practices. Proficient with observability tools (Prometheus, Grafana, OTEL, CloudWatch) and telemetry pipelines. Solid programming skills in Python and/or Go …
London, South East England, United Kingdom Hybrid / WFH Options
Future Talent Group
Site Reliability Engineer – FinTech/Global Payments – London HQ/Remote First. Salary: £80,000/£85,000 + Bonus. Location: This UK-based team offers a fully remote working option, with a headquarters in Central London. In this role, you will be joining a leading SaaS … market. The business aims to scale its platform significantly over the next few years to support a growing international client base. Responsibilities: Champion core SRE practices: define SLIs/SLOs/SLAs, reduce toil through automation, and plan for Disaster Recovery. Refine KPIs to support data-driven decisions around reliability … teams to build resilient, observable, and maintainable features. Promote DevOps culture by leading knowledge-sharing sessions and supporting issue resolution. Skills: Strong grounding in SRE principles and operational best practices. Proficient with observability tools (Prometheus, Grafana, OTEL, CloudWatch) and telemetry pipelines. Solid programming skills in Python and/or Go …
Bury, East Anglia, United Kingdom Hybrid / WFH Options
Future Talent Group
Site Reliability Engineer – FinTech/Global Payments – London HQ/Remote First. Salary: £80,000/£85,000 + Bonus. Location: This UK-based team offers a fully remote working option, with a headquarters in Central London. In this role, you will be joining a leading SaaS … market. The business aims to scale its platform significantly over the next few years to support a growing international client base. Responsibilities: Champion core SRE practices: define SLIs/SLOs/SLAs, reduce toil through automation, and plan for Disaster Recovery. Refine KPIs to support data-driven decisions around reliability … teams to build resilient, observable, and maintainable features. Promote DevOps culture by leading knowledge-sharing sessions and supporting issue resolution. Skills: Strong grounding in SRE principles and operational best practices. Proficient with observability tools (Prometheus, Grafana, OTEL, CloudWatch) and telemetry pipelines. Solid programming skills in Python and/or Go …
London (City of London), South East England, United Kingdom Hybrid / WFH Options
Future Talent Group
Site Reliability Engineer – FinTech/Global Payments – London HQ/Remote First. Salary: £80,000/£85,000 + Bonus. Location: This UK-based team offers a fully remote working option, with a headquarters in Central London. In this role, you will be joining a leading SaaS … market. The business aims to scale its platform significantly over the next few years to support a growing international client base. Responsibilities: Champion core SRE practices: define SLIs/SLOs/SLAs, reduce toil through automation, and plan for Disaster Recovery. Refine KPIs to support data-driven decisions around reliability … teams to build resilient, observable, and maintainable features. Promote DevOps culture by leading knowledge-sharing sessions and supporting issue resolution. Skills: Strong grounding in SRE principles and operational best practices. Proficient with observability tools (Prometheus, Grafana, OTEL, CloudWatch) and telemetry pipelines. Solid programming skills in Python and/or Go …
Leigh, South East England, United Kingdom Hybrid / WFH Options
Future Talent Group
Site Reliability Engineer – FinTech/Global Payments – London HQ/Remote First. Salary: £80,000/£85,000 + Bonus. Location: This UK-based team offers a fully remote working option, with a headquarters in Central London. In this role, you will be joining a leading SaaS … market. The business aims to scale its platform significantly over the next few years to support a growing international client base. Responsibilities: Champion core SRE practices: define SLIs/SLOs/SLAs, reduce toil through automation, and plan for Disaster Recovery. Refine KPIs to support data-driven decisions around reliability … teams to build resilient, observable, and maintainable features. Promote DevOps culture by leading knowledge-sharing sessions and supporting issue resolution. Skills: Strong grounding in SRE principles and operational best practices. Proficient with observability tools (Prometheus, Grafana, OTEL, CloudWatch) and telemetry pipelines. Solid programming skills in Python and/or Go …
London (West End), South East England, United Kingdom Hybrid / WFH Options
Future Talent Group
Site Reliability Engineer – FinTech/Global Payments – London HQ/Remote First. Salary: £80,000/£85,000 + Bonus. Location: This UK-based team offers a fully remote working option, with a headquarters in Central London. In this role, you will be joining a leading SaaS … market. The business aims to scale its platform significantly over the next few years to support a growing international client base. Responsibilities: Champion core SRE practices: define SLIs/SLOs/SLAs, reduce toil through automation, and plan for Disaster Recovery. Refine KPIs to support data-driven decisions around reliability … teams to build resilient, observable, and maintainable features. Promote DevOps culture by leading knowledge-sharing sessions and supporting issue resolution. Skills: Strong grounding in SRE principles and operational best practices. Proficient with observability tools (Prometheus, Grafana, OTEL, CloudWatch) and telemetry pipelines. Solid programming skills in Python and/or Go …
Ashton-Under-Lyne, Greater Manchester, United Kingdom Hybrid / WFH Options
Future Talent Group
Site Reliability Engineer – FinTech/Global Payments – London HQ/Remote First. Salary: £80,000/£85,000 + Bonus. Location: This UK-based team offers a fully remote working option, with a headquarters in Central London. In this role, you will be joining a leading SaaS … market. The business aims to scale its platform significantly over the next few years to support a growing international client base. Responsibilities: Champion core SRE practices: define SLIs/SLOs/SLAs, reduce toil through automation, and plan for Disaster Recovery. Refine KPIs to support data-driven decisions around reliability … teams to build resilient, observable, and maintainable features. Promote DevOps culture by leading knowledge-sharing sessions and supporting issue resolution. Skills: Strong grounding in SRE principles and operational best practices. Proficient with observability tools (Prometheus, Grafana, OTEL, CloudWatch) and telemetry pipelines. Solid programming skills in Python and/or Go …
Job Title: Site Reliability Engineer - Python. Location: Remote (1-2 days per month in the London office). Salary/Rate: Up to £711 per day, Inside IR35. Start Date: 08/05/2025. Job Type: Contract - long-term project. Company Introduction … We have an exciting opportunity now available with one of our sector-leading huge social media clients! They are currently looking for a skilled SRE to join their team for a long-term project. Job Responsibilities/Required experience: Ability to code in Python - essential. Linux Admin (System Administration & Network …
London, South East England, United Kingdom Hybrid / WFH Options
MarkJames Search
Job Title: Site Reliability Engineering (SRE) Lead – Observability. Location: Stratford, London (Hybrid – 2 days per week onsite). Contract Length: 6 months. Rate: £450–£500 per day (Inside IR35). Industry: Financial Services. A leading Financial Services organisation in London is seeking a Site Reliability Engineering (SRE) Lead … a hybrid role requiring two days per week onsite at their Stratford, London offices. The role sits Inside IR35. Key Responsibilities: Lead the SRE Observability team and champion observability practices across multiple product groups. Provide thought leadership from the Cognizant delivery team on all things SRE. Leverage hands-on … creation and QA of project-level Observability Plans. Input into and assure the quality of testing strategies and results. Requirements: Proven experience in an SRE role with a strong focus on Observability. Expert-level proficiency with DevOps tools including GitHub, GitHub Actions, Jenkins, Nexus, CloudFormation/Terraform, and CodeQL. Extensive …
Site Reliability Engineer, Simple Storage and Glacier team (S3G). Managing trillions of objects in storage, retrieving them in sub-x ms, building software that deploys to tens of thousands of hosts, achieving 99.999999999% (you didn't read that wrong, that's 11 nines!) durability. These are just … find every day working in Simple Storage Service (S3) and Glacier. The Region Services S3 and Glacier Engineering team are looking for a talented engineer who is motivated to solve complex challenges, yet is not constrained by "how things are usually done" and is willing to decompose and reinvent … there's nothing we can't achieve. Minimum Qualifications: Knowledge of systems engineering fundamentals (networking, storage, operating systems). Experience designing or architecting (design patterns, reliability and scaling) of new and existing systems. Experience in networking, storage systems, operating systems and hands-on systems engineering. Experience programming with at least …
Insight Global is looking for an Operations Site Reliability Engineer to help with global operational support for a leading infrastructure software product company’s customer-facing SaaS products. You will be part of a … team of engineers that demonstrates superb technical competency, operates mission-critical infrastructure and ensures the highest levels of availability (24x7x365), performance and security. This SRE would be part of the critical operations function that is responsible for the monitoring, availability and performance of production services. They would be driving automation …
Job Title: Site Reliability Engineer | Splunk | SIEM. Location: London (once or twice a month in the office - travel expenses will be compensated). Salary/Rate: Up to £700 per day, Inside IR35. Start Date: 21/04/2025. Job Type: Contract. Company Introduction: We have an …
Role: Join us as a Site Reliability Engineer and help us build the future of data sovereignty! We're seeking an SRE passionate about creating high-performance, scalable, and reliable services for our production infrastructure. You'll have a direct impact, improving existing systems and developing innovative … for self-hosted deployments, including infrastructure and tooling for monitoring, alerting, and troubleshooting. This will involve designing and implementing robust metrics and logging systems. Engineer the Acra platform for high availability and fault tolerance. This includes ensuring resilience against Cloud Availability Zone outages and the ability to gracefully handle … resource utilization. Collaborate closely with the product engineering team to influence the design and implementation of new products and features, ensuring they meet our reliability and scalability standards from the outset. Preferred Qualifications: Bachelor's degree (or foreign equivalent) in Computer Science or a related field is desired; relevant …
our platforms empower our partners to develop lifesaving drugs, forecast supply chain disruptions, locate missing children, and more. The Role: We're looking for Site Reliability Engineers who can help us build, operate, and maintain high-performance, scalable, and reliable services for our production infrastructure, across both cloud … on-prem environments. Site Reliability Engineers combine engineering experience and an innate drive to improve existing systems and processes, with the creativity to develop novel solutions to evolving challenges. Our team strives to automate processes wherever possible, using whichever tools are best for the job. You'll be …
our platforms empower our partners to develop lifesaving drugs, forecast supply chain disruptions, locate missing children, and more. The Role: We're looking for Site Reliability Engineers who can help us build, operate, and maintain high-performance, scalable, and reliable services for our production infrastructure. Site Reliability …
cost-effectively than alternative approaches. The C3 AI Platform supports the value chain in any industry with prebuilt, configurable, high-value AI applications for reliability, fraud detection, sensor network health, supply network optimization, energy management, anti-money laundering, and customer engagement. Learn more at: C3 AI. We are looking … for a Site Reliability Engineer to join our team in London. Responsibilities: Maximize system uptime and availability, ensuring functional and performance SLAs. Establish end-to-end monitoring and alerting on all critical aspects. Solve complex problems for critical services and build automation to prevent problem recurrence. Influence …
aspect of your life - we want to help you create your ideal work/life blend, rather than squeezing in life around work. The Site Reliability Engineer is responsible for the technical support and operation of UK Platforms (both from an application and infrastructure perspective) by actively … and Infrastructure patching, DR testing, creation of alerting and monitoring and service transition activities - knowledge transfer, operation playbook updates/knowledge articles update. The SRE will closely collaborate with the customer support team and the product development squads in various global locations to achieve the best outcome for the technical … the technical support function, is the contact point for technical incidents as well as for the support teams. Key Accountabilities: Ensure high availability and reliability of UK platforms with day-to-day support. Manage incidents with rapid resolution, root cause analysis, and post-mortems to prevent recurrence. Optimise monitoring …
Site Reliability Engineer, Simple Storage and Glacier team (S3G). Sector: Engineering. Role: Professional. Contract Type: Permanent. Hours: Full Time. DESCRIPTION: Managing trillions of objects in storage, retrieving them in sub-x ms, building software that … find every day working in Simple Storage Service (S3) and Glacier. The Region Services S3 and Glacier Engineering team are looking for a talented engineer who is motivated to solve complex challenges, yet is not constrained by "how things are usually done" and is willing to decompose and reinvent … standard of quality across all team deliverables. BASIC QUALIFICATIONS: Knowledge of systems engineering fundamentals (networking, storage, operating systems). Experience designing or architecting (design patterns, reliability and scaling) of new and existing systems. Experience in networking, storage systems, operating systems and hands-on systems engineering. Experience programming with at least …
the world's most innovative fintechs, and the Financial Times recognised us as one of Europe's fastest-growing companies in 2023. The Client Site Reliability Engineer role in Infrastructure, Client Services will be responsible for enabling and supporting our clients to deliver a best in class … at scale. This role supports clients in their cloud infrastructure preparation, deployment, optimisation and troubleshooting. Duties: Hands-on cloud infrastructure consulting both on client site and remote. Working with customers and external partners to design and prepare suitable cloud infrastructure to ensure Thought Machine Vault products can be tested … empower holistic digital transformation in collaboration with Thought Machine Client Architects. Supporting and troubleshooting client, SaaS and internal cloud infrastructure both remotely and on site, including by promoting and deploying suitable monitoring, logging and alerting tools. Working closely with internal product and engineering teams to ensure client feedback is …
do your best work. Learn more at . We are looking for experienced people who are competent in the cloud and knowledgeable about the SRE (site reliability engineering) domain. The team: The Core Architecture Team (CAT) produces and manages the core technology, methodologies, and frameworks that underpin all …
which pronouns you use (For example: she/her, he/him, they/them, etc). At Bumble, Site Reliability Engineers (SRE) are responsible for ensuring the reliability, scalability and performance of software systems while bridging the gap between development, security and operations. We proactively manage … and fixing issues. Respond to system outages, troubleshooting root causes and implementing preventative measures. Collaborate with engineering teams and security engineers to improve system reliability, security and performance. Participate in on-call rotations. Create and maintain documentation to improve knowledge sharing across teams. About you: Excellent problem solving, analytical … must. Proficiency in at least Python or Golang programming languages. Experience with CI/CD pipelines. Strong proficiency with Kubernetes architecture. Prior experience in SRE, System administration or DevOps roles. Strong proficiency with Linux/Unix operating systems, including hands-on experience in configuration and troubleshooting. Proficiency with using Puppet …
Shazam Site Reliability Engineers are not just responsible for making sure all services and systems that Shazam relies on are operating at their highest level; they're also responsible for helping development teams embrace these principles … as they develop software. Shazam SREs embed themselves with development teams and act as extensions of those teams to propagate best practices. As an SRE, you'll collaborate with development teams to help them understand the bigger picture of distributed systems, beyond individual components. We are strong believers in ownership … with software engineers being responsible for the code they write. The SRE team helps build the competencies across teams to ensure we build scalable and supportable systems. This role sits in our London office reporting to our Head of SRE. The successful candidate will be assisting multiple development teams based …
Site Reliability Engineering Manager (SRE), Analytics. The Apple Services Engineering team (ASE) is one of the most exciting examples of Apple's long-held passion for combining art and technology. These are the people who power the App Store, Apple TV, Apple Music, Apple Podcasts, and Apple Books. … ever before, these teams remain small and multi-functional, offering greater exposure to the array of opportunities here. Description: The Service Reliability Engineering (SRE) Manager role in Apple Services Engineering requires a mix of strategic engineering and design along with hands-on technical work. This SRE will configure, tune … millions of users, then this is the place for you! Minimum Qualifications: Experience with hiring and leading engineers. Demonstrable success leading engineering teams - ideally SRE or Production Engineering. Experience with large scale distributed systems. Deep understanding and experience in one or more of the following: Hadoop, Spark, Flink, Kubernetes, AWS …