London (West End), South East England / Ashton-Under-Lyne, Greater Manchester, United Kingdom Hybrid / WFH Options
Future Talent Group
Site Reliability Engineer – FinTech/Global Payments – London HQ/Remote First. Salary: £80,000/£85,000 + bonus. Location: This UK-based team offers a fully remote working option, with a headquarters in Central London. In this role, you will be joining a leading SaaS FinTech … market. The business aims to scale its platform significantly over the next few years to support a growing international client base. Responsibilities: Champion core SRE practices: define SLIs/SLOs/SLAs, reduce toil through automation, and plan for Disaster Recovery. Refine KPIs to support data-driven decisions around reliability … teams to build resilient, observable, and maintainable features. Promote DevOps culture by leading knowledge-sharing sessions and supporting issue resolution. Skills: Strong grounding in SRE principles and operational best practices. Proficient with observability tools (Prometheus, Grafana, OTEL, CloudWatch) and telemetry pipelines. Solid programming skills in Python and/or Go
as well as ensuring the platform is performant and reliable. You will be a key member of the team, liaising with product teams, embedding SRE principles and building the observability platform for the next stage of growth at GSS. You will have direct input into the direction of Technical Operations … culture where your ideas are valued. What You'll Do: Key responsibilities in this role will include (but not be limited to): Leveraging core SRE values (measuring SLIs/SLOs/SLAs, testing, and eliminating toil via automation) with appropriate Disaster Recovery planning. Refining KPIs to enable data-driven decision … preferably event-driven). Be a self-starter who relishes responsibility. Take strategic direction and own end-to-end delivery of solutions. Expert knowledge of SRE fundamentals and a commitment to best practice. Fluency with common observability tooling such as Prometheus, Grafana, OTEL and CloudWatch. Experience analysing and building data telemetry, querying
Site Reliability Engineer, Simple Storage and Glacier team (S3G). Managing trillions of objects in storage, retrieving them in sub-x ms, building software that deploys to tens of thousands of hosts, achieving 99.999999999% (you didn't read that wrong, that's 11 nines!) durability. These are just a … scale of the exciting problems you will find every day working in Simple Storage Service (S3) and Glacier. The Region Services S3 and Glacier Engineering team is looking for a talented engineer who is motivated to solve complex challenges, yet is not constrained by "how things are usually done … services in AWS, including support for customers who require specialized security solutions for their cloud services. Key job responsibilities: Be actively involved in daily engineering activities, providing hands-on technical guidance and support. Define architecture, design, and proof-of-concept efforts for end-to-end project delivery, ensuring high
System innovation, Ingredients Management, New Product Development, and Purchase Order Management. Key responsibilities include integrating external supplier APIs, implementing Site Reliability Engineering (SRE) best practices, and ensuring seamless collaboration across teams. The team enhances resilience, observability, incident management, and disaster recovery (DR) practices while working closely with Peri … used to enhance system performance, maintainability, and security. Observability & Resilience: Establish best practices for monitoring, incident response, and disaster recovery. Best Practices & Governance: Define engineering standards and drive their adoption across teams. Vendor & API Management: Oversee integrations with third-party suppliers and ensure seamless API interactions. Technical Roadmap: Work closely with the Program Manager, Head of Product and Head of Engineering to define and implement a strategic roadmap for stock systems. Team Mentorship: Support engineers in developing their technical skills. Incident Management: Ensure effective post-mortem reviews and embed reliability best practices into development processes. Skills & Experience: Proven
The Role: Join us as a Site Reliability Engineer and help us build the future of data sovereignty! We're seeking an SRE passionate about creating high-performance, scalable, and reliable services for our production infrastructure. You'll have a direct impact, improving existing systems and developing innovative solutions to complex challenges. Our small, collaborative engineering teams own the full lifecycle of their services, from development to production operations. We champion automation and empower you to choose the best tools for the job. If you thrive in a fast-paced environment where you can make a real … bases (25+ users) and sustained daily usage. This will involve performance tuning, capacity planning, and optimization of resource utilization. Collaborate closely with the product engineering team to influence the design and implementation of new products and features, ensuring they meet our reliability and scalability standards from the outset.
Job Title: Site Reliability Engineer - Python Location: Remote (1-2 days per month in the London office) Salary/Rate: Up to £711 per day Inside IR35 Start Date: 08/05/2025 Job Type: Contract - Long-term project Company Introduction: We have an exciting opportunity now available with one of our sector-leading, huge social media clients! They are currently looking for a skilled SRE to join their team for a long-term project. Job Responsibilities/Required experience: Ability to code in Python - essential. Linux Admin (System Administration & Network
the world's most innovative fintechs, and the Financial Times recognised us as one of Europe's fastest-growing companies in 2023. The Client Site Reliability Engineer role in Infrastructure, Client Services will be responsible for enabling and supporting our clients to deliver a best-in-class cloud … at scale. This role supports clients in their cloud infrastructure preparation, deployment, optimisation and troubleshooting. Duties: Hands-on cloud infrastructure consulting, both on client site and remotely. Working with customers and external partners to design and prepare suitable cloud infrastructure to ensure Thought Machine Vault products can be tested … empower holistic digital transformation in collaboration with Thought Machine Client Architects. Supporting and troubleshooting client, SaaS and internal cloud infrastructure both remotely and on site, including by promoting and deploying suitable monitoring, logging and alerting tools. Working closely with internal product and engineering teams to ensure client feedback
Reigate, Surrey, United Kingdom Hybrid / WFH Options
Willis Towers Watson
a track record in Microsoft Azure and Observability platforms in complex SaaS environments and have excellent communication skills. You will be joining our growing engineering organization building a wide range of market-leading InsurTech solutions at an exciting time as we evolve our portfolio from desktop/on-premise … towards cloud/SaaS. As a DevOps Engineer, you will work together with product and engineering teams and deliver highly scalable and reliable infrastructure, pipelines and support tools. This is a critical and varied role, using a wide range of technologies, combining strategic work with short-term tactical fixes … open to flexible and hybrid working arrangements, with presence in the Reigate office two days per week. The Role: Collaborate with the product and engineering teams on the design, build and operational management of the client-facing services Champion and implement best practice solutions for reliable, performant and observable
cost-effectively than alternative approaches. The C3 AI Platform supports the value chain in any industry with prebuilt, configurable, high-value AI applications for reliability, fraud detection, sensor network health, supply network optimization, energy management, anti-money laundering, and customer engagement. Learn more at: C3 AI. We are looking for a Site Reliability Engineer to join our team in London. Responsibilities: Maximize system uptime and availability, ensuring functional and performance SLAs. Establish end-to-end monitoring and alerting on all critical aspects. Solve complex problems for critical services and build automation to prevent problem recurrence. Influence and … to streamline system updates and upgrades. Set up critical infrastructure, tools, and frameworks to streamline the deployment cycle. Work cross-functionally with Services and Engineering teams. Qualifications: Demonstrated experience in deploying, managing, and operating scalable and fault-tolerant Linux/Kubernetes/JVM-based infrastructure in AWS, GCP, and
Insight Global is looking for an Operations Site Reliability Engineer to help with global operational support for a leading infrastructure software product company's customer-facing SaaS products. You will be part of a … team of engineers that demonstrates superb technical competency, operates mission-critical infrastructure and ensures the highest levels of availability (24x7x365), performance and security. This SRE would be part of the critical operations function that is responsible for the monitoring, availability and performance of production services. They would be driving automation … opportunity to join an organization expanding dramatically, whilst also offering a highly competitive salary, bonus and equity package. Must haves: A degree in Systems Engineering, Computer Science or related fields. Professional experience working in a large cloud operations setting. Experience administering Linux systems. Strong hands-on experience of variants
pathways and achieve your career goals. About the role: Cloudscaler is seeking a Principal Cloud Engineer who will be a thought leader within our engineering practice, with extensive knowledge of platform and site reliability engineering best practices. You will be guiding our customers through complex cloud … or CloudFormation). Ability to advise on infrastructure-as-code structure and modularisation. An understanding of the challenges and priorities of operating services, and how SRE principles can be applied. Implementing and working with continuous integration/continuous delivery pipelines. Experience with system design, and the ability to assess trade-offs … Talent Acquisition team. 1st Interview: 30-minute remote interview with our hiring team. 2nd Interview: 60-minute remote technical interview with members of our engineering team. 3rd Interview: 60-minute in-person interview with members of our Senior Leadership Team. Cloudscaler is proud to be an equal opportunity employer
Job Title: Site Reliability Engineer | Splunk | SIEM Location: London (once or twice a month in the office - travel expenses will be compensated) Salary/Rate: Up to £700 per day INSIDE IR35 Start Date: 21/04/2025 Job Type: Contract Company Introduction: We have an exciting
provide support and take strategic steps to improve stock operations. Key responsibilities will include integrating external supplier APIs, implementing Site Reliability Engineering (SRE) best practices, and closely collaborating with existing teams to develop new software solutions. The team will enhance resilience, observability, incident management, and disaster recovery (DR … Including strong SQL. Experience designing and troubleshooting large-scale distributed systems. Experience in Big Data, preferably BigQuery (GCP). Familiarity with agile methodologies and engineering best practices. Strong problem-solving skills, ownership mindset, and ability to work cross-functionally. Understanding of stock systems and their impact on Finance and ABP
productivity-based improvements to the suite of tools and processes used by our large user base of developers, using the latest Gen AI tooling and prompt engineering. This dedicated team is focused on driving the everything-as-code agenda and delivering tangible reductions in process friction, errors, and manual effort. In … this role you will be responsible for driving and contributing to the technical direction of our products and services, instilling engineering best practices into the team, and promoting cultural change across the organisation. Responsibilities: Understand the landscape, tooling and procedures used by developers at Citi and look for opportunities … reduce toil and aid simplification using Gen AI based solutions. Apply classic AI and novel Gen AI evaluation methodology to raise the quality and reliability bar for the software that you will deliver, as well as to manage and mitigate risks that are specific/inherent to this field. Advise on
of the following technical domains: Compute, Storage, Networking, CDN, Databases, DevOps, Big Data and Analytics, Security, Applications Development. You preferably have software engineering, SRE and/or external customer-facing experience, with the ability to clearly articulate and present to small and large audiences. Experience in similar roles such … at least two of the following technical domains: Compute, Storage, Networking, CDN, Databases, DevOps, Big Data and Analytics, Security, Applications Development. - Software Engineering, SRE and/or external customer-facing experience, with the ability to clearly articulate and present to small and large audiences. - 5+ years of experience in
Sheffield / Manchester / Leeds / Birmingham / Blackpool / Newcastle Upon Tyne / Preston, United Kingdom Hybrid / WFH Options
DWP Digital
secure solutions across projects and initiatives which are transforming how government works. You'll use your knowledge of areas such as software development and site reliability engineering to help keep DWP safe and compliant, and you'll translate this to our brilliant project teams. The scale of