London, South East England, United Kingdom Hybrid / WFH Options
MarkJames Search
Job Title: Site Reliability Engineering (SRE) Lead – Observability Location: Stratford, London (Hybrid – 2 days per week onsite) Contract Length: 6 months Rate: £450–£500 per day (Inside IR35) Industry: Financial Services A leading Financial Services organisation in London is seeking a Site Reliability Engineering (SRE) Lead – Observability to join their team on a 6-month contract. This is a hybrid role requiring two days per week onsite at their Stratford, London offices. The role sits Inside IR35. Key Responsibilities: Lead the SRE Observability team and champion observability practices across multiple product groups. … creation and QA of project-level Observability Plans. Input into and assure the quality of testing strategies and results. Requirements Proven experience in an SRE role with a strong focus on Observability. Expert-level proficiency with DevOps tools including GitHub, GitHub Actions, Jenkins, Nexus, CloudFormation/Terraform, and CodeQL. Extensive
The SRE Manager is responsible for leading the Site Reliability Engineering function across Europe, ensuring the reliability, scalability, and performance of critical infrastructure and services. This role plays a key part in the global follow-the-sun support model, working closely with the Global SRE Leader … team. You'll collaborate with Engineering, Infrastructure, and Operations teams to maintain high availability and resilient service delivery, while also mentoring a regional SRE team focused on continuous improvement and innovation. Key Responsibilities: Technical Leadership Develop deep expertise in the Titanium trading platform to lead and support critical business … ensuring priorities align with business goals and resource capacity. Operational Excellence Champion initiatives that enhance system availability, scalability, and performance. Collaborate with the Global SRE Leader to refine and enforce operational policies (e.g., Capacity Planning, Change Management, Disaster Recovery). Cross-Functional Collaboration Partner with Software Engineering, Infrastructure, Operations
Lead SRE - FinTech - £125K Our client is one of the world’s leading FinTech companies and is building out its Site Reliability Engineering (SRE) team in the UK. They’re looking to hire an experienced SRE to lead the team, grow it and drive engineering forwards. … The role will still be very “hands on” with a focus on the improvement, automation and engineering of their trading platforms – a cutting-edge, low-latency and very high availability infrastructure. You’ll take real ownership of release pipelines, performance engineering and work how you want a huge … able to define and drive technology strategies - You’ll have the chance to make the team your own! Requirements: Very strong technical experience of Site Reliability/DevOps Engineering Good experience of Java and/or Python development and scripting Very good experience of containers, monitoring, automation
SRE Lead - FinTech - £120K+ Our client is one of the world’s leading FinTechs and is building out its Site Reliability Engineering (SRE) team in the UK. They’re looking to hire an experienced SRE to lead the team, grow it and drive platform engineering forwards. … The role will still be very “hands on” with a focus on the improvement, automation and engineering of their trading platforms – a cutting-edge, low-latency and very high availability infrastructure. You’ll take real ownership of release pipelines, performance engineering and work how you want a huge … able to define and drive technology strategies - You’ll have the chance to make the team your own! Requirements: Very strong technical experience of Site Reliability/DevOps Engineering Good experience of Java and/or Python development and scripting Very good experience of containers, monitoring, automation
facilitate effective job matching and career development, not just for our users but also for our own team members. We are looking for a Site Reliability Engineer Lead to ensure our systems are reliable, scalable, and efficient. As the Site Reliability Engineer Lead, you will take … maintaining the health and performance of our platforms while also leading a talented team of engineers. You will champion and coach best practices in reliability and operational excellence to deliver an exceptional experience for our users. Key Responsibilities Minimising downtime to products & services and ensuring the platform is stable … availability and performance. Work with senior stakeholders to mature the concept of Site Reliability within the CVL organisation. Lead and mentor the SRE function, fostering a culture of collaboration, innovation, and excellence. Creating a bridge between Development and support teams by applying an ‘as-a-service' mindset to
Reading, England, United Kingdom Hybrid / WFH Options
People Source Consulting trading as Experis
Site Reliability Engineer - DevOps Engineer 18 Month Contract PAYE - Fully Remote, or Hybrid based in the Midlands if preferred. The role We are working with one of the finest gaming studios in the industry and are on the lookout for … an exceptional Site Reliability Engineer who can bring their expertise and unique thinking to help make their team even stronger! As an SRE, the main purpose is solving for scale through collaboration and automation, bringing engineering principles to infrastructure and operational problems. Work closely with the different
London, South East England, United Kingdom Hybrid / WFH Options
RP International
Site Reliability Engineer | Inside IR35 | Hybrid - 2 Days Onsite London | 6 Month Contract Our client, a multinational and respected consultancy, is hiring a Lead Site Reliability Engineer with expertise in AWS and DevOps Tools for a new project in the Public Sector. Technical Skills/
London, South East England, United Kingdom Hybrid / WFH Options
Nationwide Building Society
Senior Application Engineer London or Swindon Office Hybrid role - x2 days on site/x3 Work from home Nationwide is leveraging the power of Cloud, DevOps and Agile to bring teams together and create compelling Digital experiences for members of today and tomorrow. At the same time, we’re … solutions Knowledge of Financial services and the design and delivery of Conversational Banking solutions Knowledge or interest in Site Reliability Engineering (SRE) principles Our customer first behaviours put customers and members at the heart of how we work together. They are the set of behaviours that every
A prestigious, technology-driven hedge fund is seeking a highly skilled Site Reliability Engineer (SRE) to join their global infrastructure team. This is a unique opportunity to work in a high-performance, low-latency trading environment where technology is at the heart of the firm’s competitive edge. … critical role in ensuring the performance, reliability, and scalability of the systems that power the fund’s trading and research platforms. As an SRE, you will work closely with software engineers and investment teams to build automation-first solutions that support the firm’s most advanced strategies. Key Responsibilities … across the business. Design and implement automation to eliminate manual tasks and reduce operational risk. Collaborate with software and investment teams to embed the SRE mindset early in the development lifecycle. Ideal Candidate: SRE with experience working with data systems Ability to program (structured, OOP, and TDD) using one or
London, South East England, United Kingdom Hybrid / WFH Options
Tenth Revolution Group
within creative or product-led software organisations (SMEs preferred). Expert level knowledge of AWS, IaC, Pipelines, and Containers. 10+ years of experience in site reliability engineering, systems engineering, or a related field, with a focus on cloud migration and modernization. Deep hands-on knowledge of … the target architecture and migration roadmap. Hybrid Architecture Design: Create scalable, secure hybrid cloud solutions that integrate on-premise infrastructure with cloud services. Ensure Reliability: Maintain and enhance the reliability and availability of key systems throughout the transformation. Champion Automation & IaC: Promote Infrastructure as Code (e.g., Terraform, CloudFormation
eDV Site Reliability Engineer Looking for an eDV SRE. Someone with a defence industry specialism with a passion … for creating efficient and secure cloud infrastructure. You will play a critical part in transforming and enhancing both internal and external operations through effective SRE practices. Core Responsibilities Infrastructure Excellence: Design, manage, and evolve our cloud-based infrastructure to support high-traffic applications and seamless service delivery. Secure Deployment: Develop
London, South East England, United Kingdom Hybrid / WFH Options
Future Talent Group
Site Reliability Engineer – FinTech/Global Payments – London HQ/Remote First Salary - £80,000/£85,000 + Bonus Location - This UK-based team offers a fully remote working option, with a headquarters in Central London. In this role, you will be joining a leading SaaS FinTech … market. The business aims to scale its platform significantly over the next few years to support a growing international client base. Responsibilities Champion core SRE practices: define SLIs/SLOs/SLAs, reduce toil through automation, and plan for Disaster Recovery. Refine KPIs to support data-driven decisions around reliability … teams to build resilient, observable, and maintainable features. Promote DevOps culture by leading knowledge-sharing sessions and supporting issue resolution. Skills Strong grounding in SRE principles and operational best practices. Proficient with observability tools (Prometheus, Grafana, OTEL, CloudWatch) and telemetry pipelines. Solid programming skills in Python and/or Go
Leigh, South East England, United Kingdom Hybrid / WFH Options
Future Talent Group
Site Reliability Engineer – FinTech/Global Payments – London HQ/Remote First Salary - £80,000/£85,000 + Bonus Location - This UK-based team offers a fully remote working option, with a headquarters in Central London. In this role, you will be joining a leading SaaS FinTech … market. The business aims to scale its platform significantly over the next few years to support a growing international client base. Responsibilities Champion core SRE practices: define SLIs/SLOs/SLAs, reduce toil through automation, and plan for Disaster Recovery. Refine KPIs to support data-driven decisions around reliability … teams to build resilient, observable, and maintainable features. Promote DevOps culture by leading knowledge-sharing sessions and supporting issue resolution. Skills Strong grounding in SRE principles and operational best practices. Proficient with observability tools (Prometheus, Grafana, OTEL, CloudWatch) and telemetry pipelines. Solid programming skills in Python and/or Go
London (City of London), South East England, United Kingdom Hybrid / WFH Options
Future Talent Group
Site Reliability Engineer – FinTech/Global Payments – London HQ/Remote First Salary - £80,000/£85,000 + Bonus Location - This UK-based team offers a fully remote working option, with a headquarters in Central London. In this role, you will be joining a leading SaaS FinTech … market. The business aims to scale its platform significantly over the next few years to support a growing international client base. Responsibilities Champion core SRE practices: define SLIs/SLOs/SLAs, reduce toil through automation, and plan for Disaster Recovery. Refine KPIs to support data-driven decisions around reliability … teams to build resilient, observable, and maintainable features. Promote DevOps culture by leading knowledge-sharing sessions and supporting issue resolution. Skills Strong grounding in SRE principles and operational best practices. Proficient with observability tools (Prometheus, Grafana, OTEL, CloudWatch) and telemetry pipelines. Solid programming skills in Python and/or Go
London (West End), South East England, United Kingdom Hybrid / WFH Options
Future Talent Group
Site Reliability Engineer – FinTech/Global Payments – London HQ/Remote First Salary - £80,000/£85,000 + Bonus Location - This UK-based team offers a fully remote working option, with a headquarters in Central London. In this role, you will be joining a leading SaaS FinTech … market. The business aims to scale its platform significantly over the next few years to support a growing international client base. Responsibilities Champion core SRE practices: define SLIs/SLOs/SLAs, reduce toil through automation, and plan for Disaster Recovery. Refine KPIs to support data-driven decisions around reliability … teams to build resilient, observable, and maintainable features. Promote DevOps culture by leading knowledge-sharing sessions and supporting issue resolution. Skills Strong grounding in SRE principles and operational best practices. Proficient with observability tools (Prometheus, Grafana, OTEL, CloudWatch) and telemetry pipelines. Solid programming skills in Python and/or Go
System innovation, Ingredients Management, New Product Development, and Purchase Order Management. Key responsibilities include integrating external supplier APIs, implementing Software Reliability Engineering (SRE) best practices, and ensuring seamless collaboration across teams. The team enhances resilience, observability, incident management, and disaster recovery (DR) practices while working closely with Peri … used to enhance system performance, maintainability, and security. Observability & Resilience: Establish best practices for monitoring, incident response, and disaster recovery. Best Practices & Governance: Define engineering standards and drive their adoption across teams. Vendor & API Management: Oversee integrations with third-party suppliers and ensure seamless API interactions. Technical Roadmap: Work … closely with Program Manager, Head of Product and Head of Engineering to define and implement a strategic roadmap for stock systems. Team Mentorship: Support engineers in developing their technical skills. Incident Management: Ensure effective post-mortem reviews and embed reliability best practices into development processes. Skills & Experience Proven
Reigate, Surrey, United Kingdom Hybrid / WFH Options
Willis Towers Watson
a track record in Microsoft Azure and Observability platforms in complex SaaS environments and have excellent communication skills. You will be joining our growing engineering organization building a wide range of market-leading InsurTech solutions at an exciting time as we evolve our portfolio from desktop/on-premise … towards cloud/SaaS. As a DevOps Engineer, you will work together with product and engineering teams and deliver highly scalable and reliable infrastructure, pipelines and support tools. This is a critical and varied role, using a wide range of technologies, combining strategic work with short-term tactical fixes … open to flexible and hybrid working arrangements, with presence in the Reigate office two days per week. The Role: Collaborate with the product and engineering teams on the design, build and operational management of the client-facing services Champion and implement best practice solutions for reliable, performant and observable
provide support and take strategic steps to improve stock operations. Key responsibilities will include integrating external supplier APIs, implementing Software Reliability Engineering (SRE) best practices, and closely collaborating with existing teams to develop new software solutions. The team will enhance resilience, observability, incident management, and disaster recovery (DR … Including strong SQL. Experience designing and troubleshooting large-scale distributed systems Experience in Big Data, preferably BigQuery (GCP) Familiarity with agile methodologies and best engineering practices. Strong problem-solving skills, ownership mindset, and ability to work cross-functionally. Understanding of stock systems and their impact on Finance and ABP
This role plays a key part in the global follow-the-sun support model, working closely with the Global SRE Leader to support platforms worldwide. We are looking for SRE talent with experience in an On-Prem/Datacenter environment. The ideal candidate will bring strong technical leadership, experience in … impact team. You'll collaborate with Engineering, Infrastructure, and Operations teams to maintain high availability and resilient service delivery, while also mentoring an SRE team focused on continuous improvement and innovation. Key Responsibilities: Technical Leadership Develop deep expertise in the Titanium trading platform to lead and support critical business … ensuring priorities align with business goals and resource capacity. Operational Excellence Champion initiatives that enhance system availability, scalability, and performance. Collaborate with the Global SRE Leader to refine and enforce operational policies (e.g., Capacity Planning, Change Management, Disaster Recovery). Cross-Functional Collaboration Partner with Software Engineering, Infrastructure, Operations
Southampton, Hampshire, United Kingdom Hybrid / WFH Options
NICE
production environment by monitoring availability and taking a holistic view of system health Build software and systems to manage platform infrastructure and applications Improve reliability, quality, and time-to-market of our suite of software solutions Measure and optimize system performance, with an eye toward pushing our capabilities forward … getting ahead of customer needs, and innovating to continually improve Provide primary operational support and engineering for multiple large distributed software applications How will you make an impact? Gather and analyze metrics from both operating systems and applications to assist in performance tuning and fault finding Partner with development … Participate in system design consulting, platform management, and capacity planning Create sustainable systems and services through automation and uplifts Balance feature development speed and reliability with well-defined service level objectives Have you got what it takes? 3-6 years of working experience in a similar role, with a
Software engineering is at the heart of what we do here at giffgaff. Our agile engineering teams build and support a wide variety of applications and services. These combine to create our unique user experience on the giffgaff website, enable a whole range of awesome features via modern … We want you to share your opinions on how we are doing things - and help us get better! The Role We're looking for SRE engineers with passion and energy, a strong desire to learn and improve and a commitment to testing and excellence. You'll have to make tough … logging and tracing (Prometheus, EFK, Alertmanager, Jaeger/Zipkin) Troubleshooting in complex environments using the monitoring tools Establishing and measuring SLIs and SLOs with engineering teams Participate in periodic 24x7 on-call duties Build and manage systems, infrastructure and applications through automation (Terraform, Ansible) CI/CD tools: Jenkins
Pipeline team to contribute Modules and Incremental improvements to the Core Pipeline, Core Services, and Core Operating team's Libraries and Services. Collaborate with SRE team members to ensure development and operations work is delivered in full and on time (agile/product sprints). Write and maintain systems/
This is a Vice President position within Platform Reliability Engineering and Management leveraging SRE Principles and Practices based out of London. This role is looking for a multi-skilled professional with strong technical leadership and people management skills to deliver critical services ensuring a highly stable, reliable, and resilient … to eliminate manual day-to-day support activities; scope and create automation for deployment, management and visibility of our services. Extensive experience with implementing SRE principles in the organization such as SLOs/SLIs and toil measurement Implement best practices for building successful monitoring and alerting systems. Experience with Observability … platforms like Datadog and OpenTelemetry is desired. You will work closely with engineering/development teams to design, build, and maintain systems and help them decide on products to use, schema design and query tuning. Extensive troubleshooting abilities across the stack QUALIFICATIONS Required Skills: Bachelor's degree or
you get to do in this role: Provide relief and sustainable resolution to issues within our infrastructure. Use your experience in software development, systems engineering and networking to proactively prevent repeatable issues. Drive initiatives with partner teams to improve the reliability and performance of the infrastructure through improved … insights, or exploring AI's potential impact on the function or industry. 0-2 years' experience. Relevant bachelor's degree in computer science, software engineering or a related field. Knowledge of Linux systems. Coding experience, we normally prefer Python or JavaScript. Networking skills, IP addressing, routing protocols. Monitoring of
Bexhill-on-sea, Sussex, United Kingdom Hybrid / WFH Options
Hastings Direct
team is always looking for passionate individuals who are eager to make a difference and contribute to our success. Job Details As a Senior Site Reliability … engineer for Monitoring and Observability, you will be part of the Technology Engineering team within CIO supporting the definition, maintenance and implementation of SRE strategies and principles around Monitoring and Observability for Hastings. Support the definition, maintenance and implementation of SRE strategies including observability and event management. Design, build