Site Reliability Engineering Jobs in London

76 to 100 of 192 Site Reliability Engineering Jobs in London

Senior Site Reliability Engineer - (Networks, AWS & Kubernetes)

London, United Kingdom
Source Technology
Senior Site Reliability Engineer - (Networks, AWS & Kubernetes) (BH-48405-2) Location: London, England Sector: IT Salary: £90,000.00 to £120,000.00 per annum Benefits: + 15% bonus + car allowance A truly unique opportunity to help launch a brand new team within a global financial services provider. This … skilled Full Stack Infrastructure Engineers will cover Compute, Storage, Network, and Cloud technologies. You will help design, implement, and manage robust infrastructure solutions, ensuring reliability, scalability, and performance. Requirements: Proven experience managing and optimizing a diverse infrastructure stack. Extensive knowledge of cloud platforms (AWS, Azure, GCP) and infrastructure as … pipeline management and DevOps practices. Strong understanding of disaster recovery and business continuity planning. Experience with performance tuning and capacity planning. Understanding of chaos engineering principles and practices. Skills in cost optimization for cloud infrastructure. Specific Tools and Techniques: Experience in using cloud native monitoring tools like AWS CloudWatch More ❯
Employment Type: Permanent
Salary: GBP Annual

Site Reliability Engineer - FinTech / Global Payments - London HQ / Remote First

Central London, UK
Hybrid / WFH Options
Future Talent Group
Site Reliability Engineer – FinTech/Global Payments – London HQ/Remote First Salary - £80,000/£85,000 + Bonus Location - This UK-based team offers a fully remote working option, with a headquarters in Central London. In this role, you will be joining a leading SaaS FinTech … market. The business aims to scale its platform significantly over the next few years to support a growing international client base. Responsibilities Champion core SRE practices: define SLIs/SLOs/SLAs, reduce toil through automation, and plan for Disaster Recovery. Refine KPIs to support data-driven decisions around reliability … teams to build resilient, observable, and maintainable features. Promote DevOps culture by leading knowledge-sharing sessions and supporting issue resolution. Skills Strong grounding in SRE principles and operational best practices. Proficient with observability tools (Prometheus, Grafana, OTEL, Cloudwatch) and telemetry pipelines. Solid programming skills in Python and/or Go More ❯

Site Reliability Engineer - FinTech / Global Payments - London HQ / Remote First

West London, UK
Hybrid / WFH Options
Future Talent Group
Site Reliability Engineer – FinTech/Global Payments – London HQ/Remote First Salary - £80,000/£85,000 + Bonus Location - This UK-based team offers a fully remote working option, with a headquarters in Central London. In this role, you will be joining a leading SaaS FinTech … market. The business aims to scale its platform significantly over the next few years to support a growing international client base. Responsibilities Champion core SRE practices: define SLIs/SLOs/SLAs, reduce toil through automation, and plan for Disaster Recovery. Refine KPIs to support data-driven decisions around reliability … teams to build resilient, observable, and maintainable features. Promote DevOps culture by leading knowledge-sharing sessions and supporting issue resolution. Skills Strong grounding in SRE principles and operational best practices. Proficient with observability tools (Prometheus, Grafana, OTEL, Cloudwatch) and telemetry pipelines. Solid programming skills in Python and/or Go More ❯

Site Reliability Engineer – FinTech / Global Payments – London HQ / Remote First

City of London, London, United Kingdom
Hybrid / WFH Options
Future Talent Group
Site Reliability Engineer – FinTech/Global Payments – London HQ/Remote First Salary - £80,000/£85,000 + Bonus Location - This UK-based team offers a fully remote working option, with a headquarters in Central London. In this role, you will be joining a leading SaaS FinTech … market. The business aims to scale its platform significantly over the next few years to support a growing international client base. Responsibilities Champion core SRE practices: define SLIs/SLOs/SLAs, reduce toil through automation, and plan for Disaster Recovery. Refine KPIs to support data-driven decisions around reliability … teams to build resilient, observable, and maintainable features. Promote DevOps culture by leading knowledge-sharing sessions and supporting issue resolution. Skills Strong grounding in SRE principles and operational best practices. Proficient with observability tools (Prometheus, Grafana, OTEL, Cloudwatch) and telemetry pipelines. Solid programming skills in Python and/or Go More ❯

Site Reliability Engineer – FinTech / Global Payments – London HQ / Remote First

London Area, United Kingdom
Hybrid / WFH Options
Future Talent Group
Site Reliability Engineer – FinTech/Global Payments – London HQ/Remote First Salary - £80,000/£85,000 + Bonus Location - This UK-based team offers a fully remote working option, with a headquarters in Central London. In this role, you will be joining a leading SaaS FinTech … market. The business aims to scale its platform significantly over the next few years to support a growing international client base. Responsibilities Champion core SRE practices: define SLIs/SLOs/SLAs, reduce toil through automation, and plan for Disaster Recovery. Refine KPIs to support data-driven decisions around reliability … teams to build resilient, observable, and maintainable features. Promote DevOps culture by leading knowledge-sharing sessions and supporting issue resolution. Skills Strong grounding in SRE principles and operational best practices. Proficient with observability tools (Prometheus, Grafana, OTEL, Cloudwatch) and telemetry pipelines. Solid programming skills in Python and/or Go More ❯

Site Reliability Engineer – FinTech / Global Payments – London HQ / Remote First

East London, London, United Kingdom
Hybrid / WFH Options
Future Talent Group
Site Reliability Engineer – FinTech/Global Payments – London HQ/Remote First Salary - £80,000/£85,000 + Bonus Location - This UK-based team offers a fully remote working option, with a headquarters in Central London. In this role, you will be joining a leading SaaS FinTech … market. The business aims to scale its platform significantly over the next few years to support a growing international client base. Responsibilities Champion core SRE practices: define SLIs/SLOs/SLAs, reduce toil through automation, and plan for Disaster Recovery. Refine KPIs to support data-driven decisions around reliability … teams to build resilient, observable, and maintainable features. Promote DevOps culture by leading knowledge-sharing sessions and supporting issue resolution. Skills Strong grounding in SRE principles and operational best practices. Proficient with observability tools (Prometheus, Grafana, OTEL, Cloudwatch) and telemetry pipelines. Solid programming skills in Python and/or Go More ❯

Site Reliability Engineer – FinTech / Global Payments – London HQ / Remote First

Central London / West End, London, United Kingdom
Hybrid / WFH Options
Future Talent Group
Site Reliability Engineer – FinTech/Global Payments – London HQ/Remote First Salary - £80,000/£85,000 + Bonus Location - This UK-based team offers a fully remote working option, with a headquarters in Central London. In this role, you will be joining a leading SaaS FinTech … market. The business aims to scale its platform significantly over the next few years to support a growing international client base. Responsibilities Champion core SRE practices: define SLIs/SLOs/SLAs, reduce toil through automation, and plan for Disaster Recovery. Refine KPIs to support data-driven decisions around reliability … teams to build resilient, observable, and maintainable features. Promote DevOps culture by leading knowledge-sharing sessions and supporting issue resolution. Skills Strong grounding in SRE principles and operational best practices. Proficient with observability tools (Prometheus, Grafana, OTEL, Cloudwatch) and telemetry pipelines. Solid programming skills in Python and/or Go More ❯

Site Reliability Engineer – FinTech / Global Payments – London HQ / Remote First

London, South East England, United Kingdom
Hybrid / WFH Options
Future Talent Group
Site Reliability Engineer – FinTech/Global Payments – London HQ/Remote First Salary - £80,000/£85,000 + Bonus Location - This UK-based team offers a fully remote working option, with a headquarters in Central London. In this role, you will be joining a leading SaaS FinTech … market. The business aims to scale its platform significantly over the next few years to support a growing international client base. Responsibilities Champion core SRE practices: define SLIs/SLOs/SLAs, reduce toil through automation, and plan for Disaster Recovery. Refine KPIs to support data-driven decisions around reliability … teams to build resilient, observable, and maintainable features. Promote DevOps culture by leading knowledge-sharing sessions and supporting issue resolution. Skills Strong grounding in SRE principles and operational best practices. Proficient with observability tools (Prometheus, Grafana, OTEL, Cloudwatch) and telemetry pipelines. Solid programming skills in Python and/or Go More ❯

Site Reliability Engineer – FinTech / Global Payments – London HQ / Remote First

London (City of London), South East England, United Kingdom
Hybrid / WFH Options
Future Talent Group
Site Reliability Engineer – FinTech/Global Payments – London HQ/Remote First Salary - £80,000/£85,000 + Bonus Location - This UK-based team offers a fully remote working option, with a headquarters in Central London. In this role, you will be joining a leading SaaS FinTech … market. The business aims to scale its platform significantly over the next few years to support a growing international client base. Responsibilities Champion core SRE practices: define SLIs/SLOs/SLAs, reduce toil through automation, and plan for Disaster Recovery. Refine KPIs to support data-driven decisions around reliability … teams to build resilient, observable, and maintainable features. Promote DevOps culture by leading knowledge-sharing sessions and supporting issue resolution. Skills Strong grounding in SRE principles and operational best practices. Proficient with observability tools (Prometheus, Grafana, OTEL, Cloudwatch) and telemetry pipelines. Solid programming skills in Python and/or Go More ❯

Site Reliability Engineer – FinTech / Global Payments – London HQ / Remote First

London (West End), South East England, United Kingdom
Hybrid / WFH Options
Future Talent Group
Site Reliability Engineer – FinTech/Global Payments – London HQ/Remote First Salary - £80,000/£85,000 + Bonus Location - This UK-based team offers a fully remote working option, with a headquarters in Central London. In this role, you will be joining a leading SaaS FinTech … market. The business aims to scale its platform significantly over the next few years to support a growing international client base. Responsibilities Champion core SRE practices: define SLIs/SLOs/SLAs, reduce toil through automation, and plan for Disaster Recovery. Refine KPIs to support data-driven decisions around reliability … teams to build resilient, observable, and maintainable features. Promote DevOps culture by leading knowledge-sharing sessions and supporting issue resolution. Skills Strong grounding in SRE principles and operational best practices. Proficient with observability tools (Prometheus, Grafana, OTEL, Cloudwatch) and telemetry pipelines. Solid programming skills in Python and/or Go More ❯

Site Reliability Engineer, ESC Managed Operations

London, United Kingdom
Amazon
Site Reliability Engineer, ESC Managed Operations Job ID: Amazon Development Centre Ireland Limited - D94 AWS is set to introduce the inaugural European Sovereign Cloud (ESC), marking a significant development in utility computing (UC). To spearhead this initiative, we are actively seeking experienced systems development engineers with a … typical day in this role involves collaborating with technology leaders, contributing to the enhancement of day-to-day operations, and ensuring improvements in availability, reliability, latency, performance, and efficiency of the ESC. You will be required to occasionally participate in "on-call" rotations to resolve incidents occurring out-of … an experienced professional ready for a challenging and impactful opportunity, we invite you to join our efforts in building a best-in-class development engineering and operations team that aligns with AWS' commitment to customer satisfaction and continual innovation. Utility Computing (UC) European Sovereign Cloud (ESC) is a part More ❯
Employment Type: Permanent
Salary: GBP Annual

Site Reliability Engineer, Simple Storage and Glacier team (S3G)

London, United Kingdom
Amazon
Site Reliability Engineer, Simple Storage and Glacier team (S3G) Managing trillions of objects in storage, retrieving them in sub-x ms, building software that deploys to tens of thousands of hosts, achieving 99.999999999% (you didn't read that wrong, that's 11 nines!) durability. These are just a … scale of the exciting problems you will find every day working in Simple Storage Service (S3) and Glacier. The Region Services S3 and Glacier Engineering team are looking for a talented engineer who is motivated to solve complex challenges, yet are not constrained by "how things are usually done … services in AWS, including support for customers who require specialized security solutions for their cloud services. Key job responsibilities Be actively involved in daily engineering activities, providing hands-on technical guidance and support. Define architecture, design, and proof-of-concept efforts for end-to-end project delivery, ensuring high More ❯
Employment Type: Permanent
Salary: GBP Annual

Site Reliability Engineer

London, United Kingdom
Hybrid / WFH Options
GSS UK Services Limited
as well as ensuring the platform is performant and reliable. You will be a key member of the team, liaising with product teams, embedding SRE principles and building the observability platform for the next stage of growth at GSS. You will have direct input into the direction of Technical Operations … culture where your ideas are valued. What You'll Do Key responsibilities in this role will include (but not be limited to): Leveraging core SRE values - measuring (SLI/SLO/SLA), testing, and eliminating toil via automation with appropriate Disaster Recovery planning Refining KPIs to enable data-driven decision … preferably event-driven) Be a self-starter that relishes responsibility. Take strategic direction and own end to end delivery of solutions. Expert knowledge of SRE fundamentals and a commitment to best practice Fluency with common observability tooling like Prometheus, Grafana, OTEL and Cloudwatch Experience analysing and building data telemetry, querying More ❯
Employment Type: Permanent
Salary: GBP Annual

Technical Lead

London Area, United Kingdom
Nando's UK & IRE
System innovation, Ingredients Management, New Product Development, and Purchase Order Management. Key responsibilities include integrating external supplier APIs, implementing Software Reliability Engineering (SRE) best practices, and ensuring seamless collaboration across teams. The team enhances resilience, observability, incident management, and disaster recovery (DR) practices while working closely with Peri … used to enhance system performance, maintainability, and security. Observability & Resilience : Establish best practices for monitoring, incident response, and disaster recovery. Best Practices & Governance : Define engineering standards and drive their adoption across teams. Vendor & API Management : Oversee integrations with third-party suppliers and ensure seamless API interactions. Technical Roadmap : Work … closely with Program Manager, Head of Product and Head of Engineering to define and implement a strategic roadmap for stock systems. Team Mentorship : Support engineers in developing their technical skills. Incident Management : Ensure effective post-mortem reviews and embed reliability best practices into development processes. Skills & Experience Proven More ❯

Technical Lead

London, South East England, United Kingdom
Nando's UK & IRE
System innovation, Ingredients Management, New Product Development, and Purchase Order Management. Key responsibilities include integrating external supplier APIs, implementing Software Reliability Engineering (SRE) best practices, and ensuring seamless collaboration across teams. The team enhances resilience, observability, incident management, and disaster recovery (DR) practices while working closely with Peri … used to enhance system performance, maintainability, and security. Observability & Resilience : Establish best practices for monitoring, incident response, and disaster recovery. Best Practices & Governance : Define engineering standards and drive their adoption across teams. Vendor & API Management : Oversee integrations with third-party suppliers and ensure seamless API interactions. Technical Roadmap : Work … closely with Program Manager, Head of Product and Head of Engineering to define and implement a strategic roadmap for stock systems. Team Mentorship : Support engineers in developing their technical skills. Incident Management : Ensure effective post-mortem reviews and embed reliability best practices into development processes. Skills & Experience Proven More ❯

Site Reliability Engineer - US

London, United Kingdom
Hybrid / WFH Options
Valarian Technologies Limited
The Role Join us as a Site Reliability Engineer and help us build the future of data sovereignty! We're seeking an SRE passionate about creating high-performance, scalable, and reliable services for our production infrastructure. You'll have a direct impact, improving existing systems and developing innovative … solutions to complex challenges. Our small, collaborative engineering teams own the full lifecycle of their services, from development to production operations. We champion automation and empower you to choose the best tools for the job. If you thrive in a fast-paced environment where you can make a real … bases (25+ users) and sustained daily usage. This will involve performance tuning, capacity planning, and optimization of resource utilization. Collaborate closely with the product engineering team to influence the design and implementation of new products and features, ensuring they meet our reliability and scalability standards from the outset. More ❯
Employment Type: Permanent
Salary: GBP Annual

Site Reliability Engineer - Python

City, London, United Kingdom
Square One Resources
Job Title: Site Reliability Engineer - Python Location: Remote (1-2 days per month in the London office) Salary/Rate: Up to £711 Per day Inside IR35 Start Date: 08/05/2025 Job Type: Contract - Long term project Company Introduction … We have an exciting opportunity now available with one of our sector-leading huge social media clients! They are currently looking for a skilled SRE to join their team for a long term project. Job Responsibilities/Required experience Ability to code in Python - essential Linux Admin (System Administration & Network More ❯
Employment Type: Permanent
Salary: GBP Annual

Cloud and Platforms Architect

London Area, United Kingdom
OpticoreIT
spanning multiple sectors including Finance, Broadcast Media, Telecommunications and more. You’ll support the Head of Architecture, Enterprise Architects, Cloud Product Managers and platform engineering teams in creating the vision and strategy for multi-cloud adoption, supporting streaming platforms, applications and data. What you'll be doing: Consulting with …/knowledge of providing high quality architecture within an Agile delivery environment Knowledge and/or experience with DevOps practices. DevOps methodologies, DevOps tools, Site Reliability Engineering, Platform Engineering. Knowledge of security best practices for cloud platforms, cloud hosted applications and operating systems More ❯

Client Site Reliability Engineer

London, United Kingdom
Hybrid / WFH Options
Molten Ventures plc
the world's most innovative fintechs, and the Financial Times recognised us as one of Europe's fastest-growing companies in 2023. The Client Site Reliability Engineer role in Infrastructure, Client Services will be responsible for enabling and supporting our clients to deliver a best in class cloud … at scale. This role supports clients in their cloud infrastructure preparation, deployment, optimisation and troubleshooting. Duties Hands on cloud infrastructure consulting both on client site and remote Working with customers and external partners to design and prepare suitable cloud infrastructure to ensure Thought Machine Vault products can be tested … empower holistic digital transformation in collaboration with Thought Machine Client Architects Supporting and troubleshooting client, SaaS and internal cloud infrastructure both remotely and on site, including by promoting and deploying suitable monitoring, logging and alerting tools Working closely with internal product and engineering teams to ensure client feedback More ❯
Employment Type: Permanent
Salary: GBP Annual

Site Reliability Engineer

London, United Kingdom
C3 AI
cost-effectively than alternative approaches. The C3 AI Platform supports the value chain in any industry with prebuilt, configurable, high-value AI applications for reliability, fraud detection, sensor network health, supply network optimization, energy management, anti-money laundering, and customer engagement. Learn more at: C3 AI We are looking … for a Site Reliability Engineer to join our team in London. Responsibilities: Maximize system uptime and availability, ensuring functional and performance SLAs. Establish end-to-end monitoring and alerting on all critical aspects. Solve complex problems for critical services and build automation to prevent problem recurrence. Influence and … to streamline system updates and upgrades. Set up critical infrastructure, tools, and framework to streamline the deployment cycle. Work cross-functionally with Services and Engineering teams. Qualifications: Demonstrated experience in deploying, managing, and operating scalable and fault-tolerant Linux/Kubernetes/JVM-based infrastructure in AWS, GCP, and More ❯
Employment Type: Permanent
Salary: GBP Annual

Principal Cloud Engineer (AWS)

London, United Kingdom
Cloudscaler Limited
pathways and achieve your career goals. About the role Cloudscaler is seeking a Principal Cloud Engineer who will be a thought leader within our engineering practice with extensive knowledge of platform and site reliability engineering best practices. You will be guiding our customers through complex cloud … or CloudFormation) Ability to advise on infrastructure as code structure and modularisation An understanding of the challenges and priorities of operating services, and how SRE principles can be applied Implementing and working with continuous integration/continuous delivery pipelines Experience with system design, and the ability to assess trade-offs … Talent Acquisition team 1st Interview - 30 minute remote interview with our hiring team 2nd Interview - 60 minute remote technical interview with members of our engineering team 3rd Interview - 60 minute in-person interview with members of our Senior Leadership Team Cloudscaler is proud to be an equal opportunity employer More ❯
Employment Type: Permanent
Salary: GBP Annual

Senior Software Engineer

London, South East England, United Kingdom
Nando's UK & IRE
provide support and take strategic steps to improve stock operations. Key responsibilities will include integrating external supplier APIs, implementing Software Reliability Engineering (SRE) best practices, and closely collaborating with existing teams to develop new software solutions. The team will enhance resilience, observability, incident management, and disaster recovery (DR … Including strong SQL. Experience designing and troubleshooting large-scale distributed systems Experience in Big Data, preferably BigQuery (GCP) Familiarity with agile methodologies and best engineering practices. Strong problem-solving skills, ownership mindset, and ability to work cross-functionally. Understanding of stock systems and their impact on Finance and ABP More ❯

Senior Software Engineer

London Area, United Kingdom
Nando's UK & IRE
provide support and take strategic steps to improve stock operations. Key responsibilities will include integrating external supplier APIs, implementing Software Reliability Engineering (SRE) best practices, and closely collaborating with existing teams to develop new software solutions. The team will enhance resilience, observability, incident management, and disaster recovery (DR … Including strong SQL. Experience designing and troubleshooting large-scale distributed systems Experience in Big Data, preferably BigQuery (GCP) Familiarity with agile methodologies and best engineering practices. Strong problem-solving skills, ownership mindset, and ability to work cross-functionally. Understanding of stock systems and their impact on Finance and ABP More ❯

Senior AI Engineer

London, United Kingdom
Hybrid / WFH Options
Citigroup Inc
productivity based improvements to the suite of tools and processes used by our large user base of developers using latest Gen AI tooling & prompt engineering. This dedicated team is focused on driving the everything-as-code agenda and delivering tangible reductions in process friction, errors, and manual effort. In … this role you will be responsible for driving and contributing to the technical direction of our products and services, instilling engineering best practices into the team, and promoting cultural change across the organisation. Responsibilities: Understand the landscape, tooling and procedures used by developers at Citi and look for opportunities … reduce toil and aid simplification using Gen AI based solutions. Apply classic AI and novel Gen AI evaluation methodology to raise the quality and reliability bar for the software that you will deliver, as well as to manage and mitigate risks that are specific/inherent to this field. Advice on More ❯
Employment Type: Permanent
Salary: GBP Annual

Technical Account Manager-ISV, ES - APJC - ANZ

London, United Kingdom
Amazon
of the following technical domains: Compute, Storage, Networking, CDN, Databases, DevOps, Big Data and Analytics, Security, Applications Development. You preferably have software engineering, SRE and/or external customer-facing experience with the ability to clearly articulate and present to small and large audiences. Experience in similar roles such … at least two of the following technical domains: Compute, Storage, Networking, CDN, Databases, DevOps, Big Data and Analytics, Security, Applications Development. - Software Engineering, SRE and/or external customer-facing experience with the ability to clearly articulate and present to small and large audiences. - 5+ years of experience in More ❯
Employment Type: Permanent
Salary: GBP Annual

Site Reliability Engineering, London
10th Percentile: £70,625
25th Percentile: £86,563
Median: £110,000
75th Percentile: £138,750
90th Percentile: £139,375