Permanent Site Reliability Engineer Jobs in London

17 of 17 Permanent Site Reliability Engineer Jobs in London

Junior Site Reliability Engineer

London, South East, England, United Kingdom
Understanding Recruitment
Junior Site Reliability Engineer … We are currently working with a leading financial services company who are looking for a Junior Site Reliability Engineer to join their ever-expanding platform/SRE team at their Shoreditch, London office, where you will be expected to be in the office 4 days a week. They are looking for you to have excellent cloud knowledge … ideally AWS, as well as experience with PowerShell/Python. As the Junior Site Reliability Engineer, you will be a self-starter with excellent stakeholder management experience who can demonstrate outcome-based work. You will ideally have 2 years of commercial experience from an IT operations/cloud infrastructure background. Please note this is …
Employment Type: Full-Time
Salary: £40,000 - £45,000 per annum, Inc benefits

Site Reliability Engineer

London, United Kingdom
Hybrid / WFH Options
Delta Capita
Role Overview: We are seeking a highly skilled and motivated Site Reliability Engineer (SRE) to join our engineering team to support critical application deployments in a "follow-the-sun" environment. In this role, you will leverage your expertise in cloud provisioning, infrastructure as code, and container orchestration to ensure the reliability, scalability, and performance of our … and versioning. Containerization and Orchestration: Deploy, manage, and provide ongoing support for containerized applications using Kubernetes, including Amazon EKS (Elastic Kubernetes Service) and Azure Kubernetes Service (AKS), ensuring their reliability, availability, and performance. Monitoring and Alerting: Monitor application performance and system health through observability tools (e.g., Prometheus, Grafana, ELK stack), proactively identifying and resolving issues to ensure high availability … and solutions, including RESTful APIs, ensuring seamless integration across platforms. Post-Mortem Analysis: Conduct comprehensive post-mortem analyses following incidents, identifying root causes and recommending improvements to enhance system reliability and performance. Mentorship: Mentor and guide junior engineers, fostering a culture of knowledge sharing and continuous improvement within the engineering team. Skills and Experience: Bachelor's degree in computer …
Employment Type: Permanent
Salary: GBP Annual

Site Reliability Engineer (Golang)

London, United Kingdom
LinuxRecruit
Are you a seasoned Site Reliability Engineer looking for an exciting new challenge? Join this team and transition into maintaining and enhancing the reliability of one of the world's largest platforms. In this role, you will utilise your expertise in Golang coding to develop robust applications, ensuring the systems remain resilient, scalable, and efficient. If … presence and commitment to innovation, you will have the opportunity to work on projects that reach millions of users, making a real difference in the tech world. As a Site Reliability Engineer, you will be responsible for designing, developing, and maintaining systems and applications using Golang. You will monitor and optimise system performance with tools such as … Grafana, Prometheus, New Relic, and Splunk. Your role will involve identifying and resolving reliability issues, automating processes, and ensuring the seamless operation of the platform. If you have a passion for technology and a drive to ensure excellence, we would love to hear from you …
Employment Type: Permanent
Salary: GBP Annual

Senior Site Reliability Engineer

London, United Kingdom
Board Intelligence
We unleash the potential of organisations through the science of board effectiveness, building better businesses and benefiting society. The Opportunity As a Senior Site Reliability Engineer (SRE), you'll be joining a team whose mission is to ensure the availability, performance, security and reliability of our platform and core services, ensuring that they meet the needs … be responsible for visibility and monitoring of those systems, for building tooling and automation to reduce TOIL and for responding to incidents as part of our 24/7 SRE on-call team. The SRE team: Strives to provide the highest standards of Availability, Scalability, Performance and Security for our Software as a Service environments across multiple cloud vendors and … work Proactively monitors our platform and responds to incidents as part of a 24/7 rota Key responsibilities of the role We're looking for a great Senior SRE to be a hands-on individual contributor to key technical projects and to help us build a first-class SRE function. This role will involve: Hands-on work with technical …
Employment Type: Permanent
Salary: GBP Annual

Site Reliability Engineer

London, United Kingdom
Alokknight
Job Description Would you like to be an Engineer who builds the Cloud, rather than just uses it? At AWS, our Engineers manage the behind-the-scenes software and tools that support the world's largest cloud computing infrastructure. We … offer an exciting opportunity to join a world-class network team in a dynamic environment that feels like a start-up. As a Site Reliability Engineer (SRE), you will deploy, manage, troubleshoot, and innovate the tools, services, and components that enable our network engineers to automate and maintain network operations. Your internal customers are your network engineering …
Employment Type: Permanent
Salary: GBP Annual

Platform Engineer/SRE

London, United Kingdom
Hybrid / WFH Options
Ascendion
Job Title: Platform Engineer/SRE Work Location: Bromley/Chester, UK (Hybrid, 3 days a week) Job Description: We are seeking a Platform Engineer/SRE with a strong and diverse technical background. The ideal candidate will possess hands-on development experience along with Site Reliability Engineering (SRE) expertise. This role requires a proactive … platform stability issues, and develop resilient and reliable systems. Key Responsibilities: Provide hands-on technical leadership in platform engineering initiatives. Ensure platform stability and resilience by identifying and resolving reliability … issues. Collaborate with cross-functional teams to deliver scalable and robust system solutions. Key Skills Required: Strong development experience in Java (primary skill). Site Reliability Engineering (SRE) experience. Proficiency with Kafka, Mule, and Oracle Database. Ability to work at a managerial level while remaining hands-on with technical tasks. Nice to Have: Knowledge of payment systems …
Employment Type: Permanent
Salary: GBP Annual

Platform Engineer/SRE

Bromley, Greater London, Bromley Town, United Kingdom
Hybrid / WFH Options
Ascendion
Job Title: Platform Engineer/SRE Work Location: Bromley/Chester, UK (Hybrid – 3 days a week) Job Description: We are seeking a Platform Engineer/SRE with a strong and diverse technical background. The ideal candidate will possess hands-on development experience along with Site Reliability Engineering (SRE) expertise. This role requires a proactive … platform stability issues, and develop resilient and reliable systems. Key Responsibilities: Provide hands-on technical leadership in platform engineering initiatives. Ensure platform stability and resilience by identifying and resolving reliability … issues. Collaborate with cross-functional teams to deliver scalable and robust system solutions. Key Skills Required: Strong development experience in Java (primary skill). Site Reliability Engineering (SRE) experience. Proficiency with Kafka, Mule, and Oracle Database. Ability to work at a managerial level while remaining hands-on with technical tasks. Nice to Have: Knowledge of payment systems …
Employment Type: Permanent

Site Reliability Engineer, Region Services

London, United Kingdom
Amazon
Overview Site Reliability Engineer, Region Services Job ID: AWS EMEA SARL (UK Branch) Would you like to help implement innovative cloud computing solutions and solve the most complex technical problems? Are you excited by the prospect of building and running the world's largest cloud computing infrastructure to provide a better world for future generations? AWS builds … you'll be part of a world-class team in a dynamic environment that has the entrepreneurial feel of a start-up. This is an opportunity to operate and engineer systems on a massive scale, and to gain world-class experience in cloud computing. You'll be surrounded by people who are passionate about cloud computing, believe that first … Build and operate distributed systems Design and build the tools and utilities that are part of the AWS fleet running our internal services Key job responsibilities The Systems Development Engineer will be a key member of a new team pioneering automated build and deployment of Windows-based services. The team is adopting a code-first and hands-off CI …
Employment Type: Permanent
Salary: GBP Annual

Software Engineering Manager, Site Reliability, Cloud Incident Response

London, United Kingdom
Google Inc
… response. Preferred qualifications: Master's degree or PhD in Computer Science, or a related technical field. Experience as a cloud customer. About the job Site Reliability Engineering (SRE) combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. SRE ensures that Google Cloud's services (both our internally critical and our externally … visible systems) have reliability, uptime appropriate to customers' needs and a fast rate of improvement. Additionally, SREs will keep an ever-watchful eye on our systems' capacity and performance. Much of our software development focuses on optimizing existing systems, building infrastructure and eliminating work through automation. On the SRE team, you'll have the opportunity to manage … the complex challenges of scale which are unique to Google Cloud, while using your expertise in coding, algorithms, complexity analysis and large-scale system design. SRE's culture of intellectual curiosity, problem solving and openness is key to its success. Our organization brings together people with a wide variety of backgrounds, experiences and perspectives. We encourage them to collaborate, think …
Employment Type: Permanent
Salary: GBP Annual

Site Reliability Engineer

London, United Kingdom
Duffel
… developer experience to go with it. The tools used on the team include Elixir, Phoenix, Kubernetes and Google Cloud Platform. Site Reliability Engineering at Duffel As an SRE at Duffel, you'll be part of a small team within engineering that is responsible for the reliability, performance, and resilience of our infrastructure and applications. You will be … silently drop spans. - An enthusiasm for both software development and systems engineering. - A high bar for code and configuration quality and readability. - A good understanding of current observability and reliability practices. - Experienced and comfortable in running incident response. - Big picture thinking - you can make trade-offs on technical work streams against business impact. - Fantastic communication skills. You're able … We manage a data pipeline using Pub/Sub, Airbyte, and dbt. Our Current Focus We're currently driving a big shift in how we think about and monitor reliability across the engineering organisation, with a focus on early detection of customer-impacting issues. We're extending and standardising our use of OpenTelemetry, and introducing Honeycomb as the single …
Employment Type: Permanent
Salary: GBP Annual

Software Engineer (SRE)

London, United Kingdom
LinuxRecruit
Are you a passionate Software Engineer looking for an exciting new challenge? Join this team and transition into maintaining and enhancing the reliability of one of the world's largest platforms. In this role, you will utilise your expertise in Golang coding to develop robust applications, ensuring the systems remain resilient, scalable, and efficient. If you thrive in … presence and commitment to innovation, you will have the opportunity to work on projects that reach millions of users, making a real difference in the tech world. As a Site Reliability Engineer, you will be responsible for designing, developing, and maintaining systems and applications using Golang. You will monitor and optimise system performance with tools such as … Grafana, Prometheus, New Relic, and Splunk. Your role will involve identifying and resolving reliability issues, automating processes, and ensuring the seamless operation of the platform. If you have a passion for technology and a drive to ensure excellence, we would love to hear from you …
Employment Type: Permanent
Salary: GBP Annual

Site Reliability Engineer in London - LinuxRecruit

London, United Kingdom
Golang Works
Experience with C++, Python, or Golang (optional) About the Company The company itself provides a suite of products and services to help people improve …
Employment Type: Permanent
Salary: GBP Annual

Lead Site Reliability Engineer

City of London, London, United Kingdom
TechNET IT Recruitment Ltd
… with a modern, full-stack platform that delivers logs, metrics, traces, and security monitoring, cutting costs by up to 70% while boosting efficiency. They are looking for a Lead SRE to own and elevate our Alerting & Incident Management platform. You’ll be the driving force behind reliability, customer satisfaction, and product excellence, ensuring smooth alert management, fewer engineering … experience by speeding up alert resolution and reducing interruptions for engineers. Build solutions to common pain points, shaping roadmaps, documentation, and technical knowledge. Develop benchmarking tools to improve performance, reliability, and scalability. Stay ahead of incident management trends to drive new workflows and product improvements. Mentor teams and lead with clear, impactful communication. What We’re Looking For 5+ … platform experience (PagerDuty, OpsGenie, etc. a plus). Solid technical foundation with cloud/distributed systems. Excellent communicator, comfortable working across US/IL time zones. Bonus: leadership experience, SRE/DevOps background, knowledge of SLO/SLA practices.

Lead Site Reliability Engineer

London Area, United Kingdom
TechNET IT Recruitment Ltd
… with a modern, full-stack platform that delivers logs, metrics, traces, and security monitoring, cutting costs by up to 70% while boosting efficiency. They are looking for a Lead SRE to own and elevate our Alerting & Incident Management platform. You’ll be the driving force behind reliability, customer satisfaction, and product excellence, ensuring smooth alert management, fewer engineering … experience by speeding up alert resolution and reducing interruptions for engineers. Build solutions to common pain points, shaping roadmaps, documentation, and technical knowledge. Develop benchmarking tools to improve performance, reliability, and scalability. Stay ahead of incident management trends to drive new workflows and product improvements. Mentor teams and lead with clear, impactful communication. What We’re Looking For 5+ … platform experience (PagerDuty, OpsGenie, etc. a plus). Solid technical foundation with cloud/distributed systems. Excellent communicator, comfortable working across US/IL time zones. Bonus: leadership experience, SRE/DevOps background, knowledge of SLO/SLA practices.

Site Reliability Engineer

London, United Kingdom
Hybrid / WFH Options
Gizmo
… recently raised $22M in Series A funding to accelerate our vision of helping 1 billion people learn. Role Overview Reporting to the CTO, you will own capacity, performance and reliability for … Gizmo's full-stack platform as daily traffic climbs from hundreds of thousands to millions of users. You'll write code across the stack, but your charter is classic SRE: defend SLOs, eliminate toil, and raise the ceiling on scale before it becomes a hard limit. Key Responsibilities Define SLIs/SLOs for latency, availability and error rate; codify error … on Kubernetes and CI/CD; keep "toil" … Coach full-stack engineers on query optimisation, schema design and back-pressure techniques; document patterns and anti-patterns by creating an SRE playbook Hands-on scale experience: you have run relational stores at 100k+ TPS or 1M+ concurrent users (e.g., multi-tenant PostgreSQL, sharded MySQL). Strong backend fundamentals around …
Employment Type: Permanent
Salary: GBP Annual

Site Reliability Engineer with Python (IT) in London - Nexus Jobs Limited

London, United Kingdom
WorksHub
Job Requirements Designing, building, and operating large-scale production systems Deep knowledge of Python is preferred, though other languages like Java, Go, Rust, or similar will also be heavily considered Experience using source control (Git, GitHub) and feature branching strategies …
Employment Type: Permanent
Salary: GBP Annual

Platform Engineer/SRE

Bromley, Greater London, Bromley Town, United Kingdom
Hybrid / WFH Options
Ascendion
Below are the details of the position: Job Title: Platform Engineer/SRE Work Location: Bromley, UK (Hybrid – 3 days a week) Job Description: 15+ years’ experience in delivering large scale applications with focus on performance, scalability, security, and reliability. Experience in a highly Agile continuous integration and continuous deployment environment, preferably within a financial domain. Strong experience in …
Employment Type: Permanent, Contract

Site Reliability Engineer, London
10th Percentile: £60,250
25th Percentile: £80,000
Median: £85,000
75th Percentile: £95,000
90th Percentile: £101,250