Permanent Site Reliability Engineering Jobs in London

1 to 25 of 80 Permanent Site Reliability Engineering Jobs in London

Software Engineering Manager, Site Reliability, Cloud Incident Response

London, United Kingdom
Google Inc
Preferred qualifications: Master's degree or PhD in Computer Science or a related technical field. Experience as a cloud customer. About the job: Site Reliability Engineering (SRE) combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. SRE ensures that Google Cloud's services, both our internally critical and our … externally-visible systems, have reliability, uptime appropriate to customers' needs and a fast rate of improvement. Additionally, SREs will keep an ever-watchful eye on our systems' capacity and performance. Much of our software development focuses on optimizing existing systems, building infrastructure and eliminating work through automation. On the SRE team, you'll have the opportunity to … manage the complex challenges of scale which are unique to Google Cloud, while using your expertise in coding, algorithms, complexity analysis and large-scale system design. SRE's culture of intellectual curiosity, problem solving and openness is key to its success. Our organization brings together people with a wide variety of backgrounds, experiences and perspectives. We encourage them to collaborate …
Employment Type: Permanent
Salary: GBP Annual

Platform Engineer/SRE

London, United Kingdom
Hybrid / WFH Options
Ascendion
Job Title: Platform Engineer/SRE. Work Location: Bromley/Chester, UK (Hybrid, 3 days a week). Job Description: We are seeking a Platform Engineer/SRE with a strong and diverse technical background. The ideal candidate will possess hands-on development experience along with Site Reliability Engineering (SRE) expertise. This role requires a proactive individual … who can lead by example, address platform stability issues, and develop resilient and reliable systems. Key Responsibilities: Provide hands-on technical leadership in platform engineering initiatives. Ensure platform stability and resilience by identifying and resolving reliability issues. … Collaborate with cross-functional teams to deliver scalable and robust system solutions. Key Skills Required: Strong development experience in Java (primary skill). Site Reliability Engineering (SRE) experience. Proficiency with Kafka, Mule, and Oracle Database. Ability to work at a managerial level while remaining hands-on with technical tasks. Nice to Have: Knowledge of payment systems …
Employment Type: Permanent
Salary: GBP Annual

Platform Engineer/SRE

Bromley, Greater London, Bromley Town, United Kingdom
Hybrid / WFH Options
Ascendion
Job Title: Platform Engineer/SRE. Work Location: Bromley/Chester, UK (Hybrid, 3 days a week). Job Description: We are seeking a Platform Engineer/SRE with a strong and diverse technical background. The ideal candidate will possess hands-on development experience along with Site Reliability Engineering (SRE) expertise. This role requires a proactive individual … who can lead by example, address platform stability issues, and develop resilient and reliable systems. Key Responsibilities: Provide hands-on technical leadership in platform engineering initiatives. Ensure platform stability and resilience by identifying and resolving reliability issues. … Collaborate with cross-functional teams to deliver scalable and robust system solutions. Key Skills Required: Strong development experience in Java (primary skill). Site Reliability Engineering (SRE) experience. Proficiency with Kafka, Mule, and Oracle Database. Ability to work at a managerial level while remaining hands-on with technical tasks. Nice to Have: Knowledge of payment systems …
Employment Type: Permanent

Junior Site Reliability Engineer

London, South East, England, United Kingdom
Understanding Recruitment
Junior Site Reliability … Engineer. We are currently working with a leading Financial Services company that is looking for a Junior Site Reliability Engineer to join its ever-expanding platform/SRE team at its Shoreditch, London office, where you will be expected to travel to the office 4 days a week. They are looking for you to have excellent cloud knowledge … ideally AWS, as well as experience of PowerShell/Python. As the Junior Site Reliability Engineer, you will be a self-starter with excellent stakeholder management experience who can show outcome-based work. You will ideally have 2 years of commercial experience, coming from an IT Operations/Cloud infrastructure background. Please note this is an …
Employment Type: Full-Time
Salary: £40,000 - £45,000 per annum, Inc benefits

Site Reliability Engineer

London, United Kingdom
Hybrid / WFH Options
Delta Capita
Role Overview: We are seeking a highly skilled and motivated Site Reliability Engineer (SRE) to join our engineering team to support critical application deployments in a "follow-the-sun" environment. In this role, you will leverage your expertise in cloud provisioning, infrastructure as code, and container orchestration to ensure the reliability, scalability, and performance of our … and versioning. Containerization and Orchestration: Deploy, manage, and provide ongoing support for containerized applications using Kubernetes, including Amazon EKS (Elastic Kubernetes Service) and Azure Kubernetes Service (AKS), ensuring their reliability, availability, and performance. Monitoring and Alerting: Monitor application performance and system health through observability tools (e.g., Prometheus, Grafana, ELK stack), proactively identifying and resolving issues to ensure high availability … tasks and manage configurations. Load Balancing: Implement and maintain load balancing solutions to ensure optimal distribution of application traffic and high availability. Collaboration with Development Teams: Collaborate with software engineering teams to design, develop, and maintain robust systems and solutions, including RESTful APIs, ensuring seamless integration across platforms. Post-Mortem Analysis: Conduct comprehensive post-mortem analyses following incidents, identifying …
Employment Type: Permanent
Salary: GBP Annual

Senior DevOps Platform Engineer

London, United Kingdom
CDW LLC
CDW. JOB TITLE: Senior Automation Engineer II. DEPARTMENT: DevOps Engineer. ROLE PURPOSE: This role is to design, build, and scale enterprise cloud platforms with a strong focus on automation, reliability, and developer experience. As part of the Cloud Infrastructure & DevOps team, you will build multi-cloud infrastructure that powers hundreds of production services, including critical Salesforce DevOps pipelines. You … environments. Drive infrastructure compliance, DevSecOps, and policy-as-code practices. KNOWLEDGE, SKILLS AND EXPERIENCE: Minimum 5 years of experience in Platform Engineering, Site Reliability Engineering (SRE), or DevOps roles supporting cloud-native enterprise environments. Proficient in Microsoft Azure and AWS platforms with hands-on experience in Kubernetes (AKS/EKS), Helm charts, and service mesh technologies … or HashiCorp Terraform Associate are advantageous. Strong interpersonal skills including clear communication, collaboration across teams, adaptability in fast-paced environments, and a proactive mindset with a focus on reliability, performance, and developer enablement. We make technology work so people can do great things. CDW is a leading multi-brand provider of information technology solutions to business, government, education and …
Employment Type: Permanent
Salary: GBP Annual

Cloud Engineer

London, South East, England, United Kingdom
Robert Walters
an essential role in supporting AWS public cloud infrastructure while championing automation through Infrastructure as Code solutions such as Terraform. Your day-to-day activities will involve collaborating with SRE and engineering teams to enhance system observability, proactively managing operational risks, maintaining high standards of security compliance, and ensuring robust disaster recovery capabilities. You will be responsible for documenting … Maintain the reliability and security of cloud environments by implementing robust monitoring tools and adhering to industry best practices. Enhance observability and telemetry within cloud-hosted environments using SRE methodologies to deliver on Service Level Agreements (SLAs), Objectives (SLOs), and Indicators (SLIs). Document and regularly review operational risks within the cloud environment, ensuring that identified issues are tracked … for all cloud-hosted services through effective backup strategies and disaster recovery processes, including planning and conducting quarterly DR tests. Collaborate closely with Site Reliability Engineering (SRE) and engineering teams to ensure optimal management of the cloud environment. Support asset management processes throughout their lifecycle, ensuring compliance with end-of-service (EOS) and end-of-life …
Employment Type: Full-Time
Salary: £70,000 - £85,000 per annum

Site Reliability Engineer

London, United Kingdom
Duffel
has helped build some of the world's largest companies. Our team in London is growing and we're looking for talented people to join us on our journey. Engineering at Duffel: We're building tools to simplify travel distribution, search and booking. What does this actually mean? It's one common and seamless API. This brings huge technical … experience to go with it. The tools used on the team include Elixir, Phoenix, Kubernetes and Google Cloud Platform. Site Reliability Engineering at Duffel: As an SRE at Duffel, you'll be part of a small team within engineering that is responsible for the reliability, performance, and resilience of our infrastructure and applications. You will … be working closely with engineering teams to understand their needs and help meet the demands of our product as we scale globally. What we're looking for: An infrastructure and systems engineering generalist who is comfortable diving deep into the weeds on different issues. Some recent examples include: A configuration issue between Google's Load Balancer and the …
Employment Type: Permanent
Salary: GBP Annual

Global Platform Team Lead and Senior Director - IT Security

London, United Kingdom
Boston Consulting Group
Core, BCG X, and CT worldwide. This role is also accountable for embedding security within DevSecOps practices, enforcing automation at scale, and applying Site Reliability Engineering (SRE) principles across all security services. The role requires strong partnership with ISRM, with a focus on balancing and prioritizing security requirements, automation opportunities, user experience needs, and broader business outcomes. … that support modern work scenarios, remote access, zero-trust networking, and AI/ML workloads. Leverage automation frameworks and IaC to improve scalability and reduce manual intervention. Operational Security, SRE & Assurance: Ensure security platforms are resilient, continuously monitored, and designed for 24x7 support and incident response readiness. Embed security telemetry and observability to enable proactive threat detection and automated response. … Apply SRE principles to improve reliability, performance, and maintainability of security services. Lead platform health, patching automation, and vulnerability remediation workflows. Define service level objectives (SLOs) and key performance indicators (KPIs) for all security services. Compliance, Governance & Risk Management: Ensure alignment with global compliance requirements such as ISO 27001, NIST, SOC 2, GDPR, and others. Partner with governance, legal …
Employment Type: Permanent
Salary: GBP Annual

Global Delivery Director - Secure Data

London, United Kingdom
Boston Consulting Group
enabling innovation and agility across BCG Core, BCG X, and CT worldwide. This role is accountable for embedding security within DevSecOps practices, applying Site Reliability Engineering (SRE) principles across all security services, and aligning with privacy, compliance, and business leaders to maintain trust and regulatory compliance. Key Responsibilities: Strategic Leadership & Transformation: Define and execute a unified security … remote access, zero-trust networking, and protection of sensitive data in AI/ML workloads. Leverage automation frameworks and IaC to improve scalability and reduce manual intervention. Operational Security, SRE & Assurance: Ensure security platforms are resilient, continuously monitored, and designed for 24x7 support and incident response readiness. Embed security telemetry and observability to enable proactive threat detection and automated response. … Apply SRE principles to improve reliability, performance, and maintainability of security services. Define service level objectives (SLOs) and key performance indicators (KPIs) for all security services. Compliance, Governance & Risk Management: Ensure alignment with global compliance requirements such as ISO 27001, NIST, SOC 2, GDPR, and others. Partner with governance, legal, and ISRM teams to implement enforceable policies and standards …
Employment Type: Permanent
Salary: GBP Annual

Director of Remote Connectivity

London, United Kingdom
Hybrid / WFH Options
Boston Consulting Group
software-defined networking principles. Embed zero-trust principles and user-centric design into all remote connectivity services. Align remote connectivity architecture with broader enterprise network, security, and cloud strategies. Engineering & Operations: Lead the engineering, deployment, and lifecycle management of remote access solutions such as Cisco AnyConnect, Zscaler, and other mainstream VPN … platforms. Drive automation of remote access provisioning, policy enforcement, and configuration management through Infrastructure as Code (IaC) and zero-touch deployment practices. Apply Site Reliability Engineering (SRE) principles to improve performance, availability, and troubleshooting. Establish observability practices across all access points with real-time metrics, logs, and telemetry. Security, Compliance & Governance: Ensure compliance with corporate security and … segmentation, and endpoint-based access control. Proven ability to scale remote connectivity solutions to tens of thousands of users and devices. Experience with IaC, network automation, observability tooling, and SRE methodologies. Preferred Qualifications: Certifications such as CCNP, CCIE, PCNSE, Zscaler Certified, or equivalent. Familiarity with secure hybrid work and cloud networking models. Background in network performance optimization, user-centric design …
Employment Type: Permanent
Salary: GBP Annual

Site Reliability Engineer (Golang)

London, United Kingdom
LinuxRecruit
Are you a seasoned Site Reliability Engineer looking for an exciting new challenge? Join this team and transition into maintaining and enhancing the reliability of one of the world's largest platforms. In this role, you will utilise your expertise in Golang coding to develop robust applications, ensuring the systems remain resilient, scalable, and efficient. If you … presence and commitment to innovation, you will have the opportunity to work on projects that reach millions of users, making a real difference in the tech world. As a Site Reliability Engineer, you will be responsible for designing, developing, and maintaining systems and applications using Golang. You will monitor and optimise system performance with tools such as Grafana … Prometheus, New Relic, and Splunk. Your role will involve identifying and resolving reliability issues, automating processes, and ensuring the seamless operation of the platform. If you have a passion for technology and a drive to ensure excellence, we would love to hear from you …
Employment Type: Permanent
Salary: GBP Annual

Senior Site Reliability Engineer

London, United Kingdom
Board Intelligence
Mission: We unleash the potential of organisations through the science of board effectiveness, building better businesses and benefiting society. The Opportunity: As a Senior Site Reliability Engineer (SRE), you'll be joining a team whose mission is to ensure the availability, performance, security and reliability of our platform and core services, ensuring that they meet the needs … be responsible for visibility and monitoring of those systems, for building tooling and automation to reduce toil and for responding to incidents as part of our 24/7 SRE on-call team. The SRE team: Strives to provide the highest standards of Availability, Scalability, Performance and Security for our Software as a Service environments across multiple cloud vendors and … work. Proactively monitors our platform and responds to incidents as part of a 24/7 rota. Key responsibilities of the role: We're looking for a great Senior SRE to be a hands-on individual contributor to key technical projects and to help us build a first-class SRE function. This role will involve: Hands-on work with technical …
Employment Type: Permanent
Salary: GBP Annual

Senior Application Support Engineer

London, United Kingdom
Just Group plc
become the UK's most loved retirement expert. Purpose: As a Senior Application Support Engineer, you will play a crucial role in powering our Retail applications by partnering with engineering and business teams to build deep technical and business expertise. You'll be the go-to expert across a diverse, modern, and complex technology landscape, ensuring seamless support and … with a broad range of technologies, including: Practical experience with performance monitoring tools such as Dynatrace or equivalent. Skills & Knowledge: Solid understanding of Site Reliability Engineering (SRE) principles, including incident management, monitoring, alerting, and performance tuning. Strong knowledge of Software Development Lifecycle (SDLC) processes. Familiarity with incident management platforms like ServiceNow, PagerDuty, or similar tools. Excellent analytical … e.g., annuities, equity release) is advantageous. Experience with automation and scripting to improve manual processes (e.g., PowerShell, Bash). Familiarity with agile methodologies and experience working in DevOps/SRE-driven environments. Company Benefits: A competitive salary, pension scheme and life assurance, along with 25 days' annual leave plus an additional day on us for your birthday. Private medical cover …
Employment Type: Permanent
Salary: GBP Annual

Senior Cloud Security Engineer

London, South East, England, United Kingdom
Holland & Barrett International Limited
want to hear from you! Key Responsibilities: Security Strategy: Help define and execute the Holland & Barrett cloud security strategy, partnering with platform and Site Reliability Engineering (SRE) teams to build robust infrastructure that supports our business. Perimeter Security: Establish platform perimeter security by implementing controls at ingress and egress points, including creating and maintaining an edge network …
Employment Type: Full-Time
Salary: Competitive salary

Head of Product Operations

London, United Kingdom
Rewardgateway
Support teams to input into business reviews. Be a visionary Ops champion for our internal teams. Skills: Bachelor's or Master's degree in a STEM field (Computer Science, Engineering, Mathematics, etc.) or equivalent experience. Demonstrable experience in product management or product operations. Strong product and technical background with proven ability to communicate effectively with engineers and technical team … management best practices: user research, market insights, goal setting, prioritisation, execution, and leadership. Familiarity with monitoring tools, incident management protocols, and collaboration with Site Reliability Engineering (SRE) teams. Proven ability to develop relationships and align teams across product, engineering, and leadership to ensure the effective execution of strategic priorities. Hands-on experience analysing workflows and implementing … of improvement, develop solutions, and inspire change with autonomy. The Interview Process: Online interview with the Senior Talent Partner. In-person interview with the Director of Product Operations and Engineering team member. Online interview with Director of Product Operations and CPO. At Reward Gateway Edenred, we are committed to ensuring an inclusive and accessible recruitment process for all candidates.
Employment Type: Permanent
Salary: GBP Annual

Site Reliability Engineer

London, United Kingdom
Alokknight
We offer an exciting opportunity to join a world-class network team in a dynamic environment that feels like a start-up. As a Site Reliability Engineer (SRE), you will deploy, manage, troubleshoot, and innovate the tools, services, and components that enable our network engineers to automate and maintain network operations. Your internal customers are your network engineering …
Employment Type: Permanent
Salary: GBP Annual

Site Reliability Engineer, Region Services

London, United Kingdom
Amazon
Overview: Site Reliability Engineer, Region Services. Job ID: AWS EMEA SARL (UK Branch). Would you like to help implement innovative cloud computing solutions and solve the most complex technical problems? Are you excited by the prospect of building and running the world's largest cloud computing infrastructure to provide a better world for future generations? AWS builds and … as part of our working culture. When we feel supported in the workplace and at home, there's nothing we can't achieve. Qualifications. BASIC QUALIFICATIONS: Knowledge of systems engineering fundamentals (networking, storage, operating systems). Experience (non-internship) in professional software development. Experience designing or architecting (design patterns, reliability and scaling) of new and existing systems. Experience in … networking, storage systems, operating systems and hands-on systems engineering. Experience programming with at least one modern language such as C++, C#, Java, Python, Golang, PowerShell, Ruby. Preferred Qualifications: Experience with Ansible (preferred), PowerShell or JavaScript/TypeScript. Amazon is an equal opportunities employer. We believe passionately that employing a diverse workforce is central to our success. We make …
Employment Type: Permanent
Salary: GBP Annual

Mid/Senior DevOps Engineer

London, United Kingdom
Intelmatix
applications, delivering scalable, secure, and data-driven solutions to global clients. Role Overview: We are looking for a highly motivated Mid/Senior DevOps Engineer to join our Platform Engineering team. This role plays a critical part in shaping and supporting the infrastructure that powers our data and AI-driven platforms. You will work closely with engineers, data scientists … cloud-native solutions, and enabling the deployment of complex applications, including AI/ML models. Key Responsibilities: Maintain and optimize our cloud infrastructure (primarily AWS) with a focus on reliability, scalability, and cost efficiency. Automate infrastructure provisioning using Infrastructure-as-Code (IaC) tools such as Terraform. Build and maintain CI/CD pipelines for application, data, and model deployment … workflows. Collaborate with engineering and data science teams to deploy and monitor machine learning models and analytical services. Implement and enforce security best practices across cloud and network environments. Troubleshoot deployment and performance issues across multiple environments. Set up and maintain observability tools for logging, monitoring, and alerting (e.g., Prometheus, Grafana, Loki). Contribute to internal tooling to streamline …
Employment Type: Permanent
Salary: GBP Annual

Senior Site Reliability Engineer - Monitoring and Observability

London, United Kingdom
Hybrid / WFH Options
Macquarie Bank Limited
Overview: Senior Site Reliability Engineer - Monitoring and Observability. Our team is dedicated to running and uplifting the current environment to the NextGen IT Monitoring and Observability stage. We run and maintain enterprise-wide log analytics, monitoring, and observability services, ensuring optimal performance and customer satisfaction. What role will you play? As a Monitoring and Observability Engineer, you will … support for urgent incidents, triages, or maintenance activities. What you offer: Experience in monitoring and log analytics. 5+ years of experience administrating, supporting and implementing solutions on Splunk. Product engineering and architecture experience. Knowledge of AWS Cloud technologies. Proficiency in Python/Java programming. Strong team player with the ability to communicate effectively across a range of stakeholders. What …
Employment Type: Permanent
Salary: GBP Annual

Google Cloud Architect

London, United Kingdom
WeAreTechWomen
Cloud Airgapped solutions. You will build expertise in deploying and operating these solutions at customer sites as well as internal reference implementations. Your expertise in Google Cloud architecture and engineering, combined with your leadership experience in guiding small teams, will ensure the successful delivery of robust and scalable cloud solutions for our enterprise clients. Minimum of 5 years of … Expertise in a wide range of Google Cloud products and services (Engine, App Engine, Cloud Storage, GKE, etc.) and broader IaaS solutions (Kubernetes, systems virtualization, etc.). Experience architecting and engineering technical cloud-based solutions to meet business and non-functional requirements. Hands-on experience creating comprehensive technical documentation, including architecture diagrams, design specifications, and operational runbooks. Experience implementing foundational … mentorship to junior team members. Strong communication skills with the ability to articulate complex technical concepts to both internal and client technical, non-technical, and management stakeholders. Experience in site reliability engineering or IT production systems operations, including troubleshooting and debugging live incidents. Excellent problem-solving abilities with demonstrable examples of implementing technical innovation or process improvements …
Employment Type: Permanent
Salary: GBP Annual

Dev Ops Engineer

London, South East, England, United Kingdom
Hybrid / WFH Options
C4S Search Ltd
alignment with security baselines. Implement and maintain cloud security controls, including identity and access management. Key Skills/experience required: 5+ years of professional experience in a DevOps or Site Reliability Engineering role. Expert-level experience with Microsoft Azure and Azure DevOps. Strong hands-on experience with Kubernetes in production environments. Proficient with Helm for Kubernetes application …
Employment Type: Full-Time
Salary: £70,000 - £80,000 per annum

Principal Engineer - CIAM XDP

London, UK
Barclays Bank PLC
transforming and modernising our digital estate to build a market-leading digital offering with customer experience at its heart. This is an exciting and key role, partnering with business-aligned engineering and product teams, to ensure a collaborative team culture is at the heart of what we do. To be successful in this role you should have: Strong hands-on experience … and running of ForgeRock COTS based IAM solutions (PingGateway, PingAM, PingIDM, PingDS), including designing and implementing cloud-based, scalable and resilient IAM solutions for large corporate organisations. IAM engineering experience across authentication, authorisation, single sign-on, multi-factor authentication, identity lifecycle management, OAuth 2.0, OpenID Connect, SAML and policy management. Expertise with JavaScript, Java, Python, and must be comfortable with … API and microservices development. Strong working knowledge of Site Reliability Engineering principles. Experience with Cloud computing (AWS is essential, Azure is a plus). Some other highly desirable skills include: Experience in DevSecOps - knowledge of Product Operating Model. Knowledge of Infrastructure as Code tooling (Chef is essential, Ansible is a plus), containerization. Knowledge of authentication and biometric system design is highly …

Counterparty Risk Analyst - Middle Office

London Area, United Kingdom
Lorien
with SQL and Python. Data visualisation skills with Power BI; other automation and metrics knowledge is handy. Proficiency with tools like Jira, Confluence, Excel, and SharePoint. Familiarity with Agile, DevOps, and Site Reliability Engineering. Excellent communication and stakeholder management skills …

Counterparty Risk Analyst - Middle Office

City of London, London, United Kingdom
Lorien
with SQL and Python. Data visualisation skills with Power BI; other automation and metrics knowledge is handy. Proficiency with tools like Jira, Confluence, Excel, and SharePoint. Familiarity with Agile, DevOps, and Site Reliability Engineering. Excellent communication and stakeholder management skills …

Site Reliability Engineering, London
10th Percentile: £68,125
25th Percentile: £81,563
Median: £92,500
75th Percentile: £114,063
90th Percentile: £127,500