London, England, United Kingdom Hybrid / WFH Options
JR United Kingdom
Category: Other - EU work permit required: Yes Job Views: 4 Posted: 31.05.2025 Expiry Date: 15.07.2025 Job Description: AWS Site Reliability Lead/SRE Lead London UK 40% Hybrid working at the Stratford London Or Leeds/Rest is Full Remote Contract role - 6 Months + START ASAP (Must be eligible for a SC … clearance) Key Responsibilities: Leading the SRE team and evangelising the SRE Observability approach across product groups with the customer space. Providing the thought leadership of SRE from the delivery team Hands on experience in SRE Observability and leveraging Datadog as an observability tool (Infra and application) Review and guide the team for the day-to-day operations of observability tool and … maintenance of observability tools. Support engineering teams directly with delivery of their observability backlog on a demand basis. Key Skills and Experience: Strong Experience with primary role of SRE Engineer Strong experience in DevOps Tools (GitHub, GitHub Actions, Workflow, CodeQL, Jenkins, Nexus, CloudFormation/Terraform etc.) Strong experience in monitoring tool (Datadog is preferred) Strong Knowledge of More ❯
London, England, United Kingdom Hybrid / WFH Options
RED Global
at the Stratford London Or Leeds/Rest is Full Remote Contract role - 6 Months + START ASAP (Must be eligible for a SC clearance) Key Responsibilities: Leading the SRE team and evangelising the SRE Observability approach across product groups with the customer space. Providing the thought leadership of SRE from the delivery team Hands on experience in SRE Observability and … and maintenance of observability tools. Support engineering teams directly with delivery of their observability backlog on a demand basis. Key Skills and Experience: Strong Experience with primary role of SRE Engineer Strong experience in DevOps Tools (GitHub, GitHub Actions, Workflow, CodeQL, Jenkins, Nexus, CloudFormation/Terraform etc.) Strong experience in monitoring tool (Datadog is preferred) Strong Knowledge … Contract Job function Information Technology Industries Staffing and Recruiting More ❯
Job Description Insight Global is seeking an Operations Site Reliability Engineer to provide global operational support for a leading infrastructure software company … s customer-facing SaaS products. You will join a team of engineers demonstrating exceptional technical expertise, managing mission-critical infrastructure, and ensuring optimal availability (24x7x365), performance, and security. This SRE role involves monitoring, maintaining, and enhancing the availability and performance of production services. Responsibilities include driving automation to minimize failures and manual tasks, supporting stakeholder requests within agreed SLAs, and More ❯
Insight Global is looking for an Operations Site Reliability Engineer to help with global operational support for a leading infrastructure software product company’s customer-facing SaaS products. You … will be part of a team of engineers that demonstrates superb technical competency, operates mission-critical infrastructure and ensures the highest levels of availability (24x7x365), performance and security. This SRE would be part of the critical operations function that is responsible for the monitoring, availability and performance of production services. They would be driving automation to reduce failures, manual tasks More ❯
Join Barclays as a Senior Site Reliability Engineer and become part of our newly formed Core SRE Team. This team will establish a Centre of Excellence to enhance and promote SRE best practices across GTIS. As a key hire, you will play a pivotal role in raising awareness and driving adoption of SRE methodologies within various GTIS … across GTIS and CTO, engaging with storage, data, and other product teams. You will act as a trusted advisor, providing strategic guidance and consultative support to help teams improve reliability, scalability, and efficiency. To be successful in this role you should have: Proficiency in Programming and Scripting - This includes expertise in languages such as Python, PowerShell, or Go, which … reliability at scale. Influential Communication Skills - The ability to communicate effectively with team members and stakeholders, ensuring alignment, inspiring and motivating them to embrace new mindsets, cultures, and SRE working practices. This skill is crucial for driving meaningful change and fostering a collaborative environment where innovative ideas can thrive. Some other highly valued skills include: Knowledge of Cloud Computing More ❯
Manchester, Lancashire, United Kingdom Hybrid / WFH Options
bet365 Group
A Site Reliability Engineer, who will enhance system reliability, observability and performance through a strong engineering approach and assist with incident resolution and best practices. You will have software engineering skills, focusing on system reliability and observability. You will monitor the health, performance and availability of critical systems, directly impacting operational efficiency. Using your engineering … expertise, you will implement solutions that enhance reliability, including service instrumentation with tools such as OpenTelemetry, improve logging practices and develop features for maintainability. You will also help engineer tools and automation for effective service management. Collaboration is key, working across multiple functions to integrate reliability and observability best practices into the software development life cycle. … user demands and enhance overall service performance. This role is eligible for inclusion in the Company's hybrid working from home policy. Preferred skills and experience Excellent knowledge of Site Reliability Engineering principles, including the creation and management of effective Service Level Indicators (SLI) and Service Level Objectives (SLO) for reliability and customer satisfaction. Knowledge of contemporary More ❯
Manchester, England, United Kingdom Hybrid / WFH Options
bet365 Group
A Site Reliability Engineer, who will enhance system reliability, observability and performance through a strong engineering approach and assist with incident resolution and best practices. You will have software engineering skills, focusing on system reliability and observability. You will monitor the health, performance and availability of critical systems, directly impacting operational efficiency. Using your engineering … expertise, you will implement solutions that enhance reliability, including service instrumentation with tools such as OpenTelemetry, improve logging practices and develop features for maintainability. You will also help engineer tools and automation for effective service management. Collaboration is key, working across multiple functions to integrate reliability and observability best practices into the software development life cycle. … user demands and enhance overall service performance. This role is eligible for inclusion in the Company’s hybrid working from home policy. Preferred skills and experience Excellent knowledge of Site Reliability Engineering principles, including the creation and management of effective Service Level Indicators (SLI) and Service Level Objectives (SLO) for reliability and customer satisfaction. Knowledge of contemporary More ❯
Stoke-on-Trent, England, United Kingdom Hybrid / WFH Options
JR United Kingdom
Who we are looking for A Site Reliability Engineer who will enhance system reliability, observability, and performance through a strong engineering approach, and assist with incident resolution and best practices. You will have software engineering skills, focusing on system reliability and observability. You will monitor the health, performance, and availability of critical systems, directly impacting … operational efficiency. Using your engineering expertise, you will implement solutions that enhance reliability, including service instrumentation with tools such as OpenTelemetry, improve logging practices, and develop features for maintainability. You will also help engineer tools and automation for effective service management. Collaboration is key, working across multiple functions to integrate reliability and observability best practices into … user demands and enhance overall service performance. This role is eligible for inclusion in the company’s hybrid working from home policy. Preferred skills and experience Excellent knowledge of Site Reliability Engineering principles, including creating and managing effective Service Level Indicators (SLI) and Service Level Objectives (SLO) for reliability and customer satisfaction. Knowledge of contemporary observability tools More ❯
Stafford, England, United Kingdom Hybrid / WFH Options
JR United Kingdom
A Site Reliability Engineer, who will enhance system reliability, observability and performance through a strong engineering approach and assist with incident resolution and best practices. You will have software engineering skills, focusing on system reliability and observability. You will monitor the health, performance and availability of … critical systems, directly impacting operational efficiency. Using your engineering expertise, you will implement solutions that enhance reliability, including service instrumentation with tools such as OpenTelemetry, improve logging practices and develop features for maintainability. You will also help engineer tools and automation for effective service management. Collaboration is key, working across multiple functions to integrate reliability and … user demands and enhance overall service performance. This role is eligible for inclusion in the Company’s hybrid working from home policy. Preferred skills and experience Excellent knowledge of Site Reliability Engineering principles, including the creation and management of effective Service Level Indicators (SLI) and Service Level Objectives (SLO) for reliability and customer satisfaction. Knowledge of contemporary More ❯
Manchester, England, United Kingdom Hybrid / WFH Options
bet365
A Site Reliability Engineer, who will enhance system reliability, observability and performance through a strong engineering approach and assist with incident resolution and best practices. You will have software engineering skills, focusing on system reliability and observability. You will monitor the health, performance and availability of critical systems … directly impacting operational efficiency. Using your engineering expertise, you will implement solutions that enhance reliability, including service instrumentation with tools such as OpenTelemetry, improve logging practices and develop features for maintainability. You will also help engineer tools and automation for effective service management. Collaboration is key, working across multiple functions to integrate reliability and observability best … user demands and enhance overall service performance. This role is eligible for inclusion in the Company’s hybrid working from home policy. Preferred skills and experience Excellent knowledge of Site Reliability Engineering principles, including the creation and management of effective Service Level Indicators (SLI) and Service Level Objectives (SLO) for reliability and customer satisfaction. Knowledge of contemporary More ❯
Edinburgh, Scotland, United Kingdom Hybrid / WFH Options
Canonical
times yearly in person, in interesting locations around the world, to align on strategy and execution. The company is founder led, profitable and growing. We are hiring a Senior Site Reliability Engineer Next-gen operations at scale, with pure Python infra-as-code, from bare metal to containers and applications. Our goal is to perfect enterprise infrastructure … Kubernetes and software defined storage, and we enable devsecops for applications running on that infrastructure too. To become a member of this team, you need to be a software engineer fluent in Python, you need a genuine interest in the full open source infrastructure stack from metal to containers, and you need the ability to work in a high … type Full-time Job function Engineering and Information Technology Industries Software Development More ❯
LSEG City Of London, England, United Kingdom CDSClear IT Site Reliability Engineer … The role is for a business focused Site Reliability Engineer with solid experience in IT environments, tools, and technologies. The individual filling this role will be a business focused problem solver with a desire to learn and closely partner with highly engaged production Risk, Operations, BDRM (Business Development & Relationship Management) and IT Development teams to maintain … model for supporting applications in AWS Work closely with the business line and IT colleagues to progress the CDSClear project agenda Out of hours on-call rota support The Site Reliability Engineer will be a member of the rota for late shift work until 10 pm London Time. Subject to changes as the global support model evolves. Active More ❯
Senior Site Reliability Engineer - Monitoring and Observability Macquarie Group London, United Kingdom Posted 9 hours ago Permanent Competitive Our team is dedicated to running and uplifting the current environment to the NextGen IT Monitoring and Observability stage. We run and maintain enterprise-wide log analytics … be part of a friendly and supportive team where everyone - no matter what role - contributes ideas and drives outcomes. What role will you play? As a Monitoring and Observability Engineer, you will run and maintain enterprise-wide log analytics, monitoring, and observability services. You will be responsible for improving the value provided by the log analytics platform to drive More ❯
London, England, United Kingdom Hybrid / WFH Options
Canonical
four times yearly in person, in interesting locations around the world, to align on strategy and execution. The company is founder led, profitable and growing. We are hiring a Site Reliability Engineer Next-gen operations at scale, with pure Python infra-as-code, from bare metal to containers and applications. Our goal is to perfect enterprise infrastructure … Kubernetes and software defined storage, and we enable devsecops for applications running on that infrastructure too. To become a member of this team, you need to be a software engineer fluent in Python, you need a genuine interest in the full open source infrastructure stack from metal to containers, and you need the ability to work in a high More ❯
along the way! Job Summary We have built Curve Dental into an industry-leading provider of beautiful cloud software for the dental industry. Who We're Looking For Our Site Reliability Engineers (SREs) are passionate about automation and its power to streamline the deployment and operation of software. They collaborate closely with developers to support a wide range More ❯
London, England, United Kingdom Hybrid / WFH Options
NatWest Group
such as availability, performance, efficiency, change management, monitoring, security, incident response, and capacity planning of our products and services You'll be leading team(s) of talented DevSecOps and SRE engineers, working with new and innovative technology to deliver high impact solutions You’ll enjoy significant stakeholder interaction, working in collaboration with engineers to ensure a principled approach to deliver … wider team. In addition to this, you’ll: Have people management responsibilities and support recruitment, management of talent and performance Own and create the technical roadmap for DevSecOps and SRE with the right architecture, solutions & commercial value Own security automation across our entire platform, collaborating with security teams to ensure platform integrity Own the observability strategy, and deliver monitoring and … to establish the risk tolerance of products and services The skills you'll need We’re looking for someone with prior experience in establishing and running a DevSecOps and SRE practice, with strong knowledge of reliability systems thinking and experience of software engineering. You’ll need experience of using a data driven and scientific approach to fact finding. More ❯
SRE/DevOps Engineer – High Frequency Trading - Multi Strategy Hedge Fund - Multi Billion Dollar Hedge Fund - Multiple Headcount - Open to Relocation - Up to £700k TC Join a leading multi-strategy hedge fund, where you’ll collaborate with elite engineers and top investment professionals to develop cutting-edge trading technology. We are seeking highly skilled SRE/DevOps Engineers with … across multiple assets globally. Build effective tooling for automation across all phases of SDLC, ensuring rigorous testing, release, and deployment processes. Improve trading systems' performance, monitoring, availability, and reliability. Provide day-to-day support for mission-critical trading applications and infrastructure. Develop a deep understanding of trading workflows and demonstrate excellent incident management skills. Communicate effectively with More ❯
London, England, United Kingdom Hybrid / WFH Options
Docebo
what are you waiting for? Join 900+ Docebians around the world and help us reinvent the way people learn. About This Opportunity: As a Senior Cloud Operations and Support Engineer, you’ll help manage the operational health of the Docebo learning platform, solving critical customer-facing issues by orchestrating the broader response of the Docebo product organization. You will More ❯
strong, diversified investment bank. We are growth-oriented, people-focused, and community-minded. As a team, we work to deliver value for our clients every day. Position Overview The Engineer II is critical to the success of the Digital Client strategy, the digital services we provide to our customers, and supporting our cloud and data strategy. The role forms More ❯
Watford, Hertfordshire, South East, United Kingdom
La Fosse
Site Reliability Engineer £70,000 pa Hertfordshire My client, a leading entertainment group, is looking for a mid-level SRE to join their platform team in their Hertfordshire office. In the role you'll take ownership of the end-to-end monitoring and alerting stack, designing and maintaining infrastructure and alert configurations (e.g., with Prometheus/Grafana More ❯
grow, and make an impact. Join us! Job Description: This job is responsible for partnering with engineering and technology teams to implement measures as prescribed by lead/senior SRE engineers. Key responsibilities include ensuring appropriate instrumentation, tooling, ticketing, alerting and on-call routines are in place for key services, identifying root causes of issues through production triage efforts, and … Responsibilities: Develops and maintains reliability scripts, tools and libraries and leverages them for common instrumentation, automation, and operational needs, and when mentoring Site Reliability Engineer (SRE) resources on reliability practices and established tools/capabilities. Collaborates with Development and Infrastructure teams to understand technical solutions and implement monitoring capabilities outlined in the application and system … monitoring designs put forward by the SRE Lead. Partners to implement code changes to make use of common reliability libraries and tools and helps Application Production Services and Application Development teammates understand how to use them. Identifies vulnerabilities and opportunities for reliability improvement, such as investigating low level error rates and 'noise' in monitoring, and defines solutions to More ❯
Site Reliability Engineer (SRE) Manager - Apple Services Engineering London, England, United Kingdom Software and Services Description Apple Service Engineering (ASE)'s Compute team is seeking a highly motivated individual with strong technical and communication skills to join us on our quest to build and enhance massive clusters hosting Virtual Machines, Containers and associated infrastructure that can scale … engage with the upstream community to drive Apple's requirements. Ultimately, you will help build the platform that delivers our applications at scale to our end users. As a Compute Site Reliability Engineering manager, you will be leading a team responsible for providing the platform for mission-critical cloud systems to maintain constant uptime, scale seamlessly, and allow for More ❯