excellence. Develop and implement strategic plans to enhance the reliability, scalability, and efficiency of our infrastructure. Collaborate with cross-functional teams to align SRE initiatives with broader organizational goals. Establish and maintain SLIs, SLOs, and SLAs for critical systems and services. Drive the adoption of best practices in automation … and management solution that helps organizations harness AI's potential while ensuring governance, security, compliance, and control. Experience Requirements: Proven experience in a senior SRE role or similar. Strong knowledge of cloud technologies and SLA, SLO, and SLI management. Experience leading teams and implementing SCRUM processes. Excellent communication and leadership skills. … Experience line managing, mentoring, and coaching. Responsibilities: Collaborate with the Principal SRE to shape and implement the SRE strategic plan. Lead the SRE team in translating strategy into actionable plans, coordinating these through the SCRUM process. Address wellbeing and performance concerns, fostering a positive and productive team environment. Work with …
Cambridge, Cambridgeshire, United Kingdom Hybrid / WFH Options
AI Tech Suite
excellence. Develop and implement strategic plans to enhance the reliability, scalability, and efficiency of our infrastructure. Collaborate with cross-functional teams to align SRE initiatives with broader organizational goals. Establish and maintain SLIs, SLOs, and SLAs for critical systems and services. Drive the adoption of best practices in automation … and management solution that helps organizations harness AI's potential while ensuring governance, security, compliance, and control. Experience Requirements: Proven experience in a senior SRE role or similar. Strong knowledge of cloud technologies and SLA, SLO, and SLI management. Experience leading teams and implementing SCRUM processes. Excellent communication and leadership skills. … Experience line managing, mentoring, and coaching. Responsibilities: Collaborate with the Principal SRE to shape and implement the SRE strategic plan. Lead the SRE team in translating strategy into actionable plans, coordinating these through the SCRUM process. Address wellbeing and performance concerns, fostering a positive and productive team environment. Work with …
of Capital One's ambitions. We are keen to add a Senior Site Reliability Engineering Manager (SSREM) to our Nottingham-based SRE organisation whose primary focus is to provide effective leadership as we evolve and mature site reliability practices for the benefit of our cloud … applications and their customers. The successful candidate will be a leader of leaders with custodianship of application services across 5+ SRE teams. We're looking for an experienced professional whose technical background allows effective challenge and support of teams managing primarily Java-based applications running in a dynamic IaaC AWS … outcomes in the pursuit of business, functional and personal goals. The successful applicant will lead by example and build strong and valuable relationships within the SRE org and with wider tech and business stakeholders. They have the ability to face ambiguity and understand how to make sense of complexity, importantly being able to …
our collective success? We seek a talented Engineering Lead to join our dynamic team and lead our Site Reliability Engineering (SRE) function. This role ensures our systems are reliable and scalable, directly impacting user satisfaction. By integrating SRE activities across teams, you'll foster collaboration and … alignment with senior management will keep us competitive and innovative, driving collective success. What will you do in this role? Oversee and manage the SRE engineering team to ensure continuous improvement in reliability and scalability, while ensuring conformance to our security standards. Lead the integration of SRE activities across Application … Computer Science, or a related field. Proven experience in a leadership role within an engineering team. Strong technical background with expertise in DevSecOps, SRE, and Agile. Excellent technical and organizational skills. Strong problem-solving abilities and attention to detail. What we prefer you to have: Effective communication and interpersonal skills. …
Global Services Provider who are looking for a Site Reliability Engineer to join their team where you will manage and guide the SRE team. As a Site Reliability Engineer, you will be responsible for: Manage and guide the Site Reliability Engineering (SRE) team … while promoting SRE principles throughout FCA product groups. Serve as the SRE subject matter expert and strategic lead within the delivery organization. Oversee and advise on daily operations related to observability tools, including their upkeep and optimization. Provide hands-on support to engineering teams for delivering observability initiatives, as … the reliability of quality assurance results. Proven skills and experience to help you succeed in this role: Strong experience with primary role of SRE Engineer. Strong experience in DevOps tools (GitHub, GitHub Actions, Workflow, CodeQL, Jenkins, Nexus, CloudFormation/Terraform etc.). Strong experience in monitoring tools (Datadog …
Job Title: Site Reliability Engineering (SRE) Lead – Observability Location: Stratford, London (Hybrid – 2 days per week onsite) Contract Length: 6 months Rate: £450–£500 per day (Inside IR35) Industry: Financial Services A leading Financial Services organisation in London is seeking a Site Reliability Engineering (SRE) Lead – Observability to join their team on a 6-month contract. This is a hybrid role requiring two days per week onsite at their Stratford, London offices. The role sits Inside IR35. Key Responsibilities: Lead the SRE Observability team and champion observability practices across multiple product groups. … creation and QA of project-level Observability Plans. Input into and assure the quality of testing strategies and results. Requirements: Proven experience in an SRE role with a strong focus on Observability. Expert-level proficiency with DevOps tools including GitHub, GitHub Actions, Jenkins, Nexus, CloudFormation/Terraform, and CodeQL. Extensive …
London, South East England, United Kingdom Hybrid / WFH Options
MarkJames Search
Job Title: Site Reliability Engineering (SRE) Lead – Observability Location: Stratford, London (Hybrid – 2 days per week onsite) Contract Length: 6 months Rate: £450–£500 per day (Inside IR35) Industry: Financial Services A leading Financial Services organisation in London is seeking a Site Reliability Engineering (SRE) Lead – Observability to join their team on a 6-month contract. This is a hybrid role requiring two days per week onsite at their Stratford, London offices. The role sits Inside IR35. Key Responsibilities: Lead the SRE Observability team and champion observability practices across multiple product groups. … creation and QA of project-level Observability Plans. Input into and assure the quality of testing strategies and results. Requirements: Proven experience in an SRE role with a strong focus on Observability. Expert-level proficiency with DevOps tools including GitHub, GitHub Actions, Jenkins, Nexus, CloudFormation/Terraform, and CodeQL. Extensive …
site About this Opportunity: Great opportunity for a Senior Site Reliability Engineer to join our Financial Wellbeing Platform. As a Senior SRE you'll be responsible for ensuring our products run reliably, are scalable, and perform optimally in production environments. You'll monitor and manage these aspects … What You'll Need: Strong understanding of Site Reliability Engineering with commercial experience in working in a relevant environment and putting SRE principles into practice. Stakeholder management experience and the ability to guide and consult engineering teams. Strong DevOps understanding, including experience of Infrastructure as Code … and CI/CD pipelines, such as Terraform and Jenkins, or alternatives such as GCP Cloud SRE experience and broad set of relevant product knowledge. Knowledge of SLAs, SLOs and SLIs is essential along with the best practices for defining and implementing them. Confidence and capability to communicate complex technical …
Site Reliability Engineer London - 3 days in office mandatory Full-time permanent Up to £90,000 + 5 annual performance-related bonus We have an exciting new opportunity for a Site Reliability Engineer to join Robert Walters as a Consultant. As an Employed Consultant, you will … rapidly growing field in technology and applying your skillsets to drive innovation and modernize the world's most complex and mission-critical systems! As an SRE, you will join a Global Investment Bank within the SRE Payments Technology Team in the Corporate & Investment Bank line of business, where you will solve complex … and iteratively improve on existing solutions. You are a significant contributor to your team by sharing your knowledge of end-to-end operations, availability, reliability, and scalability of your application or platform. Job responsibilities: * Guides and assists others in the areas of building appropriate level designs and gaining consensus …
Preferred Qualifications: Master's degree in Computer Science or Engineering, or a related field. About the Job: Site Reliability Engineering (SRE) combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. SRE ensures that Google Cloud's services, both … our systems' capacity and performance. Much of our software development focuses on optimizing existing systems, building infrastructure and eliminating work through automation. On the SRE team, you'll have the opportunity to manage the complex challenges of scale which are unique to Google Cloud, while using your expertise in coding … algorithms, complexity analysis and large-scale system design. SRE's culture of diversity, intellectual curiosity, problem solving and openness is key to its success. Our organization brings together people with a wide variety of backgrounds, experiences and perspectives. We encourage them to collaborate, think big and take risks in a …
automate routine tasks. Systematic problem-solving approach, coupled with effective verbal and written communication skills. About the Job: Site Reliability Engineering (SRE) combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. SRE ensures that Google Cloud's services, both … our systems' capacity and performance. Much of our software development focuses on optimizing existing systems, building infrastructure and eliminating work through automation. On the SRE team, you'll have the opportunity to manage the complex challenges of scale which are unique to Google Cloud, while using your expertise in coding … algorithms, complexity analysis, and large-scale system design. SRE's culture of diversity, intellectual curiosity, problem solving, and openness is key to its success. Our organization brings together people with a wide variety of backgrounds, experiences, and perspectives. We encourage them to collaborate, think big, and take risks in a …
Manchester, Lancashire, United Kingdom Hybrid / WFH Options
bet365 Group
A Site Reliability Engineer, who will enhance system reliability, observability and performance through a strong engineering approach and assist with incident resolution and best practices. You will have software engineering skills, focusing on system reliability and observability. You will monitor the health, performance and … availability of critical systems, directly impacting operational efficiency. Using your engineering expertise, you will implement solutions that enhance reliability, including service instrumentation with tools such as OpenTelemetry, improve logging practices and develop features for maintainability. You will also help engineer tools and automation for effective service management. … Collaboration is key, working across multiple functions to integrate reliability and observability best practices into the software development life cycle. By supporting governance standards set by the central teams, you will foster a culture where these principles are integral to development. Your contributions will ensure our systems meet user …
Cambridge, Cambridgeshire, East Anglia, United Kingdom
RedTech Recruitment
game-changing technology within their industry, with exciting scope for expansion into further industries. This role is looking for someone to work within the SRE team responsible for incident response and issue resolution. Location: Cambridge Salary: £32,000 to £60,000 + excellent benefits (£32,000 for a new Graduate) Requirements … of problem solving identifying the root causes of issues. Good logical reasoning. Responsibilities for Site Reliability Engineer Graduate Considered: Working within the SRE team you will be responsible for the architecture of a mission-critical cloud platform for an industry-leading software company. You will be diagnosing issues … has been removed by the job-board, full details for contact are available on our website). Keywords: Site Reliability Engineer/SRE/DevOps/Software Engineering/Software Development/Engineering/Physics/Astrophysics/Python/Computer science/Cloud/Mathematics …
Manchester Area, United Kingdom Hybrid / WFH Options
bet365
Who we are looking for: A Site Reliability Engineer, who will enhance system reliability, observability and performance through a strong engineering approach and assist with incident resolution and best practices. You will have software engineering skills, focusing on system reliability and observability. You will … monitor the health, performance and availability of critical systems, directly impacting operational efficiency. Using your engineering expertise, you will implement solutions that enhance reliability, including service instrumentation with tools such as OpenTelemetry, improve logging practices and develop features for maintainability. You will also help engineer tools and … automation for effective service management. Collaboration is key, working across multiple functions to integrate reliability and observability best practices into the software development life cycle. By supporting governance standards set by the central teams, you will foster a culture where these principles are integral to development. Your contributions will …
Stoke-On-Trent, England, United Kingdom Hybrid / WFH Options
bet365
Who we are looking for: A Site Reliability Engineer, who will enhance system reliability, observability and performance through a strong engineering approach and assist with incident resolution and best practices. You will have software engineering skills, focusing on system reliability and observability. You will … monitor the health, performance and availability of critical systems, directly impacting operational efficiency. Using your engineering expertise, you will implement solutions that enhance reliability, including service instrumentation with tools such as OpenTelemetry, improve logging practices and develop features for maintainability. You will also help engineer tools and … automation for effective service management. Collaboration is key, working across multiple functions to integrate reliability and observability best practices into the software development life cycle. By supporting governance standards set by the central teams, you will foster a culture where these principles are integral to development. Your contributions will …
Site Reliability Engineer (SRE) - Consultant - Digital Factory. At Capgemini Invent, we believe difference drives change. As inventive transformation consultants, we blend our strategic, creative and scientific capabilities, collaborating closely with … science and data. Superpowered by creativity and design. All underpinned by technology created with purpose. YOUR ROLE: As a Site Reliability Engineer (SRE), you will play a key role in ensuring the reliability, scalability, and efficiency of our clients' platforms. Your focus will include building strong observability … practices, aligning with the SRE mindset & principles, and driving continuous improvement. This will involve: Defining and implementing Service Level Indicators (SLIs) and Service Level Objectives (SLOs) to measure and maintain system and application performance, ensuring services meet agreed reliability targets. Instrumenting applications to collect key metrics, logs, and traces …
us! Job Description: This job is responsible for partnering with engineering and technology teams to implement measures as prescribed by lead/senior SRE engineers. Key responsibilities include ensuring appropriate instrumentation, tooling, ticketing, alerting and on-call routines are in place for key services, identifying root causes of issues … reliability scripts, tools and libraries and leverages them for common instrumentation, automation, and operational needs, and when mentoring Site Reliability Engineer (SRE) resources on reliability practices and established tools/capabilities. Collaborates with Development and Infrastructure teams to understand technical solutions and implement monitoring capabilities outlined … in the application and system monitoring designs put forward by the SRE Lead. Partners to implement code changes to make use of common reliability libraries and tools and helps Application Production Services and Application Development teammates understand how to use them. Identifies vulnerabilities and opportunities for reliability improvement …
Lead Cloud Infrastructure and Site Reliability Engineer Brand: HSBC Area of Interest: Technology Location: Birmingham, GB, B1 1HQ Work style: Office Worker Date: 24 Apr 2025. Join a digital-first bank that's powered by people. Our technology team builds innovative digital solutions rapidly and at scale to …/Infrastructure Security. Your work will provide assurance of the effectiveness of security controls to Business Risk Owners. The Lead Cybersecurity Analytics Cloud Infrastructure & Site Reliability Engineer will be part of the CSA Platform & Data Engineering Team, joining a global team of data technology professionals to deliver … Availability, Resiliency). To be successful in this role, you should meet the following requirements: Strong understanding of Site Reliability Engineering (SRE) principles and hands-on experience with Azure DevOps. Proficient in scripting (Bash, PowerShell, Azure CLI), coding (Python, C#, Java), and querying (SQL, Kusto Query Language …
We are seeking a highly skilled and proactive Oracle Site Reliability Engineer (SRE) to ensure the reliability, performance, and scalability of our critical Oracle-based applications and services supporting a global user base. The ideal candidate will possess … deep expertise in Oracle technologies and SRE methodologies. You will be responsible for ensuring the stability and efficiency of our Oracle systems, implementing automation, managing patching, and providing expert-level support to our global users. Strong cross-functional collaboration and a proactive approach to problem-solving are essential for success … troubleshoot IT systems. Leverage cloud platforms and automation tools to enhance scalability and efficiency. Ensure compliance with IT standards and regulations. Apply knowledge of SRE (Site Reliability Engineering) and/or DevOps practices to improve system reliability and performance. Maintain UK Security Clearance BPSS (Baseline Personnel …
and ensure Morrisons’ applications and infrastructure are resilient, efficient, and aligned with architectural goals. This is a key role for those passionate about advancing SRE practices at enterprise scale. Responsibilities: Act as SME within their Domain teams for advice & guidance in terms of CI/CD, automation and product ways … of working and SRE/Engineering standards. Drive the adoption of Engineering standards and Continuous Delivery principles within multiple domains. The escalation point for SRE/Engineering ways of working. Influence good practices and standards within SDLC throughout the business. Influence partners. Infrastructure best practices. Implementation of … and patterns. Engineering Tooling, Patterns, Framework and Standards. Proprietary code quality management inclusive of technical debt. About you: Knowledge: In-depth understanding of SRE/Engineering, Architecture and Testing practices. In-depth understanding of the principles of CI/CD within SRE/Engineering. In-depth understanding …
The Site Reliability Engineering (SRE) team at Pendo is responsible for provisioning and maintaining cloud infrastructure from development through production for all product initiatives, and working with developers and product managers to ensure that our products are not only reliable and performant, but also cost-efficient. Our … on-call and incident management functions, supporting a high-throughput platform which processes more than 15 billion events per day. To ensure the reliability of this environment for our customers, SREs work closely with developers and product managers to understand service level objectives, think through failure scenarios, and design … systems which balance cost with reliability objectives. Additionally, SREs collaborate with the Information Security team to ensure that cloud infrastructure is properly secured, and that sufficient controls are in place to meet our compliance goals with respect to industry standards such as SOC 2. Role Responsibilities: Write high-quality …
SR2 | Socially Responsible Recruitment | Certified B Corporation™
Cloud SRE (Site Reliability Engineer) 2 days a week onsite in Bristol Up to £105,000 depending on experience Brilliant Benefits Package One of SR2’s standout clients is on the lookout for a skilled Cloud SRE (Site Reliability Engineer) to join their growing team as … generation of offerings – from 2025 and beyond! Why they are hiring: This role has opened up as a result of continued growth within the engineering team, creating the need for this role. You’ll be an active and leading … member of a cloud-focussed team of super talented engineers working with the teams within the business to influence and drive the adoption of SRE best practices and ways of working specifically within microservices. You will ideally be from an engineering background – this is a through and through engineering …
Manchester, Lancashire, United Kingdom Hybrid / WFH Options
Embarcaderomediagroup
Site Reliability & Platform Engineer to help lead the way. You'll sit at the heart of our engineering operations, bringing together SRE principles and modern platform engineering practices. This includes combining principles of SRE - such as service-level reliability, observability, incident response - with platform engineering … ship faster, safer, and more cost-efficiently. What you'll be doing: Designing and operating highly reliable, scalable, and secure Azure-based platforms. Applying SRE principles like SLOs, observability, and incident management to drive service reliability. Building Infrastructure as Code using Terraform (v1.7+) and GitOps workflows. Enabling teams through … for someone passionate about building robust infrastructure and enabling others to move faster and more securely. You might come from a cloud engineering, SRE, or DevOps background - what matters most is your curiosity, systems thinking, and drive to improve operational efficiency. At Sorted, we are committed to fostering an …
is hybrid, which involves spending at least two days per week, or 40% of our time, at our Bristol office. About this opportunity: Our SRE (Site Reliability Engineering) team within the Analytics & AI platform are looking for an experienced and passionate Engineer with strong hands-on development … experience. As an SRE you'll be working with a team of engineers on a suite of automation and gen AI products. You'll run and maintain a set of applications and services on a combination of Private and Public Clouds that will enable the business to realise the next … in our technologies, workplaces, and colleagues to make our Group a great place for everyone. Including you! What you'll need: Strong practitioner in SRE principles (SLI, SLO & SLA) using Observability, Logging, Monitoring & Alerting. Experience of Infrastructure as Code and CI/CD pipelines using tools such as Terraform, Jenkins …