of Capital One's ambitions. We are keen to add a Senior Site Reliability Engineering Manager (SSREM) to our Nottingham-based SRE organisation whose primary focus is to provide effective leadership as we evolve and mature site reliability practices for the benefit of our cloud … applications and their customers. The successful candidate will be a leader of leaders with custodianship of application services across 5+ SRE teams. We're looking for an experienced professional whose technical background allows effective challenge and support of teams managing primarily Java-based applications running in a dynamic IaaC AWS … outcomes in the pursuit of business, functional and personal goals. The successful applicant will lead by example and build strong and valuable relationships within the SRE org and with wider tech and business stakeholders. They have the ability to face ambiguity and understand how to make sense of complexity, importantly being able to …
Cambridge, Cambridgeshire, United Kingdom Hybrid / WFH Options
AI Tech Suite
excellence. Develop and implement strategic plans to enhance the reliability, scalability, and efficiency of our infrastructure. Collaborate with cross-functional teams to align SRE initiatives with broader organizational goals. Establish and maintain SLIs, SLOs, and SLAs for critical systems and services. Drive the adoption of best practices in automation … and management solution that helps organizations harness AI's potential while ensuring governance, security, compliance, and control. Experience Requirements: Proven experience in a senior SRE role or similar. Strong knowledge of cloud technologies and SLA/SLO/SLI management. Experience leading teams and implementing SCRUM processes. Excellent communication and leadership skills. … Experience line managing, mentoring, and coaching. Responsibilities: Collaborate with the Principal SRE to shape and implement the SRE strategic plan. Lead the SRE team in translating strategy into actionable plans, coordinating these through the SCRUM process. Address wellbeing and performance concerns, fostering a positive and productive team environment. Work with …
our collective success? We seek a talented Engineering Lead to join our dynamic team and lead our Site Reliability Engineering (SRE) function. This role ensures our systems are reliable and scalable, directly impacting user satisfaction. By integrating SRE activities across teams, you'll foster collaboration and … alignment with senior management will keep us competitive and innovative, driving collective success. What will you do in this role? Oversee and manage the SRE engineering team to ensure continuous improvement in reliability and scalability while ensuring conformance to our security standards. Lead the integration of SRE activities across Application … Computer Science, or a related field. Proven experience in a leadership role within an engineering team. Strong technical background with expertise in DevSecOps, SRE, and Agile. Excellent technical and organizational skills. Strong problem-solving abilities and attention to detail. What we prefer you to have: Effective communication and interpersonal skills.
Amazon Dedicated Cloud Engineer, Region Reliability Engineering & Automation Job ID: Amazon Development Center U.S., Inc. Are you passionate about creating resilient cloud systems that power mission-critical operations? Do you want to apply leading-edge artificial intelligence technologies like Amazon Bedrock to challenging problems? Do you thrive on … engineering and maintaining the largest cloud infrastructure for some of the world's most complex environments? Amazon Web Services is seeking talented AWS Dedicated Cloud Engineers to join our Region Reliability Engineering & Automation (RRE&A) team. Our mission is to ensure the seamless operation of AWS's … dedicated cloud regions through proactive reliability engineering, automation, and leading-edge solutions. We seek individuals who bring a deep technical skill set in Development, Operations, Networking, and Systems Engineering, and who understand the Agile mindset and DevOps philosophies. We welcome engineers willing to think differently, redesigning systems …
thrive. What You'll Do: The Senior Director – Operations and Reliability Engineering is responsible for blending Site Reliability Engineering (SRE), DevOps, and traditional operations models to build a next-generation Reliability Engineering function. This role ensures end-to-end automation at scale, 24x7 … ensuring compliance with standardized frameworks and operational excellence. Key Responsibilities: Strategic Leadership & Transformation: * Define and execute a modern Reliability Engineering strategy, integrating SRE, DevOps, and automation-first operational models. * Drive end-to-end automation to eliminate toil, improve efficiency, and enhance operational resilience. * Lead the transition from traditional … Operational Excellence: * Mandate and assure the adoption of IT Service Management (ITSM) processes across all teams, ensuring standardized, efficient, and effective service delivery. * Establish SRE-based operational metrics, including SLOs, SLIs, and error budgets. * Oversee incident response, problem resolution, and root cause analysis with AI-driven remediation. * Ensure high availability …
Site Reliability Engineering Manager (SRE), Analytics. The Apple Services Engineering team (ASE) is one of the most exciting examples of Apple's long-held passion for combining art and technology. These are the people who power the App Store, Apple TV, Apple Music, Apple Podcasts, and … before, these teams remain small and multi-functional, offering greater exposure to the array of opportunities here. Description: The Service Reliability Engineering (SRE) Manager role in Apple Services Engineering requires a mix of strategic engineering and design along with hands-on technical work. This SRE will … of users, then this is the place for you! Minimum Qualifications: Experience with hiring and leading engineers. Demonstrable success leading engineering teams - ideally SRE or Production Engineering. Experience with large-scale distributed systems. Deep understanding and experience in one or more of the following: Hadoop, Spark, Flink, Kubernetes …
systems while keeping levels of manual work low. SREs are expected to be experienced in software engineering principles, operational discipline, and automation. The SRE team works on a fully remote basis and works in conjunction with their US and Australian teams as well. This company is a market leader … Collaborate with product engineering teams to design/build fit-for-purpose and observable software. Required Skills and Experience: Proven experience in an SRE/DevOps/Platform Engineering role, having previously worked in a Software Engineering role in .NET and C#. Proficiency in C# development … development opportunities. Working with a team of caring, high-performing, and passionate people who have fun supporting our vision, innovation, and continuous improvement. This SRE/DevOps Engineer role is working for a market-leading global software company and this job is part of a large program of change and …
Site Reliability Engineer (SRE), Data Infrastructure. The Apple Services Engineering team (ASE) is one of the most exciting examples of Apple's long-held passion for combining art and technology. These are the people who power the App Store, Apple TV, Apple Music, Apple Podcasts, and Apple … before, these teams remain small and multi-functional, offering greater exposure to the array of opportunities here. Description: The Site Reliability Engineer (SRE) role in Apple Services Engineering requires a mix of strategic engineering and design along with hands-on, technical work. This SRE will configure … and dynamic organization. Minimum Qualifications: BS or MS degree in Computer Science with years of experience in a Site Reliability Engineering (SRE) and/or DevOps role. Years of experience running services in a large-scale *nix environment and understanding of SRE principles & goals along with prior …
london, south east england, united kingdom Hybrid / WFH Options
MarkJames Search
Job Title: Site Reliability Engineering (SRE) Lead – Observability Location: Stratford, London (Hybrid – 2 days per week onsite) Contract Length: 6 months Rate: £450–£500 per day (Inside IR35) Industry: Financial Services A leading Financial Services organisation in London is seeking a Site Reliability Engineering (SRE) Lead – Observability to join their team on a 6-month contract. This is a hybrid role requiring two days per week onsite at their Stratford, London offices. The role sits Inside IR35. Key Responsibilities: Lead the SRE Observability team and champion observability practices across multiple product groups. … creation and QA of project-level Observability Plans. Input into and assure the quality of testing strategies and results. Requirements: Proven experience in an SRE role with a strong focus on Observability. Expert-level proficiency with DevOps tools including GitHub, GitHub Actions, Jenkins, Nexus, CloudFormation/Terraform, and CodeQL. Extensive …
Are you a Site Reliability Engineer with experience in the iGaming and Gambling sector looking for an exciting new challenge? BENEFITS: Up to £95k depending on experience, fully remote, excellent benefits package. Join a rapidly growing company at the forefront of the iGaming industry, dedicated to delivering world … a leading brand, blending sports betting and online casino entertainment on a cutting-edge, custom-built platform. As a Site Reliability Engineer (SRE), you will play a crucial role in designing, implementing, and maintaining scalable and reliable infrastructure. Working closely with development teams, you'll apply SRE principles … and resolving performance and availability issues. Manage and optimise containerised environments with Kubernetes, ensuring scalability and high availability. Collaborate with development teams to implement SRE best practices. Implement strategies for Continuous Deployment to minimise release risks. Required Experience & Expertise: Previous experience within the iGaming and Gambling sector. Strong experience with …
Preferred Qualifications: Master's degree in Computer Science or Engineering, or a related field. About the Job: Site Reliability Engineering (SRE) combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. SRE ensures that Google Cloud's services, both … our systems capacity and performance. Much of our software development focuses on optimizing existing systems, building infrastructure and eliminating work through automation. On the SRE team, you'll have the opportunity to manage the complex challenges of scale which are unique to Google Cloud, while using your expertise in coding … algorithms, complexity analysis and large-scale system design. SRE's culture of diversity, intellectual curiosity, problem solving and openness is key to its success. Our organization brings together people with a wide variety of backgrounds, experiences and perspectives. We encourage them to collaborate, think big and take risks in a …
automate routine tasks. Systematic problem-solving approach, coupled with effective verbal and written communication skills. About the Job: Site Reliability Engineering (SRE) combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. SRE ensures that Google Cloud's services, both … our systems capacity and performance. Much of our software development focuses on optimizing existing systems, building infrastructure and eliminating work through automation. On the SRE team, you'll have the opportunity to manage the complex challenges of scale which are unique to Google Cloud, while using your expertise in coding … algorithms, complexity analysis, and large-scale system design. SRE's culture of diversity, intellectual curiosity, problem solving, and openness is key to its success. Our organization brings together people with a wide variety of backgrounds, experiences, and perspectives. We encourage them to collaborate, think big, and take risks in a …
Site Reliability Engineer (SRE), Services Media Growth. London, England, United Kingdom. Software and Services. Description: The ASE Media Growth SRE team ensures the safety, reliability, scalability, and speed of applications and tooling. This demanding role … or MS degree in computer science or equivalent field. Familiarity with microservices architecture and container orchestration with Kubernetes and a solid understanding of core SRE concepts. Automation advocate - you truly believe in removing operational load with software. Experience with deploying, supporting and monitoring new and existing services, platforms, and application … networking topologies, traffic management strategies, failure modes, design of multi-datacenter systems, and wide-area networking.
Manchester, Lancashire, United Kingdom Hybrid / WFH Options
bet365 Group
A Site Reliability Engineer, who will enhance system reliability, observability and performance through a strong engineering approach and assist with incident resolution and best practices. You will have software engineering skills, focusing on system reliability and observability. You will monitor the health, performance and … availability of critical systems, directly impacting operational efficiency. Using your engineering expertise, you will implement solutions that enhance reliability, including service instrumentation with tools such as OpenTelemetry, improve logging practices and develop features for maintainability. You will also help engineer tools and automation for effective service management. … Collaboration is key, working across multiple functions to integrate reliability and observability best practices into the software development life cycle. By supporting governance standards set by the central teams, you will foster a culture where these principles are integral to development. Your contributions will ensure our systems meet user …
Manchester Area, United Kingdom Hybrid / WFH Options
bet365
Who we are looking for: A Site Reliability Engineer, who will enhance system reliability, observability and performance through a strong engineering approach and assist with incident resolution and best practices. You will have software engineering skills, focusing on system reliability and observability. You will … monitor the health, performance and availability of critical systems, directly impacting operational efficiency. Using your engineering expertise, you will implement solutions that enhance reliability, including service instrumentation with tools such as OpenTelemetry, improve logging practices and develop features for maintainability. You will also help engineer tools and … automation for effective service management. Collaboration is key, working across multiple functions to integrate reliability and observability best practices into the software development life cycle. By supporting governance standards set by the central teams, you will foster a culture where these principles are integral to development. Your contributions will …
Stoke-On-Trent, England, United Kingdom Hybrid / WFH Options
bet365
Who we are looking for: A Site Reliability Engineer, who will enhance system reliability, observability and performance through a strong engineering approach and assist with incident resolution and best practices. You will have software engineering skills, focusing on system reliability and observability. You will … monitor the health, performance and availability of critical systems, directly impacting operational efficiency. Using your engineering expertise, you will implement solutions that enhance reliability, including service instrumentation with tools such as OpenTelemetry, improve logging practices and develop features for maintainability. You will also help engineer tools and … automation for effective service management. Collaboration is key, working across multiple functions to integrate reliability and observability best practices into the software development life cycle. By supporting governance standards set by the central teams, you will foster a culture where these principles are integral to development. Your contributions will …
Site Reliability Engineer (SRE) - Consultant - Digital Factory. At Capgemini Invent, we believe difference drives change. As inventive transformation consultants, we blend our strategic, creative and scientific capabilities, collaborating closely with … science and data. Superpowered by creativity and design. All underpinned by technology created with purpose. YOUR ROLE: As a Site Reliability Engineer (SRE), you will play a key role in ensuring the reliability, scalability, and efficiency of our clients' platforms. Your focus will include building strong observability … practices, aligning with the SRE mindset & principles, and driving continuous improvement. This will involve: Defining and implementing Service Level Indicators (SLIs) and Service Level Objectives (SLOs) to measure and maintain system and application performance, ensuring services meet agreed reliability targets. Instrumenting applications to collect key metrics, logs, and traces …
us! Job Description: This job is responsible for partnering with engineering and technology teams to implement measures as prescribed by lead/senior SRE engineers. Key responsibilities include ensuring appropriate instrumentation, tooling, ticketing, alerting and on-call routines are in place for key services, identifying root causes of issues … reliability scripts, tools and libraries and leverages them for common instrumentation, automation, and operational needs, and when mentoring Site Reliability Engineer (SRE) resources on reliability practices and established tools/capabilities. Collaborates with Development and Infrastructure teams to understand technical solutions and implement monitoring capabilities outlined … in the application and system monitoring designs put forward by the SRE Lead. Partners to implement code changes to make use of common reliability libraries and tools and helps Application Production Services and Application Development teammates understand how to use them. Identifies vulnerabilities and opportunities for reliability improvement …
Lead Cloud Infrastructure and Site Reliability Engineer Brand: HSBC Area of Interest: Technology Location: Birmingham, GB, B1 1HQ Work style: Office Worker Date: 24 Apr 2025 Join a digital-first bank that's powered by people. Our technology team builds innovative digital solutions rapidly and at scale to …/Infrastructure Security. Your work will provide assurance of the effectiveness of security controls to Business Risk Owners. The Lead Cybersecurity Analytics Cloud Infrastructure & Site Reliability Engineer will be part of the CSA Platform & Data Engineering Team, joining a global team of data technology professionals to deliver … Availability, Resiliency). To be successful in this role, you should meet the following requirements: Strong understanding of Site Reliability Engineering (SRE) principles and hands-on experience with Azure DevOps. Proficient in scripting (Bash, PowerShell, Azure CLI), coding (Python, C#, Java), and querying (SQL, Kusto Query Language …
for a Head of Site Reliability Engineering to join our team to help us transform our existing operational workloads to an SRE approach. Key Responsibilities: Establishing and managing our new SRE function. Operating and modernising our existing cloud infrastructure. Partnering with our DevOps team to ensure fast … levels. Acting as a key Incident Commander and escalation point. Liaising closely with our SecOps teams to ensure timely vulnerability management. Educating teams in SRE practices and maintaining high standards of compliance. Implementing world-class observability standards utilising SLI/SLO/Error Budgets. Continually evolving our observability platforms for … greater coverage. Liaising with Product & Engineering teams for constant evolution of metrics. Aligning SRE Sprints & Backlog with our roadmaps to meet business expectations. Guiding our teams in a more Agile approach to demand management. Actively taking part in our daily stand-ups and keeping our Sprints on track. Keeping …
Chester, Cheshire West and Chester, Cheshire, United Kingdom
Ascendion
We are seeking a Platform Engineering Manager with a strong hands-on background in Java development and Site Reliability Engineering (SRE). The ideal candidate will have a broad technical skillset across Java, Spring, MuleSoft, Kafka, and Oracle DB, and must be capable of leading platform … develop resilient backend systems primarily using Java, Spring, Kafka, and Oracle. Implement best practices for observability, incident response, and operational excellence in line with SRE principles. Drive automation and self-healing mechanisms across platform components. Provide technical leadership and hands-on coding as needed. Monitor, troubleshoot, and resolve production issues … Java expertise with deep understanding of backend design patterns and frameworks (Spring Boot preferred). Proven experience in Site Reliability Engineering (SRE), including monitoring, alerting, and incident management. Hands-on experience with Kafka, MuleSoft, and Oracle DB. Familiarity with performance tuning, system design, and distributed computing concepts.
We are seeking a highly skilled and proactive Oracle Site Reliability Engineer (SRE) to ensure the reliability, performance, and scalability of our critical Oracle-based applications and services supporting a global user base. The ideal candidate will possess … deep expertise in Oracle technologies and SRE methodologies. You will be responsible for ensuring the stability and efficiency of our Oracle systems, implementing automation, managing patching, and providing expert-level support to our global users. Strong cross-functional collaboration and a proactive approach to problem-solving are essential for success … troubleshoot IT systems. Leverage cloud platforms and automation tools to enhance scalability and efficiency. Ensure compliance with IT standards and regulations. Apply knowledge of SRE (Site Reliability Engineering) and/or DevOps practices to improve system reliability and performance. Maintain UK Security Clearance BPSS (Baseline Personnel …