Cambridge, Cambridgeshire, United Kingdom Hybrid / WFH Options
AI Tech Suite
excellence. Develop and implement strategic plans to enhance the reliability, scalability, and efficiency of our infrastructure. Collaborate with cross-functional teams to align SRE initiatives with broader organizational goals. Establish and maintain SLIs, SLOs, and SLAs for critical systems and services. Drive the adoption of best practices in automation … and management solution that helps organizations harness AI's potential while ensuring governance, security, compliance, and control. Experience Requirements: Proven experience in a senior SRE role or similar. Strong knowledge of cloud technologies and SLA, SLO, and SLI management. Experience leading teams and implementing SCRUM processes. Excellent communication and leadership skills. … Experience line managing, mentoring, and coaching. Responsibilities: Collaborate with the Principal SRE to shape and implement the SRE strategic plan. Lead the SRE team in translating strategy into actionable plans, coordinating these through the SCRUM process. Address wellbeing and performance concerns, fostering a positive and productive team environment. Work with …
of Capital One's ambitions. We are keen to add a Senior Site Reliability Engineering Manager (SSREM) to our Nottingham-based SRE organisation whose primary focus is to provide effective leadership as we evolve and mature site reliability practices for the benefit of our cloud … applications and their customers. The successful candidate will be a leader of leaders with custodianship of application services across 5+ SRE teams. We're looking for an experienced professional whose technical background allows effective challenge and support of teams managing primarily Java-based applications running in a dynamic IaaC AWS … outcomes in the pursuit of business, functional and personal goals. The successful applicant will lead by example, build strong and valuable relationships within the SRE org, wider tech and business stakeholders. They have the ability to face ambiguity and understand how to make sense of complexity, importantly being able to …
thrive. What You'll Do: The Senior Director – Operations and Reliability Engineering is responsible for blending Site Reliability Engineering (SRE), DevOps, and traditional operations models to build a next-generation Reliability Engineering function. This role ensures end-to-end automation at scale, 24x7 … ensuring compliance with standardized frameworks and operational excellence. Key Responsibilities: Strategic Leadership & Transformation: Define and execute a modern Reliability Engineering strategy, integrating SRE, DevOps, and automation-first operational models. Drive end-to-end automation to eliminate toil, improve efficiency, and enhance operational resilience. Lead the transition from traditional … Operational Excellence: Mandate and assure the adoption of IT Service Management (ITSM) processes across all teams, ensuring standardized, efficient, and effective service delivery. Establish SRE-based operational metrics, including SLOs, SLIs, and error budgets. Oversee incident response, problem resolution, and root cause analysis with AI-driven remediation. Ensure high availability …
Manchester, Lancashire, United Kingdom Hybrid / WFH Options
bet365 Group
A Site Reliability Engineer, who will enhance system reliability, observability and performance through a strong engineering approach and assist with incident resolution and best practices. You will have software engineering skills, focusing on system reliability and observability. You will monitor the health, performance and … availability of critical systems, directly impacting operational efficiency. Using your engineering expertise, you will implement solutions that enhance reliability, including service instrumentation with tools such as OpenTelemetry, improve logging practices and develop features for maintainability. You will also help engineer tools and automation for effective service management. … Collaboration is key, working across multiple functions to integrate reliability and observability best practices into the software development life cycle. By supporting governance standards set by the central teams, you will foster a culture where these principles are integral to development. Your contributions will ensure our systems meet user …
Cambridge, Cambridgeshire, East Anglia, United Kingdom
RedTech Recruitment
game-changing technology within their industry, with exciting scope for expansion into further industries. This role is looking for someone to work within the SRE team responsible for incident response and issue resolution. Location: Cambridge. Salary: £32,000 - £60,000 + excellent benefits (£32,000 for a new Graduate). Requirements … of problem solving, identifying the root causes of issues. Good logical reasoning. Responsibilities for Site Reliability Engineer Graduate Considered: Working within the SRE team you will be responsible for the architecture of a mission-critical cloud platform for an industry-leading software company. You will be diagnosing issues … has been removed by the job-board, full details for contact are available on our website). Keywords: Site Reliability Engineer/SRE/DevOps/Software Engineering/Software Development/Engineering/Physics/Astrophysics/Python/Computer science/Cloud/Mathematics
Lead Cloud Infrastructure and Site Reliability Engineer. Brand: HSBC. Area of Interest: Technology. Location: Birmingham, GB, B1 1HQ. Work style: Office Worker. Date: 24 Apr 2025. Join a digital-first bank that's powered by people. Our technology team builds innovative digital solutions rapidly and at scale to …/Infrastructure Security. Your work will provide assurance of the effectiveness of security controls to Business Risk Owners. The Lead Cybersecurity Analytics Cloud Infrastructure & Site Reliability Engineer will be part of the CSA Platform & Data Engineering Team, joining a global team of data technology professionals to deliver … Availability, Resiliency). To be successful in this role, you should meet the following requirements: Strong understanding of Site Reliability Engineering (SRE) principles and hands-on experience with Azure DevOps. Proficient in scripting (Bash, PowerShell, Azure CLI), coding (Python, C#, Java), and querying (SQL, Kusto Query Language …
The Site Reliability Engineering (SRE) team at Pendo is responsible for provisioning and maintaining cloud infrastructure from development through production for all product initiatives, and working with developers and product managers to ensure that our products are not only reliable and performant, but also cost-efficient. Our … on-call and incident management functions, supporting a high-throughput platform which processes more than 15 billion events per day. To ensure the reliability of this environment for our customers, SREs work closely with developers and product managers to understand service level objectives, think through failure scenarios, and design … systems which balance cost with reliability objectives. Additionally, SREs collaborate with the Information Security team to ensure that cloud infrastructure is properly secured, and that sufficient controls are in place to meet our compliance goals with respect to industry standards such as SOC 2. Role Responsibilities: Write high-quality …
Manchester, Lancashire, United Kingdom Hybrid / WFH Options
Embarcaderomediagroup
Site Reliability & Platform Engineer to help lead the way. You'll sit at the heart of our engineering operations, bringing together SRE principles and modern platform engineering practices. This includes combining principles of SRE - such as service-level reliability, observability, incident response - with platform engineering … ship faster, safer, and more cost-efficiently. What you'll be doing: Designing and operating highly reliable, scalable, and secure Azure-based platforms. Applying SRE principles like SLOs, observability, and incident management to drive service reliability. Building Infrastructure as Code using Terraform (v1.7+) and GitOps workflows. Enabling teams through … for someone passionate about building robust infrastructure and enabling others to move faster and more securely. You might come from a cloud engineering, SRE, or DevOps background - what matters most is your curiosity, systems thinking, and drive to improve operational efficiency. At Sorted, we are committed to fostering an …
Dundee, Angus, United Kingdom Hybrid / WFH Options
Ivanti
offerings. We are responsible for the reliability, deployment, and operation of the Ivanti Cloud product portfolio. We are seeking individuals eager to drive SRE maturity through the research and development of internal tooling, operational enhancements, and deployment pipelines. Ivanti SRE takes a holistic view of operational procedures, incident response … procedures, application and infrastructure monitoring, and process automation. Ivanti SRE is a blend of infrastructure, networking, automation, development, and application administration. This is a hands-on technical position. The ideal candidate will have a software engineering background and strong experience with continuous deployment, SaaS delivery, and production incident response. … the company's growth trajectory through continuous innovation and customer-centric solutions. What You Will Be Doing: Researching, maintaining, and contributing to automation of SRE tools and processes. Contributing to solutions toward reducing toil within SRE. Participating in code review and analysis with SRE peers. Composing and reviewing contributions to …
Lead Site Reliability Engineer. Are you ready to take your career to the next level in a role that’s critical to the reliability, scalability, and performance of cutting-edge systems? We’re on the lookout for a Lead Site Reliability Engineer to bring innovation … Contribute to quality systems through deviation management, CAPA follow-up, and root cause investigations. What We’re Looking For: 5+ years of experience in Site Reliability Engineering or a related field. Hands-on experience with Biosafety and GMP environments. Strong foundation in Lean Six Sigma principles. Proven … problem-solving skills with a knack for performance tuning. Effective communicator and team player. Formal education in an engineering-related discipline.
also assist with CloudOps activities. Are you an experienced IT professional with a strong background in DevOps and Site Reliability Engineering (SRE)? Are you passionate about working with cutting-edge technologies, driving agile methodologies, and implementing CI/CD practices? Do you have knowledge of infrastructure as … code? Experience required: - Solid experience in a similar role, working on DevOps or SRE initiatives within complex IT environments with Software Engineering - AWS environment - Proficiency in DevOps practices and related technologies, such as CI/CD pipelines & infrastructure as code tools such as Terraform, Ansible, Puppet or Bicep. - Strong … to ensure system reliability and performance. - Any Linux experience would be a bonus. Key Responsibilities: - Drive the strategy and implementation for DevOps and SRE practices. - Collaborate with cross-functional teams to design and implement CI/CD pipelines, ensuring efficient and reliable software delivery. - Establish and maintain best practices …
Bradford, Yorkshire, United Kingdom Hybrid / WFH Options
Freemans Grattan Holdings (fgh)
our customer journey. Working collaboratively with a team of transformation experts you will have the flexibility to leverage your professional experience to solve computer engineering issues across a variety of technical areas, dependent on where your interests lie. Innovation is key as we look for new ideas which will … in a DevOps, or Site Reliability Engineering building high-traffic, high availability systems. Experience with site reliability engineering (SRE) principles and monitoring tools, including New Relic. Experience in website performance monitoring and tuning using tools such as Lighthouse and the ability to troubleshoot performance …
facilitate effective job matching and career development, not just for our users but also for our own team members. We are looking for a Site Reliability Engineer Lead to ensure our systems are reliable, scalable, and efficient. As the Site Reliability Engineer Lead, you will take … maintaining the health and performance of our platforms while also leading a talented team of engineers. You will champion and coach best practices in reliability and operational excellence to deliver an exceptional experience for our users. Key Responsibilities: Minimising downtime to products & services and ensuring the platform is stable … availability and performance. Work with senior stakeholders to mature the concept of Site Reliability within the CVL organisation. Lead and mentor the SRE function, fostering a culture of collaboration, innovation, and excellence. Creating a bridge between Development and support teams by applying an 'as-a-service' mindset to …
Leeds, Yorkshire, United Kingdom Hybrid / WFH Options
Fruition Group
Job Title: Senior Site Reliability Engineer (SRE). Location: Leeds (Hybrid - c. 1-2 days per week). Salary: £60,000 - £80,000 + benefits. Why Apply? This is a fantastic opportunity for a seasoned Senior Site Reliability Engineer to take a lead role in shaping the infrastructure … most innovative businesses in their market. Working with cutting-edge technology, this role offers high-impact challenges, meaningful collaboration, and excellent career progression. Senior SRE Responsibilities: Manage and optimise cloud infrastructure to ensure scalability, high availability, and security. Design and implement robust CI/CD pipelines for efficient product delivery. … like GitLab CI, Terraform/OpenTofu, Ansible, and scripting languages such as PowerShell or Python. Champion infrastructure best practices and mentor junior team members. Senior SRE Requirements: Extensive experience in SRE or DevOps roles within high-availability, cloud-native environments. Strong expertise with AWS (including EKS, MSK, RDS, VPC design, encryption …
Walker Cole International is supporting a leading global company in the life sciences sector in the recruitment of a Lead Site Reliability Engineer to join their team. This is a permanent role focused on ensuring the reliability, scalability, and performance of critical systems, including utilities and equipment … within a GMP-compliant environment. Key Responsibilities: Lead the design and implementation of scalable systems to enhance the performance and reliability of equipment and utilities. Manage incident response and implement monitoring solutions to ensure system uptime. Drive performance optimization and continuous improvement through Root Cause Analysis (RCA) and corrective … actions. Collaborate with cross-functional teams to ensure seamless integration and compliance with regulatory standards. Requirements: Relevant industry experience in Site Reliability Engineering or a related field. Experience within biosafety and GMP environments is desirable. Strong proficiency in Lean Six Sigma principles and technical …
re Looking For: Basic Required Qualifications: Bachelor's degree in Computer Science, Information Technology, or a related field. 5+ years of experience as a Site Reliability Engineer or equivalent in a similar role. Proficient in application and infrastructure observability, Splunk OpenTelemetry preferred. Experienced in production environments running in … troubleshooting and problem-solving skills with a knack for identifying and resolving complex technical issues. Familiarity working in an Agile environment. True understanding of Site Reliability Engineering. Ability to build and maintain a system and culture that supports and implements SLOs. Familiar with Docker & Kubernetes, specifically EKS …
Site Reliability Engineer (SRE). Head Resourcing is pleased to be working with one of the UK's leading retail banks who are looking for an experienced Cloud SRE to join our engineering team and help drive reliability, scalability, and automation across our cloud-based products and … services on Google Cloud Platform (GCP). This role is all about embedding SRE best practices, improving platform resilience, and troubleshooting service issues with an engineering-first approach, using code and automation rather than manual work. Key Skills & Experience: Background in software engineering or telemetry, ideally with SRE … per week onsite working in Bristol. Why Join Us? We’re on a mission to transform our technology landscape, investing in automation, innovation, and engineering talent. If you want to help shape the future of cloud reliability, we’d love to hear from you.
Nottingham, Nottinghamshire, East Midlands, United Kingdom
Microlise
Site Reliability Engineer. When registering to this job board you will be redirected to the online application form. Please ensure that … this is completed in full in order that your application can be reviewed. We are looking for an experienced Site Reliability Engineer (SRE) to join our Technical Operations team within Microlise. Your key responsibilities would include implementing and supporting the Microlise infrastructure. This will involve bringing automation and … TechOps experience, especially from an Infrastructure as Code approach. Familiarity with development technologies like C# and SQL, Git. In-depth knowledge and understanding of SRE practices and infrastructure application monitoring frameworks. Understanding of diverse monitoring requirements and tools. An enthusiasm and ability to learn new technologies and approaches. Excellent investigation and …
program migrating services between Kubernetes environments. This position requires a strong blend of software engineering fundamentals and Site Reliability Engineering (SRE) principles, focusing on automation, reliability, and observability throughout the migration lifecycle. You will leverage your expertise in our cloud-native, Agile DevOps environment to … ensure a smooth and efficient transition, shaping the reliability and performance of our services. Coaching and mentoring others on best practices related to migration and reliability is a key part of this role. Key Responsibilities & Skills: Software Development & Adaptation: Design, build, test, and refactor software applications, specifically adapting … migration plans, technical designs, status updates, and risks to technical and product stakeholders. Collaborate effectively across teams and mentor engineers on software craft, Kubernetes, SRE principles, and migration techniques.
Birmingham, Staffordshire, United Kingdom Hybrid / WFH Options
N Consulting Limited
Role: SRE Lead. Location: Birmingham, UK (Hybrid, 2-3 days WFO). Contract: 3 months (possible extension). Are you a skilled Site Reliability Engineer (SRE) with experience in maintaining scalable and reliable infrastructure? We're looking for a proactive leader with a passion for automation, incident management, and system … optimization. Key Skills Required: 5+ years of SRE or similar experience. Expertise in Cloud Platforms (SIEM technologies preferred). Proficiency in Python or Bash scripting. Hands-on experience with Infrastructure as Code (e.g., Terraform, Ansible). Familiarity with Docker and Kubernetes. Strong problem-solving and collaboration skills. Responsibilities: Design, implement, and manage …
ability to leverage data, knowledge, and prediction to find new medicines. We are a full-stack shop consisting of product and portfolio leadership, data engineering, infrastructure and DevOps, data/metadata/knowledge platforms, and AI/ML and analysis platforms, all geared toward: Building a next-generation data … data mechanics". Providing best-in-class AI/ML and data analysis environments to accelerate our predictive capabilities and attract top-tier talent. Aggressively engineering our data at scale to unlock the value of our combined data assets and predictions in real-time. Onyx Product Management is at the … the product strategy of our DevOps and Infrastructure platforms to meet the customer needs. You will partner closely with the leaders of Onyx's engineering teams (DevOps and Infrastructure, AI/ML analysis and computing platform, data & knowledge platform, data engineering, UI/UX engineering), along with …
Warwick, Warwickshire, United Kingdom Hybrid / WFH Options
ICEO
charge of ensuring our data-intensive infrastructure is robust, secure, scalable, and optimized for exceptional performance, delivering the best experiences for our customers. As an SRE, you'll champion best practices across teams, shaping the future of our technological landscape. Help us build an innovative platform that enables seamless, real-time … freedom, security, and efficiency, whether for personal finances, business operations, or global investments. In this role, you will: Participate in defining and leading the SRE vision and strategy, ensuring alignment with business objectives and engineering priorities. Architect, maintain, and develop infrastructure within GCP and GKE - on high and low … applicable frameworks and regulations (DORA, SOC 2, ISO 27001, GDPR). Create documentation from the implemented solutions. Influence and mentor engineering teams on SRE principles, DevOps culture, and best practices. Keep up with industry trends, leveraging new tools, frameworks, and methodologies to consistently enhance system reliability. Care for keeping …
London, South East England, United Kingdom Hybrid / WFH Options
RP International
Site Reliability Engineer | Inside IR35 | Hybrid - 2 Days Onsite London | 6 Month Contract. Our client, a multinational and respected consultancy, is hiring for a Lead Site Reliability Engineer with expertise in AWS and DevOps Tools for a new project in the Public Sector. Technical Skills/ …