Cambridge, Cambridgeshire, United Kingdom Hybrid / WFH Options
AI Tech Suite
excellence. Develop and implement strategic plans to enhance the reliability, scalability, and efficiency of our infrastructure. Collaborate with cross-functional teams to align SRE initiatives with broader organizational goals. Establish and maintain SLIs, SLOs, and SLAs for critical systems and services. Drive the adoption of best practices in automation … and management solution that helps organizations harness AI's potential while ensuring governance, security, compliance, and control. Experience Requirements: Proven experience in a senior SRE role or similar. Strong knowledge of cloud technologies and SLA, SLO, and SLI management. Experience leading teams and implementing SCRUM processes. Excellent communication and leadership skills. … Experience line managing, mentoring, and coaching. Responsibilities: Collaborate with the Principal SRE to shape and implement the SRE strategic plan. Lead the SRE team in translating strategy into actionable plans, coordinating these through the SCRUM process. Address wellbeing and performance concerns, fostering a positive and productive team environment. Work with …
Site Reliability Engineer (SRE). Hybrid working. Who are we? Toyota Connected Europe aims to create a better world through connected mobility for all. We are a new company focused on integrating big data and a customer-centric approach into all aspects of the mobility experience to make … growth and scalability. We aim to enhance agility, effectiveness, and innovation, collaborating with product teams to align on technological and project goals. As a Site Reliability Engineer, you will manage and improve complex cloud operations for one of the world's largest automotive companies. You will work … vehicle solutions. This environment values passion and potential; we are committed to developing talent into superstars. What you will do: Ensure the availability, performance, reliability, and scalability of applications and services. Collaborate with Software Engineering to define infrastructure and deployment requirements. Proactively identify and resolve production issues, developing tools …
up, we want like-minded humans to join us on this exciting journey. Are you ready? As a Site Reliability Engineer (SRE), you will play an important role in designing, building, and maintaining the infrastructure and tools necessary to support our software applications and services. You will … collaborate closely with the product engineering squads, technical operations, and security teams to ensure the reliability, scalability, and security of our platform. Your responsibilities will include automating infrastructure provisioning, configuration management, and deployment pipelines, utilizing best practices and modern technologies to streamline processes and improve efficiency. You will also … be responsible for monitoring system performance, identifying bottlenecks, and implementing solutions to enhance system reliability and performance. Key Responsibilities: Cloud Platform Management: Using Azure/AWS to manage and optimize infrastructure components, ensuring scalability, reliability, and cost management. Infrastructure Design and Implementation: Designing, building and maintaining the cloud …
City of London, London, United Kingdom Hybrid / WFH Options
Sanderson Recruitment
Role: Site Reliability Engineer. Location: London (Hybrid). Salary: £80,000 - £105,000. As our Site Reliability Engineer, you'll work closely with our feature team and other colleagues to meet defined service level objectives and continually improve systems and environments. You'll define error … Candidate: Very strong engineering skills in Java, JavaScript or Python. OpenTelemetry experience. Must have Core Java/Python. Must have experience as an SRE. Knowledge of Python data structures. Strong knowledge of deploy and release services, automation and troubleshooting. Experience of utilising tools and technology across the software development …
Reading, England, United Kingdom Hybrid / WFH Options
People Source Consulting trading as Experis
Site Reliability Engineer - DevOps Engineer. 18 Month Contract, PAYE - Fully Remote or Hybrid based in the Midlands if preferred. The role: We are working with one of the finest gaming studios in the industry and are on the lookout for an … exceptional Site Reliability Engineer who can bring their expertise and unique thinking to help make their team even stronger! As an SRE, your main purpose is solving for scale through collaboration and automation, bringing engineering principles to infrastructure and operational problems. Work closely with the different teams …
Manchester, Lancashire, United Kingdom Hybrid / WFH Options
bet365 Group
A Site Reliability Engineer who will enhance system reliability, observability and performance through a strong engineering approach and assist with incident resolution and best practices. You will have software engineering skills, focusing on system reliability and observability. You will monitor the health, performance and availability … of critical systems, directly impacting operational efficiency. Using your engineering expertise, you will implement solutions that enhance reliability, including service instrumentation with tools such as OpenTelemetry, improve logging practices and develop features for maintainability. You will also help engineer tools and automation for effective service management. Collaboration … is key, working across multiple functions to integrate reliability and observability best practices into the software development life cycle. By supporting governance standards set by the central teams, you will foster a culture where these principles are integral to development. Your contributions will ensure our systems meet user demands …
nurture others and learn from them, then this is your challenge! The Team: The Infrastructure as a Service (IaaS) team aims to uphold the reliability and scalability we expect from Algolia's infrastructure for its critical systems and products. Our focus is on enabling teams across Algolia to leverage … this infrastructure while keeping it under control through an ever-increasing level of automation. The Opportunity: The Senior Site Reliability Engineer position within the IaaS team provides a dynamic opportunity for a professional with foundational experience in maintaining and optimising scalable infrastructures. This role specifically concentrates … on three key areas: server and container hosting, cloud and network expertise, and flawless observability. As a Senior Site Reliability Engineer (SRE), you will play a pivotal role in designing, implementing, and maintaining highly available, scalable, and fault-tolerant systems. Your work will directly impact the effectiveness …
Manchester Area, United Kingdom Hybrid / WFH Options
bet365
Who we are looking for: A Site Reliability Engineer who will enhance system reliability, observability and performance through a strong engineering approach and assist with incident resolution and best practices. You will have software engineering skills, focusing on system reliability and observability. You will monitor … the health, performance and availability of critical systems, directly impacting operational efficiency. Using your engineering expertise, you will implement solutions that enhance reliability, including service instrumentation with tools such as OpenTelemetry, improve logging practices and develop features for maintainability. You will also help engineer tools and automation … for effective service management. Collaboration is key, working across multiple functions to integrate reliability and observability best practices into the software development life cycle. By supporting governance standards set by the central teams, you will foster a culture where these principles are integral to development. Your contributions will ensure …
Stoke-On-Trent, England, United Kingdom Hybrid / WFH Options
bet365
Who we are looking for: A Site Reliability Engineer who will enhance system reliability, observability and performance through a strong engineering approach and assist with incident resolution and best practices. You will have software engineering skills, focusing on system reliability and observability. You will monitor … the health, performance and availability of critical systems, directly impacting operational efficiency. Using your engineering expertise, you will implement solutions that enhance reliability, including service instrumentation with tools such as OpenTelemetry, improve logging practices and develop features for maintainability. You will also help engineer tools and automation … for effective service management. Collaboration is key, working across multiple functions to integrate reliability and observability best practices into the software development life cycle. By supporting governance standards set by the central teams, you will foster a culture where these principles are integral to development. Your contributions will ensure …
Birmingham, Staffordshire, United Kingdom Hybrid / WFH Options
N Consulting Limited
Role: SRE Lead. Location: Birmingham, UK (Hybrid, 2-3 days WFO). Contract: 3 months (possible extension). Are you a skilled Site Reliability Engineer (SRE) with experience in maintaining scalable and reliable infrastructure? We're looking for a proactive leader with a passion for automation, incident management, and … system optimization. Key Skills Required: 5+ years of SRE or similar experience. Expertise in Cloud Platforms (SIEM technologies preferred). Proficiency in Python or Bash scripting. Hands-on experience with Infrastructure as Code (e.g., Terraform, Ansible). Familiarity with Docker and Kubernetes. Strong problem-solving and collaboration skills. Responsibilities: Design, implement, and …
Site Reliability Engineer. Remote - Canada, Americas/Engineering. We offer: The Tyk API Management platform is helping to drive the connected world and power new products and services. We're changing the way that organisations connect any number of their systems and services. Whether internal, external, public or … like an environment that you believe could work for you then read on to find out more. The role: We're looking for a Site Reliability Engineer to manage, maintain, improve and provide support on our platform. You will be curious by nature, always looking for ways … be an advocate of continuous improvement. Reliability of our new global Tyk Cloud platform. Automation of operations and support. Writing and maintaining documentation on SRE processes and policies. Recommending and implementing ways of driving operational efficiency and driving down our cost to run, without impacting service. Assisting in penetration testing …
Location: Hybrid - 20% in the office per month. Nominet is on the hunt for a skilled Site Reliability Engineer to be a part of our Reliability Engineering function. This team is dedicated to the creation and upkeep of our secure compute platforms, a foundational element of … unwavering commitment to strict security and compliance protocols, you can expect to encounter a myriad of challenging problems to address. The role of the Site Reliability Engineer is vital to Nominet; this role encompasses the design, rollout, and administration of scalable cloud infrastructure, primarily on AWS. The … or tools to further enhance automation, orchestration, and developer experience. About you and your experience. Technical: A suitable candidate would ideally have experience in SRE, platform engineering, DevOps, or a cloud engineering role. AWS: Experience operating production systems on AWS. Holding relevant AWS certifications (like AWS Certified Solutions Architect or …
City Of Bristol, England, United Kingdom Hybrid / WFH Options
Gravitas Recruitment Group (Global) Ltd
products and services within the GCP platform, meaning the next generation of services that form this Financial Services company's vision for 2025! Role - Lead Site Reliability Engineer. Salary - £90,440 - £106,400. Location - London - Hybrid/Flexible working. Essential Skills: Experience working with GCP products (or extensive … Jenkins, or alternatives such as Azure DevOps. You will partner with service teams to drive the adoption of Site Reliability Engineering (SRE) best practices, ensuring these principles are integrated effectively within our microservices. Collaborate with infrastructure engineers to guarantee the resilience, scalability, and overall performance of the …
time data, set us apart as the leader in payments. We're on the hunt for an exceptional Site Reliability Engineer (SRE) to join our dedicated team. As an SRE at Paymentology, you'll be the superhero responsible for maintaining, improving, and ensuring the high availability, scalability … and service quality levels. Contribute to the design of reliable cloud infrastructure and implement reusable cloud-uptime components as code. Regularly review and optimise SRE practices, tools, and methodologies to enhance overall system reliability and team efficiency. Observability and Automation: Contribute to the design, implementation, and maintenance of observability … a culture of reliability. Requirements: Bachelor's degree in Computer Science, Information Technology, or related field. A minimum of 3 years in a dedicated SRE role, as well as 5+ years of prior software development experience. Comprehensive understanding of large-scale distributed platform architecture. Extensive hands-on cloud experience, particularly …
We are seeking talented Senior Site Reliability Engineers to join our growing SRE team! You will tackle complex challenges by designing and implementing scalable, reliable infrastructure and services that power the future of customer engagement technology. In this pivotal role, you'll leverage your extensive expertise in backend … systems and infrastructure management to enhance the performance and reliability of our platforms. Your contributions will directly influence the shaping of architecture and operational excellence needed for our product to thrive. Some things you'll do: Architect and maintain critical infrastructure to enable Customer.io to scale and handle real … processing of billions of messages. Strategically plan and implement infrastructure growth to meet evolving demands and repeatability. Streamline and automate processes for efficiency and reliability, removing manual toil. Participate in on-call rotations to swiftly address availability incidents and support technical engineers with customer-related issues. Develop observability to …
solutions that simplify the way IT organizations work. We are currently looking for a Senior Site Reliability Engineer to join our SRE team in the Platform Engineering organization and help us scale our products to millions of end-users. We are looking for individuals with a passion … improve efficiency and reduce delivery time of applications and infrastructure. Other duties as needed. About You: 7+ years' experience in DevOps and/or Site Reliability Engineering roles. 3+ years' experience with an object-oriented language (preferably Java, .NET or C++). Expert+ level Linux administration, scripting, and troubleshooting …
role: We are looking for a highly capable and experienced Site Reliability Engineer to join our growing tech team. As an SRE, you will be a hands-on coach for the development teams maintaining and improving our solutions' reliability. You will be part of our DevOps team … and alerting platforms, such as ELK, Datadog, Grafana, Loki, etc. Solid understanding of monitoring and alerting best practices. Previous experience as a DevOps/Platform Engineer or SRE. Expertise with IaC tooling (Terraform) and good understanding of cloud technologies, ideally Azure. Hands-on expertise with Kubernetes and Helm. Fundamental understanding …
Warwick, Warwickshire, United Kingdom Hybrid / WFH Options
ICEO
charge of ensuring our data-intensive infrastructure is robust, secure, scalable, and optimized for exceptional performance, delivering the best experiences for our customers. As an SRE, you'll champion best practices across teams, shaping the future of our technological landscape. Help us build an innovative platform that enables seamless, real-time … freedom, security, and efficiency, whether for personal finances, business operations, or global investments. In this role, you will: Participate in defining and leading the SRE vision and strategy, ensuring alignment with business objectives and engineering priorities. Architect, maintain, and develop infrastructure within GCP and GKE - on high and low-level … with applicable frameworks and regulations (DORA, SOC 2, ISO 27001, GDPR). Create documentation for the implemented solutions. Influence and mentor engineering teams on SRE principles, DevOps culture, and best practices. Keep up with industry trends, leveraging new tools, frameworks, and methodologies to consistently enhance system reliability. Care for keeping …
as well as ensuring the platform is performant and reliable. You will be a key member of the team, liaising with product teams, embedding SRE principles and building the observability platform for the next stage of growth at GSS. You will have direct input into the direction of Technical Operations … culture where your ideas are valued. What You'll Do: Key responsibilities in this role will include (but not be limited to): Leveraging core SRE values - measuring (SLI/SLO/SLA), testing, and eliminating toil via automation with appropriate Disaster Recovery planning. Refining KPIs to enable data-driven decision … preferably event-driven). Be a self-starter who relishes responsibility. Take strategic direction and own end-to-end delivery of solutions. Expert knowledge of SRE fundamentals and a commitment to best practice. Fluency with common observability tooling like Prometheus, Grafana, OTEL and CloudWatch. Experience analysing and building data telemetry, querying …
Site Reliability Engineer - FinTech/Global Payments - London HQ/Remote First. Salary - £80,000/£85,000 + Bonus. Location - This UK-based team offers a fully remote working option, with a headquarters in Central London. In this role, you will be joining a leading SaaS … market. The business aims to scale its platform significantly over the next few years to support a growing international client base. Responsibilities: Champion core SRE practices: define SLIs/SLOs/SLAs, reduce toil through automation, and plan for Disaster Recovery. Refine KPIs to support data-driven decisions around reliability … teams to build resilient, observable, and maintainable features. Promote DevOps culture by leading knowledge-sharing sessions and supporting issue resolution. Skills: Strong grounding in SRE principles and operational best practices. Proficient with observability tools (Prometheus, Grafana, OTEL, CloudWatch) and telemetry pipelines. Solid programming skills in Python and/or Go …