ensuring governance, security, compliance, and control. Experience Requirements: Proven experience in a senior SRE role or similar. Strong knowledge of cloud technologies and SLA/SLO/SLI management. Experience leading teams and implementing SCRUM processes. Excellent communication and leadership skills. Experience line managing, mentoring, and coaching. Responsibilities: Collaborate with the Principal More ❯
Cambridge, Cambridgeshire, United Kingdom Hybrid / WFH Options
AI Tech Suite
ensuring governance, security, compliance, and control. Experience Requirements: Proven experience in a senior SRE role or similar. Strong knowledge of cloud technologies and SLA/SLO/SLI management. Experience leading teams and implementing SCRUM processes. Excellent communication and leadership skills. Experience line managing, mentoring, and coaching. Responsibilities: Collaborate with the Principal More ❯
Southampton, Hampshire, United Kingdom Hybrid / WFH Options
NICE
planning Create sustainable systems and services through automation and uplifts Balance feature development speed and reliability with well-defined service level objectives Have you got what it takes? 3-6 years of working experience in a similar role, with a focus More ❯
Warwick, Warwickshire, United Kingdom Hybrid / WFH Options
ICEO
Cooperate with engineering and product teams to design and implement highly available and fault-tolerant systems. Participate in improving Service Level Objectives, Service Level Indicators, and error budgets to enhance system reliability. Work More ❯
valued. What You'll Do Key responsibilities in this role will include (but not be limited to): Leveraging core SRE values - measuring (SLI/SLO/SLA), testing, and eliminating toil via automation with appropriate Disaster Recovery planning Refining KPIs to enable data-driven decision making for availability and reliability More ❯
/CD pipelines, Infrastructure as Code, and automation frameworks tailored to our systems Drive disaster recovery planning, high availability architecture, and 24/7 SLO adherence for critical ad-serving solutions Build and maintain custom, complex deployment pipelines using Jenkins and other modern tools Improve system reliability and developer productivity More ❯
deployment) of e.g. ELK, CloudWatch, Fluentd, to enable forensic log analysis and system tuning as well as data-driven performance analysis (i.e. SLI/SLO) and capacity planning. You are a competent Linux & Windows systems administrator (for multiple distributions), including storage management (e.g. LVM, RAID) and security best-practices e.g. More ❯
SR2 | Socially Responsible Recruitment | Certified B Corporation™
and experience in automating/scripting. Understand and write code in multiple languages such as Python, Java, Golang, BASH and PowerShell. Experience in monitoring SLOs, SLIs and SLAs, and logging updates and altering where appropriate. Perks and Benefits Up to £105k (DoE) 2 days a week onsite More ❯
with customer experience, scalability, and performance in mind. Analyze system performance and reliability, offering recommendations for enhancement. Develop and uphold service-level objectives (SLOs), service-level indicators (SLIs), and error budgets for our More ❯
City Of Bristol, England, United Kingdom Hybrid / WFH Options
Gravitas Recruitment Group (Global) Ltd
production environment and experience in automating/scripting. · Ability to quickly understand, update and write code in languages (ideally Java). · Working experience monitoring SLOs, SLIs and SLAs and logging updates. · Strong DevOps understanding and familiarity, including experience of Infrastructure as Code and CI/CD pipelines, e.g. More ❯
ensure the reliability of this environment for our customers, SREs work closely with developers and product managers to understand service level objectives, think through failure scenarios, and design systems which balance cost with reliability objectives. Additionally, SREs collaborate with the Information More ❯
Expertise in defining and monitoring service quality metrics (such as RED, Golden Signals), establishing microservice Service Level Objectives (SLOs), and managing error budgets. Proficiency in Linux, cloud networking, microservices architecture, and Amazon EKS. Preferred qualifications include: Prior More ❯
and training engineers up to Staff standard. Operational Stability: Demonstrate a production-first attitude, continuously considering observability and maintaining Service Level Objectives, while delivering change at pace. Research & Innovation: Embrace emerging technologies and trends, and share insights with the organisation, while More ❯
This will involve: Defining and implementing Service Level Indicators (SLIs) and Service Level Objectives (SLOs) to measure and maintain system and application performance, ensuring services meet agreed reliability targets. Instrumenting applications to collect … principles, including the creation and management of Service Level Indicators (SLIs), Service Level Objectives (SLOs) and error budgets ensuring reliability and performance. Experience in implementing observability, instrumenting applications to provide insights into system More ❯
City of London, London, United Kingdom Hybrid / WFH Options
Sanderson Recruitment
As our Site Reliability Engineer, you'll work closely with our feature team and other colleagues to meet defined service level objectives and continually improve systems and environments. You'll define error budgets that support finding the right balance between risk More ❯
systems and third-party solutions. Network Health Management: Define and implement prediction pipelines for long-term network health, availability, and service-level objectives. Operations Automation: Lead initiatives to automate and optimize network operations focusing on scalability and reliability. Collaborative Development: Work closely More ❯
to reduce failures and manual tasks, thereby improving overall application performance and availability. As well as responding to stakeholder requests within agreed timescales or SLO, they will also be supporting maintenance activities, critical systems, and the planning of releases related to production applications. This is an opportunity to join an More ❯
Create proactive monitoring and observability solutions to help us see issues before our customers do Define and measure Service Level Objectives and Service Level Indicators Why Lloyds Banking Group We're on … workplaces, and colleagues to make our Group a great place for everyone. Including you! What you'll need Strong practitioner in SRE principles (SLI, SLO & SLA) using Observability, Logging, Monitoring & Alerting Experience of Infrastructure as Code and CI/CD pipelines using tools such as Terraform, Jenkins and Harness Can More ❯
quality. The Service Delivery Manager will be responsible for ensuring our technical teams meet their service level objectives, driving operational excellence, and maintaining strong relationships with internal and external stakeholders. You will play a vital part in More ❯
and EMEA time zones Preferred (Bonus) Skills Hands-on experience with tools like PagerDuty, OpsGenie, ServiceNow, CloudWatch, Chronosphere, or similar Understanding of SLA/SLO implementation and performance tracking Exposure to incident management frameworks, automated remediation, and runbook automation Background in DevOps or SRE culture and tooling Prior people leadership More ❯