ensuring governance, security, compliance, and control. Experience Requirements: Proven experience in a senior SRE role or similar. Strong knowledge of cloud technologies and SLA/SLO/SLI management. Experience leading teams and implementing Scrum processes. Excellent communication and leadership skills. Experience line managing, mentoring, and coaching. Responsibilities: Collaborate with the Principal …
Cambridge, Gloucestershire, UK Hybrid / WFH Options
AI Tech Suite
ensuring governance, security, compliance, and control. Experience Requirements: Proven experience in a senior SRE role or similar. Strong knowledge of cloud technologies and SLA/SLO/SLI management. Experience leading teams and implementing Scrum processes. Excellent communication and leadership skills. Experience line managing, mentoring, and coaching. Responsibilities: Collaborate with the Principal …
Cambridge, Cambridgeshire, United Kingdom Hybrid / WFH Options
AI Tech Suite
ensuring governance, security, compliance, and control. Experience Requirements: Proven experience in a senior SRE role or similar. Strong knowledge of cloud technologies and SLA/SLO/SLI management. Experience leading teams and implementing Scrum processes. Excellent communication and leadership skills. Experience line managing, mentoring, and coaching. Responsibilities: Collaborate with the Principal …
Southampton, Hampshire, United Kingdom Hybrid / WFH Options
NICE
planning. Create sustainable systems and services through automation and uplifts. Balance feature development speed and reliability with well-defined service level objectives. Have you got what it takes? 3-6 years of working experience in a similar role, with a focus …
Warwick, Warwickshire, United Kingdom Hybrid / WFH Options
ICEO
Cooperate with engineering and product teams to design and implement highly available and fault-tolerant systems. Participate in improving Service Level Objectives, Service Level Indicators, and error budgets to enhance system reliability. Work …
valued. What You'll Do: Key responsibilities in this role will include (but not be limited to): Leveraging core SRE values - measuring (SLI/SLO/SLA), testing, and eliminating toil via automation, with appropriate Disaster Recovery planning. Refining KPIs to enable data-driven decision making for availability and reliability …
deployment) of e.g. ELK, CloudWatch, Fluentd, to enable forensic log analysis and system tuning as well as data-driven performance analysis (i.e. SLI/SLO) and capacity planning. You are a competent Linux & Windows systems administrator (for multiple distributions), including storage management (e.g. LVM, RAID) and security best practices, e.g. …
City Of Bristol, England, United Kingdom Hybrid / WFH Options
Gravitas Recruitment Group (Global) Ltd
production environment and experience in automating/scripting. · Ability to quickly understand, update and write code in languages (ideally Java). · Working experience monitoring SLOs, SLIs and SLAs and logging updates. · Strong DevOps understanding and familiarity, including experience of Infrastructure as Code and CI/CD pipelines, e.g. …
City of London, London, United Kingdom Hybrid / WFH Options
Sanderson Recruitment
As our Site Reliability Engineer, you'll work closely with our feature team and other colleagues to meet defined service level objectives and continually improve systems and environments. You'll define error budgets that support finding the right balance between risk …
Manchester Area, United Kingdom Hybrid / WFH Options
bet365
Level Indicators (SLI) and Service Level Objectives (SLO) for reliability and customer satisfaction. Knowledge of contemporary observability tools, techniques and best practice including Splunk, New Relic, Grafana and PagerDuty. Excellent knowledge of …
Stoke-On-Trent, England, United Kingdom Hybrid / WFH Options
bet365
Level Indicators (SLI) and Service Level Objectives (SLO) for reliability and customer satisfaction. Knowledge of contemporary observability tools, techniques and best practice including Splunk, New Relic, Grafana and PagerDuty. Excellent knowledge of …
Manchester, Lancashire, United Kingdom Hybrid / WFH Options
bet365 Group
Level Indicators (SLI) and Service Level Objectives (SLO) for reliability and customer satisfaction. Knowledge of contemporary observability tools, techniques and best practice including Splunk, New Relic, Grafana and PagerDuty. Knowledge and experience …
bring the capabilities of groundbreaking AI technologies to benefit humanity in a safe and reliable way. Responsibilities: Develop appropriate Service Level Objectives for large language model serving and training systems, balancing availability/latency with development velocity. Design and implement monitoring … at scale. Understand the unique challenges of operating AI infrastructure, including model serving, batch inference, and training pipelines. Have proven experience implementing and maintaining SLO/SLA frameworks for business-critical services. Are comfortable working with both traditional metrics (latency, availability) and AI-specific metrics (model performance, training convergence).