ensuring governance, security, compliance, and control. Experience Requirements: Proven experience in a senior SRE role or similar. Strong knowledge of cloud technologies and SLA, SLO, and SLI management. Experience leading teams and implementing SCRUM processes. Excellent communication and leadership skills. Experience line managing, mentoring, and coaching. Responsibilities: Collaborate with the Principal …
Cambridge, Gloucestershire, UK Hybrid / WFH Options
AI Tech Suite
ensuring governance, security, compliance, and control. Experience Requirements: Proven experience in a senior SRE role or similar. Strong knowledge of cloud technologies and SLA, SLO, and SLI management. Experience leading teams and implementing SCRUM processes. Excellent communication and leadership skills. Experience line managing, mentoring, and coaching. Responsibilities: Collaborate with the Principal …
Cambridge, Cambridgeshire, United Kingdom Hybrid / WFH Options
AI Tech Suite
ensuring governance, security, compliance, and control. Experience Requirements: Proven experience in a senior SRE role or similar. Strong knowledge of cloud technologies and SLA, SLO, and SLI management. Experience leading teams and implementing SCRUM processes. Excellent communication and leadership skills. Experience line managing, mentoring, and coaching. Responsibilities: Collaborate with the Principal …
Southampton, Hampshire, United Kingdom Hybrid / WFH Options
NICE
planning. Create sustainable systems and services through automation and uplifts. Balance feature development speed and reliability with well-defined service level objectives. Have you got what it takes? 3-6 years of working experience in a similar role, with a focus …
Warwick, Warwickshire, United Kingdom Hybrid / WFH Options
ICEO
Cooperate with engineering and product teams to design and implement highly available and fault-tolerant systems. Participate in improving Service Level Objectives, Service Level Indicators, and error budgets to enhance system reliability. Work …
valued. What You'll Do: Key responsibilities in this role will include (but not be limited to): Leveraging core SRE values: measuring (SLI/SLO/SLA), testing, and eliminating toil via automation, with appropriate Disaster Recovery planning. Refining KPIs to enable data-driven decision making for availability and reliability …
/CD pipelines, Infrastructure as Code, and automation frameworks tailored to our systems. Drive disaster recovery planning, high availability architecture, and 24/7 SLO adherence for critical ad-serving solutions. Build and maintain custom, complex deployment pipelines using Jenkins and other modern tools. Improve system reliability and developer productivity …
deployment) of e.g. ELK, CloudWatch, Fluentd, to enable forensic log analysis and system tuning as well as data-driven performance analysis (i.e. SLI/SLO) and capacity planning. You are a competent Linux & Windows systems administrator (for multiple distributions), including storage management (e.g. LVM, RAID) and security best practices, e.g. …
SR2 | Socially Responsible Recruitment | Certified B Corporation™
and experience in automating/scripting. Understand and write code in multiple languages such as Python, Java, Golang, BASH and PowerShell. Experience in monitoring SLOs, SLIs and SLAs, and logging updates and altering where appropriate. Perks and Benefits: Up to £105k (DoE). 2 days a week onsite …
with customer experience, scalability, and performance in mind. Analyze system performance and reliability, offering recommendations for enhancement. Develop and uphold service-level objectives (SLOs), service-level indicators (SLIs), and error budgets for our …
As our Site Reliability Engineer, you’ll work closely with our feature team and other colleagues to meet defined service level objectives and continually improve system and environment reliability. You’ll define SLOs, SLIs and error budgets that support finding the …
City Of Bristol, England, United Kingdom Hybrid / WFH Options
Gravitas Recruitment Group (Global) Ltd
production environment and experience in automating/scripting. · Ability to quickly understand, update and write code in languages (ideally Java). · Working experience monitoring SLOs, SLIs and SLAs and logging updates. · Strong DevOps understanding and familiarity, including experience of Infrastructure as Code and CI/CD pipelines, e.g. …
for AWS services, automation, and monitoring. SRE Practices Implementation: Establish and lead the implementation of SRE principles, such as Service Level Objectives (SLOs), Service Level Indicators (SLIs), and Error Budgets, to drive the …
infrastructure can handle user growth. Implementing auto-scaling and self-healing mechanisms to improve system reliability. Defining and tracking Service Level Objectives and Service Level Indicators to maintain system health. Acting as the …
ensure the reliability of this environment for our customers, SREs work closely with developers and product managers to understand service level objectives, think through failure scenarios, and design systems which balance cost with reliability objectives. Additionally, SREs collaborate with the Information …
Expertise in defining and monitoring service quality metrics (such as RED, Golden Signals), establishing microservice Service Level Objectives (SLOs), and managing error budgets. Proficiency in Linux, cloud networking, microservices architecture, and Amazon EKS. Preferred qualifications include: Prior …
and training engineers up to Staff standard. Operational Stability: Demonstrate a production-first attitude, continuously considering observability and maintaining Service Level Objectives, while delivering change at pace. Research & Innovation: Embrace emerging technologies and trends, and share insights with the organisation, while …
City of London, London, United Kingdom Hybrid / WFH Options
Sanderson Recruitment
As our Site Reliability Engineer, you'll work closely with our feature team and other colleagues to meet defined service level objectives and continually improve systems and environments. You'll define error budgets that support finding the right balance between risk …
standards to meet performance, reliability, and maintainability of the systems. With a strong production-first mindset, drive observability, maintain Service Level Objectives (SLOs), and ensure efficient incident resolution. Oversee the maintenance of existing systems, ensuring continuous improvements and prompt resolution of …