ensuring governance, security, compliance, and control. Experience Requirements: Proven experience in a senior SRE role or similar. Strong knowledge of cloud technologies and SLA, SLO, and SLI management. Experience leading teams and implementing SCRUM processes. Excellent communication and leadership skills. Experience line managing, mentoring, and coaching. Responsibilities: Collaborate with the Principal More ❯
Cambridge, Cambridgeshire, United Kingdom Hybrid / WFH Options
AI Tech Suite
ensuring governance, security, compliance, and control. Experience Requirements: Proven experience in a senior SRE role or similar. Strong knowledge of cloud technologies and SLA, SLO, and SLI management. Experience leading teams and implementing SCRUM processes. Excellent communication and leadership skills. Experience line managing, mentoring, and coaching. Responsibilities: Collaborate with the Principal More ❯
Southampton, Hampshire, United Kingdom Hybrid / WFH Options
NICE
planning Create sustainable systems and services through automation and uplifts Balance feature development speed and reliability with well-defined service level objectives Have you got what it takes? 3-6 years of working experience in a similar role, with a focus More ❯
valued. What You'll Do Key responsibilities in this role will include (but not be limited to): Leveraging core SRE values - measuring (SLI/SLO/SLA), testing, and eliminating toil via automation with appropriate Disaster Recovery planning Refining KPIs to enable data-driven decision making for availability and reliability More ❯
deployment) of e.g. ELK, CloudWatch, Fluentd, to enable forensic log analysis and system tuning as well as data-driven performance analysis (i.e. SLI/SLO) and capacity planning. You are a competent Linux & Windows systems administrator (for multiple distributions), including storage management (e.g. LVM, RAID) and security best-practices e.g. More ❯
ensure the reliability of this environment for our customers, SREs work closely with developers and product managers to understand service level objectives, think through failure scenarios, and design systems which balance cost with reliability objectives. Additionally, SREs collaborate with the Information More ❯
a critical operations function that is responsible for the monitoring, availability and performance of production services. Responding to stakeholder requests within agreed timescales or SLO. Drive automation to reduce failures and manual tasks, thereby improving overall application performance and availability. Perform systems administration activities to ensure the smooth operation of More ❯
Expertise in defining and monitoring service quality metrics (such as RED, Golden Signals), establishing microservice Service Level Objectives (SLOs), and managing error budgets. Proficiency in Linux, cloud networking, microservices architecture, and Amazon EKS. Preferred qualifications include: Prior More ❯
Expertise in defining and monitoring service quality metrics (such as RED, Golden Signals), establishing microservice Service Level Objectives (SLOs), and managing error budgets. Proficiency in Linux, cloud networking, microservices architecture, and Amazon EKS. Preferred qualifications include: Prior More ❯
and training engineers up to Staff standard. Operational Stability: Demonstrate a production-first attitude, continuously considering observability and maintaining Service Level Objectives, while delivering change at pace. Research & Innovation: Embrace emerging technologies and trends, and share insights with the organisation, while More ❯
This will involve: Defining and implementing Service Level Indicators (SLIs) and Service Level Objectives (SLOs) to measure and maintain system and application performance, ensuring services meet agreed reliability targets. Instrumenting applications to collect … principles, including the creation and management of Service Level Indicators (SLIs), Service Level Objectives (SLOs) and error budgets ensuring reliability and performance. Experience in implementing observability, instrumenting applications to provide insights into system More ❯
improve their C#/.NET Core skills. Support and enhance current systems and initiatives during office hours, ensuring that service level objectives are met. Maintain a strong focus on quality, reusability, clean architectures, security, and resilience across the full application lifecycle. More ❯
Bristol, Gloucestershire, United Kingdom Hybrid / WFH Options
Experis - ManpowerGroup
improve their C#/.NET Core skills. Support and enhance current systems and initiatives during office hours, ensuring that service level objectives are met. Maintain a strong focus on quality, reusability, clean architectures, security, and resilience across the full application lifecycle. More ❯
City of London, London, United Kingdom Hybrid / WFH Options
Sanderson Recruitment
As our Site Reliability Engineer, you'll work closely with our feature team and other colleagues to meet defined service level objectives and continually improve systems and environments. You'll define error budgets that support finding the right balance between risk More ❯
systems and third-party solutions. Network Health Management: Define and implement prediction pipelines for long-term network health, availability, and service-level objectives. Operations Automation: Lead initiatives to automate and optimize network operations focusing on scalability and reliability. Collaborative Development: Work closely More ❯
observability frameworks, integrating intelligent alerting and self-remediation capabilities to reduce manual intervention and improve incident response. Define and measure service-level objectives (SLOs) to track infrastructure performance and reliability. Write software utilizing orchestration systems to automate tasks and interact with More ❯
a critical operations function that is responsible for the monitoring, availability and performance of production services. Responding to stakeholder requests within agreed timescales or SLO. Drive automation to reduce failures and manual tasks, thereby improving overall application performance and availability. Perform systems administration activities to ensure the smooth operation of More ❯
and EMEA time zones Preferred (Bonus) Skills Hands-on experience with tools like PagerDuty, OpsGenie, ServiceNow, CloudWatch, Chronosphere, or similar Understanding of SLA/SLO implementation and performance tracking Exposure to incident management frameworks, automated remediation, and runbook automation Background in DevOps or SRE culture and tooling Prior people leadership More ❯
and EMEA time zones Preferred (Bonus) Skills Hands-on experience with tools like PagerDuty, OpsGenie, ServiceNow, CloudWatch, Chronosphere, or similar Understanding of SLA/SLO implementation and performance tracking Exposure to incident management frameworks, automated remediation, and runbook automation Background in DevOps or SRE culture and tooling Prior people leadership More ❯
your ideas to technical and non-technical audiences. Additional Desired Skills Experience with incident management platforms like PagerDuty, OpsGenie, or similar tools Understanding of SLO/SLA management and implementations Knowledge of industry standard incident management frameworks and best practices Familiarity with automated remediation and runbook automation Experience with DevOps More ❯
and service automation. Lead the definition and tracking of Service Level Objectives (SLOs) to measure service availability in combination with service, product and engineering communities. Collaborate with product and engineering More ❯
Level Agreements (SLA) through Service Level Objectives (SLO) and Service Level Indicators (SLI). Liaise with client technical and business teams as needed to ensure More ❯
key member of the SRE leadership team. Lead the definition and tracking of Service Level Objectives (SLOs) to measure service availability in combination with service, product and engineering communities. Collaborate with product and engineering More ❯
key member of the SRE leadership team. Lead the definition and tracking of Service Level Objectives (SLOs) to measure service availability in combination with service, product and engineering communities. Collaborate with product and engineering More ❯
customers in their security infrastructure design and planning. You will be regularly completing deployment projects on or before expected Service Level Objectives (SLOs) and integrating new systems into existing network architecture. On the support side, you'll efficiently manage customer support More ❯