Permanent Service-Level Objective Jobs in London

24 of 24 Permanent Service-Level Objective Jobs in London

Senior Site Reliability Engineer (SRE) / Unix

London, United Kingdom
Morgan Hunt UK Limited
OS/application deployments. Manage Oracle Database 19c on Oracle Linux (KVM). Disaster Recovery & Automation: Strengthen automation for disaster recovery (DR) activities. Work towards a Recovery Time Objective (RTO) of 2 hours and a Recovery Point Objective (RPO) of zero. Conduct DR testing (3 scheduled tests per financial year, potentially outside core hours). Maintain CommVault backup … Monitoring & Observability: Support logging & observability stacks (InfluxDB, Grafana, Prometheus, Nagios). Enhance monitoring via REST APIs, time-series databases, and full-stack tools (TICK, Elasticsearch, OpenSearch). Promote SLO/SLI measurement & tracking. Security & Compliance: Drive security improvements & vulnerability remediation. Perform regular RHEL/KVM patching & hardening. Manage Red Hat Satellite & Ansible Automation Platform. Support …
Employment Type: Permanent
Salary: GBP Annual

Lead Site Reliability Engineer

London, United Kingdom
Hybrid / WFH Options
Lloyds Bank plc
critical detail to your mentees. Production Kubernetes experience and debugging all services that run within the K8s ecosystem, including Istio service mesh. SRE mentality (SLI, SLO & SLA) using Observability, Logging, Monitoring & Alerting (Dynatrace). Ideally coming from a software engineering or exceptional scripting skill background and have moved into SRE/DevOps while gaining a wider understanding …
Employment Type: Permanent
Salary: GBP Annual

Vice President, DevOps Engineer (NE) (London)

Highgate, Greater London, UK
Hybrid / WFH Options
BlackRock, Inc
the usage (and, desirably, the deployment) of e.g. ELK, CloudWatch, Fluentd, to enable forensic log analysis and system tuning as well as data-driven performance analysis (i.e. SLI/SLO) and capacity planning. You are a competent Linux & Windows systems administrator (for multiple distributions), including storage management (e.g. LVM, RAID) and security best-practices e.g. SSH, SSL/TLS, HMAC …
Employment Type: Full-time

Vice President, DevOps Engineer (NE) (London)

London, South East England, United Kingdom
Hybrid / WFH Options
BlackRock, Inc
the usage (and, desirably, the deployment) of e.g. ELK, CloudWatch, Fluentd, to enable forensic log analysis and system tuning as well as data-driven performance analysis (i.e. SLI/SLO) and capacity planning. You are a competent Linux & Windows systems administrator (for multiple distributions), including storage management (e.g. LVM, RAID) and security best-practices e.g. SSH, SSL/TLS, HMAC …

AWS Head of Site Reliability Engineering (Must hold current SC) (London)

London, UK
Amber Labs
highly available. Use best practices for AWS services, automation, and monitoring. SRE Practices Implementation: Establish and lead the implementation of SRE principles, such as Service Level Objectives (SLOs), Service Level Indicators (SLIs), and Error Budgets, to drive the team's focus on reliability. …
Employment Type: Full-time

AWS Head of Site Reliability Engineering (Must hold current SC) (London)

London, South East England, United Kingdom
ZipRecruiter
highly available. Use best practices for AWS services, automation, and monitoring. SRE Practices Implementation: Establish and lead the implementation of SRE principles, such as Service Level Objectives (SLOs), Service Level Indicators (SLIs), and Error Budgets, to drive the team's focus on reliability. …

Senior Application Support Engineer

London, United Kingdom
Just Group plc
configurations across legacy and modern applications to ensure their continued performance and reliability. System Monitoring & Performance: Maintain and improve logging, monitoring, and alerting systems. Define service-level objectives and indicators for business applications. Continuously review performance metrics against SLOs/SLIs and proactively address performance bottlenecks or underperforming systems. Manage system …
Employment Type: Permanent
Salary: GBP Annual

Restaurant Technology Problem Manager

London, United Kingdom
Hybrid / WFH Options
McDonald's Corporation
issues. Experience managing and contributing to mid-large projects related to system reliability improvements. Knowledge of Site Reliability Engineering (SRE) practices, including error budgeting, service level objectives (SLOs), and service level indicators (SLIs). Demonstrated ability to collaborate with cross-functional teams, including …
Employment Type: Permanent
Salary: GBP Annual

Software Engineering Manager - Financial Services (London)

Highgate, Greater London, UK
MARKS&SPENCER
Quality, Stability & Standards: Establish quality standards to meet performance, reliability, and maintainability of the systems. With a strong production-first mindset, drive observability, maintain Service Level Objectives (SLOs), and ensure efficient incident resolution. Oversee the maintenance of existing systems, ensuring continuous improvements and prompt resolution of issues. Agile Delivery & Collaboration: Working …
Employment Type: Full-time

Software Engineering Manager - Financial Services (London)

London, South East England, United Kingdom
MARKS&SPENCER
Quality, Stability & Standards: Establish quality standards to meet performance, reliability, and maintainability of the systems. With a strong production-first mindset, drive observability, maintain Service Level Objectives (SLOs), and ensure efficient incident resolution. Oversee the maintenance of existing systems, ensuring continuous improvements and prompt resolution of issues. Agile Delivery & Collaboration: Working …

Senior Software Engineer - Network Production Engineer

London, United Kingdom
Bloomberg L.P
network. Enhance existing monitoring and observability frameworks, integrating intelligent alerting and self-remediation capabilities to reduce manual intervention and improve incident response. Define and measure service-level objectives (SLOs) to track infrastructure performance and reliability. Write software utilizing orchestration systems to automate tasks and interact with other systems. Provide mentorship to …
Employment Type: Permanent
Salary: GBP Annual

Site Reliability Engineer - Met Office

London, United Kingdom
Microsoft Corporation
their full potential through the Microsoft Cloud. We are a fast-growing team, but we are committed to remaining agile. Customer first, nurturing trust, high responsiveness, automation, SLO/SLI/SLA, blameless post-mortem, observability, monitoring, alerting, and toil reduction form the foundations of our code and we work with teams across Microsoft and external customers to … Baseline Personnel Security Standards; UK Security Clearance. Responsibilities: Collaborating closely with the existing SRE teams on building and enhancing tooling and automation solutions for faster resolution of issues impacting SLOs and averting incidents altogether when possible. Collaborating with the customers to understand their pain points around Supportability and SLO attainment and formulate strategies for addressing recurring issues in a …
Employment Type: Permanent
Salary: GBP Annual

Head of RGM & TPM with IT experience (London)

London, UK
Diageo
on input from stakeholders, market analysis, and user feedback. Provide clarity and guidance to the development and run teams on product requirements, acceptance criteria, service level objectives and desired outcomes. Drive a culture of continuous improvement by implementing best practices, fostering innovation, and promoting experimentation within the value stream. Lead and …
Employment Type: Full-time

Head of RGM & TPM with IT experience (London)

London, South East England, United Kingdom
Diageo
on input from stakeholders, market analysis, and user feedback. Provide clarity and guidance to the development and run teams on product requirements, acceptance criteria, service level objectives and desired outcomes. Drive a culture of continuous improvement by implementing best practices, fostering innovation, and promoting experimentation within the value stream. Lead and …

Lead Incident Response Consultant (London)

London, South East England, United Kingdom
CyberArk
and legal counsel. Establish a collaborative environment for sharing data on machine timelines and suspicious events. Create operational metrics, key performance indicators (KPIs), and service level objectives to measure team competence. Qualifications: 4+ years' experience working with incident investigations utilizing EDRs, SIEMs, and containment procedures. 4+ years' experience with …

Microsoft

London, United Kingdom
Hybrid / WFH Options
Jointaro
operational insights. Collaborate with SRE teams on building and enhancing tooling and automation solutions. Work with customers to understand pain points around Supportability and SLO attainment. Be the single point of contact for enterprise customer service escalations. Implement changes to service telemetry for automation consumption. Enhance customer …
Employment Type: Permanent
Salary: GBP Annual

Product Reliability and Support Strategist, Alerting and Incident Management

London, United Kingdom
Coralogix, inc
willing to present and defend your ideas to technical and non-technical audiences. Additional Desired Skills: Experience with incident management platforms like PagerDuty, OpsGenie, or similar tools. Understanding of SLO/SLA management and implementations. Knowledge of industry standard incident management frameworks and best practices. Familiarity with automated remediation and runbook automation. Experience with DevOps and SRE practices. Cultural Fit …
Employment Type: Permanent
Salary: GBP Annual

Engineering Manager, SRE Hybrid - New York City

London, United Kingdom
Hybrid / WFH Options
vercel.com
teamwork. Build rapport with each member of the team and support them as they level up their skills. Define and maintain company-wide practices around SLO definition and management, incident management, postmortem analysis, and disaster testing and recovery. Generate informed insights regarding service quality and interface directly with executive leadership to communicate …
Employment Type: Permanent
Salary: GBP Annual

Senior Full Stack Engineer - Backstage in London - Flatiron Health

London, United Kingdom
Java Script Works
Required: Provisioning and maintaining cloud-hosted environments in Amazon Web Services with Terraform. Programming experience with React (or other JavaScript frameworks). Setting and maintaining service level objectives and service level indicators. Qualities We're Looking For: Kind, passionate, and collaborative problem-solver who …
Employment Type: Permanent
Salary: GBP Annual

Staff Software Engineer, AI Reliability Engineering

London, United Kingdom
Hybrid / WFH Options
Menlo Ventures
of Anthropic's mission to bring the capabilities of groundbreaking AI technologies to benefit humanity in a safe and reliable way. Responsibilities: Develop appropriate Service Level Objectives for large language model serving and training systems, balancing availability/latency with development velocity. Design and implement monitoring systems including availability, latency and … distributed systems observability and monitoring at scale. Understand the unique challenges of operating AI infrastructure, including model serving, batch inference, and training pipelines. Have proven experience implementing and maintaining SLO/SLA frameworks for business-critical services. Are comfortable working with both traditional metrics (latency, availability) and AI-specific metrics (model performance, training convergence). Have experience with chaos engineering and …
Employment Type: Permanent
Salary: GBP Annual

Head of Product (B2B SaaS & on-Prem) (London)

London, UK
WTW
ensure cost-effective utilisation of all available resources, within budget. Developing and operating all Products, to agreed timescales, scope and quality (including security and service level objectives). Monitoring competition and ensuring product features remain competitive across the Product unit. You Will Also Collaborate With Other ICT Leads To: Develop and communicate …
Employment Type: Full-time

Head of Product (B2B SaaS & on-Prem) (London)

London, South East England, United Kingdom
WTW
ensure cost-effective utilisation of all available resources, within budget. Developing and operating all Products, to agreed timescales, scope and quality (including security and service level objectives). Monitoring competition and ensuring product features remain competitive across the Product unit. You Will Also Collaborate With Other ICT Leads To: Develop and communicate …

Site Reliability Engineer - Met Office

London, United Kingdom
Microsoft
to build and enhance tooling and automation solutions, enabling faster resolution of issues impacting SLOs and preventing incidents when possible. Engage with customers to understand their supportability challenges and SLO attainment concerns, developing sustainable strategies to address recurring issues. Serve as the primary technical contact for interfacing with large enterprise customers, managing service escalations, and driving …
Employment Type: Permanent
Salary: GBP Annual

Service-Level Objective, London
10th Percentile: £54,304
25th Percentile: £64,509
Median: £69,384
75th Percentile: £85,000
90th Percentile: £98,750