1 to 25 of 89 Permanent Root Cause Analysis Jobs in London

Data Scientist

Hiring Organisation
JD.COM
Location
London Area, United Kingdom
Responsibilities 1. Operational data analysis for international shared services: Understand the operational processes of shared services, comprehensively monitor key performance indicators, promptly detect data fluctuations and conduct in-depth root cause analysis, swiftly identify business anomalies and risk points, and provide solutions; 2. Aggregate international business … product lines across international business units, master current data storage methods and logical frameworks, consolidate international business data, promptly detect fluctuations and conduct root cause analysis; 3. Analyse international cash flow statements: Collaborate with Treasury and Accounting teams to understand JD's capital operations and data sources ...

Sr Service Reliability Engineer – Kings Cross, London

Hiring Organisation
Jobleads-UK
Location
Greater London, England, United Kingdom
robust monitoring, alerting, and observability systems (e.g., using AWS CloudWatch, Dynatrace) to ensure rapid issue detection and resolution. Monitor infrastructure capacity and performance, providing analysis and suggestions for service delivery improvement. Automation & Efficiency: Drive the automation of repetitive operational tasks, including infrastructure provisioning, deployments, and scaling. Create and maintain … deployment speed and reliability. Incident Management & Collaboration: Participate in an on-call rotation to troubleshoot and mitigate production incidents. Lead post-incident reviews and root cause analyses to implement lasting solutions. Partner with engineering and IT stakeholders to embed SRE best practices (SLOs, error budgets) into the design ...

Technical Account Manager – Latin America (Spanish Speaker)

Hiring Organisation
PSD Group
Location
City of London, London, United Kingdom
accepting payments with my client's payment network, with a primary focus on Mexico and the wider LatAm region. This role owns the identification, analysis, and remediation of card acceptance breakages, ensuring seamless transaction performance across all relevant BIN ranges. Acting as the bridge between data insights and market execution … resolve issues efficiently. What You’ll Be Responsible For: Monitor and manage a portfolio of priority merchants, identifying acceptance gaps using transaction data, trend analysis, and baseline modelling. Lead root cause analysis and remediation of payment failures across POS and e-commerce environments. ...

Site Reliability Engineer — AWS & Observability

Hiring Organisation
Jobleads-UK
Location
Greater London, England, United Kingdom
issues Leverage AI tooling – Use AI‐assisted development tools (e.g. GitHub Copilot) to accelerate infrastructure work, and explore AI‐driven approaches to incident detection, root cause analysis, and remediation What We're Looking For Essential 3+ years in an SRE, Platform, or DevOps engineering role AWS services … regulated environments or with compliance frameworks Experience with AI‐driven DevOps tooling (e.g. AWS DevOps Agent or similar AI agents for incident resolution, root cause analysis, and operational improvement) Experience with SLIs, SLOs, and error budgets On‐Call We have a 24/7 customer support team ...

DevOps Technical Lead

Hiring Organisation
Data Careers
Location
South East London, London, United Kingdom
Employment Type
Permanent, Work From Home
Implement progressive delivery practices Reliability & Observability Define and track SLIs/SLOs Enhance monitoring, alerting and incident response processes Lead post-incident reviews and root cause analysis Drive reduction of operational toil Security & Compliance Embed DevSecOps controls into pipelines Implement least-privilege IAM models Support … tooling experience (GitHub Actions, GitLab CI, Jenkins) Experience operating production SaaS environments Strong observability tooling knowledge (Datadog, Prometheus, ELK etc.) Incident management and root cause analysis experience Experience in regulated or security-conscious environments is highly desirable ...

IT Application Delivery Analyst

Hiring Organisation
Robert Walters
Location
London, South East, England, United Kingdom
Employment Type
Full-Time
Salary
Salary negotiable
timely and efficient manner Perform routine system checks and monitoring to ensure optimal application performance and security Identify, troubleshoot, and resolve technical issues through root cause analysis Assist with application testing, deployment, and release activities Collaborate with IT teams to develop and implement new software applications … experience working within a legal firm environment Proven experience supporting legal technology applications Strong hands-on experience with: iManage Intapp Excellent analytical, troubleshooting, and root cause analysis skills Experience using ServiceNow or similar ITSM platforms Basic PowerShell and SQL scripting experience for automation and support tasks Experience ...

Senior Lead Software Engineering - AI/ML Engineer

Hiring Organisation
Jobleads-UK
Location
Greater London, England, United Kingdom
/ML Data Platforms team, you will play a key role in building scalable and resilient data solutions. You will engage in root cause analysis, production changes, and operational improvements, while supporting budgetary and staffing decisions. You will mentor team members and partner with colleagues across … Databricks, Snowflake, AWS, and Kubernetes Coordinate incident management coverage to ensure effective resolution of application issues Collaborate with cross‐functional teams to perform root cause analysis and implement production changes Develop and support AI/ML solutions for troubleshooting and incident resolution Mentor and guide team members ...

Technical Service Assurance Analyst

Hiring Organisation
Venturi
Location
City of London, London, United Kingdom
efficiently while maintaining high levels of service availability and end-user satisfaction. The role involves diagnosing complex issues, supporting infrastructure and applications, contributing to root cause analysis, and ensuring services meet agreed service levels and operational standards. Key Responsibilities: Incident & Problem Management Act as the primary owner … incidents and service requests from initiation to resolution. Diagnose and resolve complex technical issues across applications, systems, and infrastructure. Perform root cause analysis for recurring incidents and contribute to problem management processes. Ensure incidents are managed in accordance with best practices and defined service levels. Service Assurance ...

DSO Operational Telecoms Engineer

Hiring Organisation
UK Power Networks (Operations) Ltd
Location
London, Shadwell, United Kingdom
Employment Type
Permanent
commissioning tests as business-as-usual function Develop process for site commissioning of flexible connection customers Communicate with customers to address technical queries Perform root cause analysis to investigate issues and take corrective actions 3. Support DSO field-based solution Help develop operational procedures for the safe … evolving smart grid technologies Experience working with SCADA protocols, including the DNP3 standard. Experience with Telecommunications and IT networking principles. Experience with techniques including root cause analysis Previous experience of testing and commissioning of SCADA systems Handle basic IT networking and troubleshooting Use network analysis tools including ...

Technical Operations Engineer

Hiring Organisation
ACS Performance
Location
South Croydon, London, United Kingdom
Employment Type
Permanent, Work From Home
Salary
£45,000
Improve operational efficiency through automation and process improvement 2 System reliability performance and uptime Monitor uptime performance and service stability across all sites Perform root cause analysis after incidents and implement long term fixes Work with engineering teams and suppliers on recurring technical issues Support system upgrades … networks and troubleshooting Experience with routers switches and firewalls Strong understanding of Windows operating systems and system diagnostics Experience with remote device management Strong root cause analysis skills Good documentation and asset management practices Understanding of endpoint security and device compliance Desirable Experience Camera based recognition systems ...

Head of Delivery and Service - UK Security Clearance eligibility required

Hiring Organisation
Jobleads-UK
Location
Greater London, England, United Kingdom
Management, ServiceNow or equivalent) to support efficient service operations and reporting. Chair regular Service Review meetings with customers, providing performance reporting against SLAs, trend analysis, and improvement recommendations. Drive continual service improvement initiatives informed by incident trends, customer feedback, and operational data. Ensure robust Major Incident Management processes … place, including communication protocols, post‐incident reviews, and root cause analysis. Own capacity and availability management, including proactive monitoring, scaling recommendations, and FinOps‐aligned resource optimisation. People & Culture Build, lead, and develop the delivery and service management team, ensuring adequate capacity and capability to meet growing demand. Foster ...

3rd Line / IT Infrastructure Engineer

Hiring Organisation
SER (Staffing) Ltd
Location
London, South East, England, United Kingdom
Employment Type
Full-Time
Salary
£45,000 - £50,000 per annum
fast-growing MSP. Key Responsibilities Act as a 3rd line escalation point for complex infrastructure incidents Troubleshoot and resolve business-critical technical issues, including root cause analysis Support and contribute to major incident management and post-incident reviews Deliver proactive infrastructure improvements, including monitoring, patching, and optimisation Provide technical oversight for changes … Required 2–3+ years in a 2nd/3rd Line or Infrastructure role Experience working within an MSP or multi-client IT environment Strong troubleshooting and root cause analysis skills Comfortable supporting complex infrastructure environments Experience across Microsoft cloud and on-prem infrastructure Desirable Skills Microsoft certifications (e.g. Azure Administrator ...

Cybersecurity Analyst

Hiring Organisation
Ryder Reid Legal Ltd
Location
London, South East, England, United Kingdom
Employment Type
Full-Time
Salary
Salary negotiable
tools, in line with documented SLAs. Investigate, respond to, and resolve security incidents and alerts, ensuring timely detection, containment, and remediation. Perform triage and root cause analysis of incidents, collaborating with IT and other teams to address underlying security issues. Conduct email threat analysis using both … defence practices and modern attack techniques. Hands-on experience with security technologies such as EDR, XDR, SIEM, SOAR, IDS, and IPS. Experience in vulnerability analysis, security alert analysis, incident response, and email threat analysis. Ability to read and understand scripting and query languages such as PowerShell, Python ...

Cyber Security Analyst

Hiring Organisation
Ryder Reid Legal
Location
City of London, London, United Kingdom
tools, in line with documented SLAs. Investigate, respond to, and resolve security incidents and alerts, ensuring timely detection, containment, and remediation. Perform triage and root cause analysis of incidents, collaborating with IT and other teams to address underlying security issues. Conduct email threat analysis using both … defence practices and modern attack techniques. Hands-on experience with security technologies such as EDR, XDR, SIEM, SOAR, IDS, and IPS. Experience in vulnerability analysis, security alert analysis, incident response, and email threat analysis. Ability to read and understand scripting and query languages such as PowerShell, Python ...

Site Reliability Engineer

Hiring Organisation
Jobleads-UK
Location
Greater London, England, United Kingdom
self-healing automation for achieving autonomous operations. • Collaborate with cross-functional teams to ensure systems are scalable, resilient, and maintainable. • Drive incident management and root cause analysis processes through automation, ensuring continuous improvement to enable autonomous operations. • Partner with engineering, architecture, and product teams to enable shift ...

Engineer - Site Reliability Engineering

Hiring Organisation
Jobleads-UK
Location
Greater London, England, United Kingdom
quickly recover service. Partner with development teams to improve system reliability, observability, and release velocity. Participate in on-call rotations, incident response, postmortems, and root cause analysis and resolution. Be a vocal advocate of strong/sound engineering practices that allow us to build, deploy ...

Applied AI Machine Learning Vice President

Hiring Organisation
Jobleads-UK
Location
Greater London, England, United Kingdom
fraud and enhance automation and decision‐making processes. Ensure robust model performance to meet the business expectations and the firm's governance standards. Perform root cause analysis for emerging trends in model performance, and communicate complex findings, insights, and recommendations to senior management and partners. Support … broader Risk Fraud Modelling initiative on synthetic data and synthetic IDs on behalf of the ICB business, contributing to the analysis, testing, and validation of synthetic data approaches. Work with multiple partner teams—including Strategy, Technology, Product Management, Legal, Compliance, Business Management, and Model Governance—to ensure the models ...

Senior Business Analyst

Hiring Organisation
Jobleads-UK
Location
Greater London, England, United Kingdom
This role requires strong leadership, the ability to manage multiple projects, and deep expertise in stakeholder engagement across all levels of the organization. Business Analysis & Documentation Lead the creation, review, and maintenance of Business Requirement Documents (BRDs), functional specifications, and process workflows Translate business requirements into clear, actionable deliverables … organizational standards, controls, and compliance requirements Production Support Liaison Act as the primary liaison between project teams and production support teams Support issue triage, root cause analysis, and prioritization of production defects Ensure smooth transition of projects into production with proper documentation and knowledge transfer Collaborate ...

AWS DevOps Engineer

Hiring Organisation
Data Careers
Location
Central London, London, United Kingdom
Employment Type
Permanent, Work From Home
Salary
£65,000
release efficiency Support engineering teams in adopting platform and deployment best practices Support observability, monitoring, and alerting solutions Participate in incident management, troubleshooting, and root cause analysis Contribute to operational resilience and service stability improvements Support implementation of cloud security controls and operational governance standards Assist with ...

Manager – Site Reliability Engineering

Hiring Organisation
Jobleads-UK
Location
Greater London, England, United Kingdom
resilience of business-critical systems. Incident Management & Crisis Leadership: Act as Incident Commander during major incidents, leading resolution efforts, managing stakeholder communications, and driving root cause analysis and remediation. Team Leadership & Talent Development: Build and mentor a high-performing SRE team. Promote a culture of accountability, continuous ...

Application Support Engineer

Hiring Organisation
Euro Car Parks
Location
Central London, London, United Kingdom
Employment Type
Permanent
types and automations and contributing to its continual improvement Creating, amending and removing user access and permissions across the applications we support Investigating and root-causing complex, cross-system issues spanning our Azure services, databases, messaging layer, reporting platforms and third-party integrations Monitoring production health and acting proactively … remove waste Assisting the Application Support Manager with day-to-day support activities and suggesting refinements to policies and procedures as well as preparing root cause analysis reports on major issues for relevant stakeholders within the business. Analysing trends across incidents and alerts to reduce avoidable downtime ...

Senior Database Engineer

Hiring Organisation
Jobleads-UK
Location
Greater London, England, United Kingdom
Monitoring & Incident Response Monitor database health, performance, and capacity, responding to alerts and incidents as required. Take ownership of complex incidents, contributing to investigation, root cause analysis, and remediation. Validate backup, recovery, and resilience mechanisms through regular testing. Collaboration & Knowledge Sharing Work closely with developers and engineering … solutions. Experience working closely with application development teams. Exposure to supporting databases in cloud or hybrid environments. Advanced T‐SQL skills for diagnostics analysis and development support. Experience in supporting databases in cloud or hybrid environments. Understanding of operational disciplines: monitoring, change control, incident management. Clear communication skills, able ...

IT Infrastructure Team Manager

Hiring Organisation
Barnet and Southgate College
Location
Barnet, London, United Kingdom
Employment Type
Permanent
Salary
GBP 44,085 - 47,737 Annual
availability, performance and resilience at all times. Incident management and escalation: Act as the technical escalation point for complex infrastructure incidents and problems, supporting root cause analysis and implementing preventative measures to improve service reliability. Cyber security and resilience: Ensure infrastructure services are maintained securely, implement … implementing and administering G Suite, Apple School Manager, Jamf, Papercut, Veeam or Adobe Creative Cloud Knowledge of MacOS and Linux system administration Business analysis and case writing experience Potential Interview Dates Week commencing 29th June 2026. Our Employee Pledge The College offers employees a generous holiday allowance ...

Senior Technical Engineer

Hiring Organisation
Jobleads-UK
Location
Greater London, England, United Kingdom
Manager instances. Oversee code quality through design reviews, code reviews, and continuous improvement of engineering standards. Troubleshoot complex production issues and drive root cause analysis and long‐term fixes. Azure & DevOps: Design and implement solutions using core Azure services (e.g. App Services, Azure Functions, Azure SQL, Storage ...

Service Delivery Director

Hiring Organisation
Jobleads-UK
Location
Greater London, England, United Kingdom
across cloud environments. Performance Monitoring & Reporting: Track KPIs, compile performance reports, share updates with stakeholders, and escalate service gaps. Process Improvement: Identify inefficiencies, conduct root cause analysis, and implement corrective/continuous improvement initiatives. Compliance & Standards: Maintain compliance with internal policies, and ensure compliance with industry standards ...