own and oversee the resolution of service-related issues from start to finish. This role offers an exciting chance to work in a dynamic environment with complex systems, making root cause identification challenging. You will collaborate closely with your team and other Service Operations teams, including ITIL functions, Technology Office teams, our Service Providers, and our customers becoming … outcomes to the problems in hand. Lead problem resolution efforts to ensure timely and effective outcomes. Report and escalate when necessary, ensuring full transparency throughout. Be accountable for the Root Cause Analysis lifecycle and ensure accuracy and quality are obtained collaboratively with our 3rd party suppliers. Assess and manage risks associated with services and recurring problems. Work … stakeholders to improve existing processes and performance and feeding back successes and any concerns in implementation of these improvements. Conduct Trend Analyses to identify and eliminate common factors that cause incidents. Then escalate and communicate onwards as necessary with suggested solutions and mitigation. Ensure awareness of workarounds is communicated to wider business areas. For example, and not limited to
london (city of london), south east england, united kingdom
Tata Consultancy Services
hands-on deep dive when required. Leading the teams during Major Incidents and providing recommendations on the fastest path to major incident recovery, or supporting technical delivery teams with root cause analysis for Major Incidents. Experience of working on both SAFe/Agile project delivery. Your Profile Essential skills/knowledge/experience: MS Azure solution architect
london (city of london), south east england, united kingdom
BGC Group
and Grafana. Key Responsibilities: Administer and maintain Solace PubSub+ appliances and software brokers across environments (on-prem and cloud). Provide production support for messaging-related incidents, including root cause analysis and resolution. Monitor system performance and health using Prometheus and Grafana; proactively identify and address anomalies. Configure and optimize Solace across WAN environments, ensuring low
City of London, London, United Kingdom Hybrid / WFH Options
The Curve Group
supporting critical business applications across a modern and complex technology stack. In this role, you'll be responsible for: Investigating and resolving technical incidents, ensuring minimal downtime and effective root cause analysis. Proactively maintaining and optimising applications, performing upgrades and configuration changes. Monitoring system performance, defining service-level objectives, and addressing bottlenecks before they impact users. Collaborating
City of London, London, England, United Kingdom Hybrid / WFH Options
The Curve Group
supporting critical business applications across a modern and complex technology stack. In this role, you'll be responsible for: Investigating and resolving technical incidents, ensuring minimal downtime and effective root cause analysis. Proactively maintaining and optimising applications, performing upgrades and configuration changes. Monitoring system performance, defining service-level objectives, and addressing bottlenecks before they impact users. Collaborating
london (city of london), south east england, united kingdom
Proskauer Rose LLP
system health and performance; initiate health checks and proactively remediate issues. Respond to and resolve incidents and service requests in line with SLAs. Provide break/fix troubleshooting and root cause analysis across supported systems. Execute system configuration changes and support business application integrations. Remediate application vulnerabilities in collaboration with Information Security. Participate in security audits and
practices and ensure an ongoing focus on reliability, scalability and operational excellence. Influence incident and problem management process, driving improvements to key platform metrics. Contribute to Post Incident Reviews, root cause analysis, lessons learnt and post actions. Champion stability and resilience across the trading platforms. Ensure new systems are aligned with best practices. Drive improvements and alignment … in observability and monitoring tools, improving MTTD and MTTR. Produce analysis on SRE function performance. Provide guidance, recommendations and hands-on support to teams, promoting SRE best practices. Develop and maintain a roadmap for continuous improvement of support and observability. Maintain personal/professional development to meet the changing demands of the role, including all relevant regulatory and legislative
london (city of london), south east england, united kingdom
Dexian Europe
to users/administrators of our platform. Understand our platform, cloud technologies, and troubleshooting practices to ensure successful resolution of challenging technical situations. Experience assessing, troubleshooting, resolving, and providing root cause analysis for ServiceNow Product issues related to upgrades, cloning, tables, reporting, performance analytics, artificial intelligence, automated test framework, studio and development tools, plugins and applications. Manage
london (city of london), south east england, united kingdom
McGregor Boyall
experience & education: Previous user or functional experience of Microsoft D365 and PowerBI preferred. Strong technical background and knowledge of various technologies and platforms. Strong analytical and critical thinking abilities and root cause analysis skills. Experience and proficiency in the delivery of business analyst artefacts (such as user stories, BRDs, process documentation). Proficiency in data analysis techniques and
london (city of london), south east england, united kingdom
HCLTech
and support product upgrade and engineering aspects for Authentication platform and associated components. Provide technical leadership and mentoring to IAM engineers and developers. Support critical incident response, troubleshooting, and root cause analysis for IAM-related issues. Support audit and compliance activities with documentation and evidence of access controls. Stay updated on ForgeRock product roadmap and emerging IAM
City of London, London, United Kingdom Hybrid / WFH Options
RED Global
engineering principles, including SLIs, SLOs, and error budgets. Excellent communication and stakeholder management skills. Ability to lead by influence and build consensus across diverse teams. Experience with incident response, root cause analysis, and implementing preventative measures. Comfortable working in a fast-paced, results-oriented contract environment. Please apply with your up-to-date CV in English.
london (city of london), south east england, united kingdom Hybrid / WFH Options
RED Global
engineering principles, including SLIs, SLOs, and error budgets. Excellent communication and stakeholder management skills. Ability to lead by influence and build consensus across diverse teams. Experience with incident response, root cause analysis, and implementing preventative measures. Comfortable working in a fast-paced, results-oriented contract environment. Please apply with your up-to-date CV in English.
london (city of london), south east england, united kingdom
Mercuria
of CTRM and connected upstream/downstream applications for Intraday and End of day activities. Take appropriate actions to ensure that time-critical batch steps run. This may involve root cause analysis of issues and taking appropriate corrective actions (fixing curves, excluding deals/portfolios etc.). Monitor application availability and report issues proactively. Communicate with
london (city of london), south east england, united kingdom
PRS
etc. Act as Senior Authorised Person (SAP) for High and Low Voltage systems. Manage the Permit to Work (PTW) system and review RAMS for all activities. Lead incident response, root cause analysis, and corrective action processes. Deliver robust Planned Preventative Maintenance (PPM) and reactive maintenance schedules. Monitor site KPIs, service levels, and operational risks, ensuring swift resolution
london (city of london), south east england, united kingdom
Microland Limited
the client’s target architecture and security standards. Act as the first point of technical escalation for all network-related issues. Own Priority 1 (P1) incidents and drive associated Root Cause Analysis (RCA) to closure. Lead all technical conversations with the customer, ensuring clarity, confidence, and timely resolution. Operations & Support: Oversee day-to-day network operations including
into the COO and operating alongside the UK Risk function and wider Enterprise Risk Management function, you will be responsible for the “first line of defence”, supporting the monitoring, analysis and escalation of risks and risk events. You will also be responsible for the implementation of additional service & management controls across the UK. Are you the right person for … support the preparation and monitoring of Risk and Control Self-Assessments (“RCSA”). Oversee key control testing and action remediation. Monitor managerial control functions. Identify operational risk trends and conduct root cause analysis of operational risk events. Contribute to risk projects as required. Implement an entity-wide resilience framework, including presentation of the self-assessment to the Board