the operation and maintenance of regulatory reporting systems for financial institutions. This role focuses on ensuring stable system operation through continuous monitoring, rapid incident response (including on-call support), root cause analysis, and documentation. The ideal candidate will have experience in maintaining mission-critical systems and strong skills in SQL-based data analysis, particularly with Microsoft … Location: Central London (Hybrid) Position Overview This position supports the stable operation of regulatory reporting systems used by financial institutions. You will be responsible for system monitoring, incident response, root cause analysis, and providing comprehensive incident reports. The role requires attention to detail and a strong sense of ownership, as you will be supporting systems critical to … Respond to system incidents and provide both temporary and permanent solutions • Extract and analyse data using SQL (Microsoft SQL Server) • Analyse incidents using logs and DB traces to identify root causes • Prepare detailed incident reports, including root cause and preventative actions • Communicate with clients and internal stakeholders in both English and Japanese Must Requirements • Experience in IT …
City of London, London, United Kingdom Hybrid / WFH Options
REC SOLUTIONS LIMITED
with development, networks, ops and product teams on strategic IT initiatives. Assist with planning, management and resource allocation of inter-departmental projects alongside the PM team. Oversee incident management, root cause analysis, and rapid resolution of system outages or performance degradation. Ensure compliance of procedures such as change management, patch management and security and audit processes. Assist … understanding of cybersecurity principles and experience implementing security measures in a regulated environment. Ability to coach, mentor, and upskill staff; develop career paths and ensure team resilience. Experience undertaking root cause analysis, including prevention-orientated solution reporting. Working experience with deployment tools (e.g. GitLab pipelines) and rollback strategies. Proficiency in managing bare-metal servers, virtualization platforms such …
scoring, and mitigation strategies. Communicate the status and impact of incidents clearly to all relevant stakeholders, working in collaboration with them to resolve and close down actions. Conduct thorough root cause analysis, host post-incident reviews, and ensure implementation and tracking of post-incident remedial actions for accountable stakeholders. Develop, maintain, and continuously improve the effectiveness of … complex, sometimes high-pressure situations and deliver the right outcomes for our business and customers. Experience with incident management technology and process automation to drive operational improvements. Expertise in root cause analysis, action tracking, and developing reporting to monitor progress and risk mitigation. Excellent stakeholder management and influencing skills, with the ability to maintain focus and accountability …
This is a key appointment and will require the individual to be technically strong with a critical engineering/data centre background. This position will regularly be involved in root cause analysis and fault rectification, and will act as the appointed Senior Authorised Person for the operation of both Low and High Voltage equipment. The position will be … a mix of Hybrid working and travelling to sites around the M25 Role Profile: Senior Authorised Person for HV Root cause analysis/review of any technical faults Carrying out scenario and training with engineering staff Liaising with the clients to ensure excellent feedback and customer satisfaction Assisting with and organising system shutdowns across sites Providing technical …
leads Problem Management globally, you'll assist in the development and coordination of the effective functioning of problem management activities across MMCTech. We will count on you to: Determine root cause, resolution, and identify and recommend improvements that can be made to prevent recurrence. Analyse incident volumes and trends to identify process or technological improvements that will reduce … incidents to be resolved at lower support tiers. Conduct postmortem investigations on critical incidents, identify and recommend corrective action items, and create a business-facing document detailing the incident, root cause, and steps for remediation. We will also look for you to help manage service relationships across MMCTech by chairing Problem Boards and other meetings that will review … regarding the progress of individual problems. Be flexible and willing to work longer hours or outside of regular working hours in the event of critical issues that require expedited root cause analysis. There may be a future requirement to work on a shift basis outside of the normal 9 to 5 schedule, and the role may also transition …
improvement across a growing and forward-thinking firm. Key Responsibilities: - Contribute to audit quality initiatives and drive best practices across the firm. - Conduct audit cold file reviews and support root cause analysis. - Monitor developments in audit regulation and ensure firm-wide compliance. - Support the evolution of internal methodologies, policies, and procedures. - Assist with internal audit and assurance projects … quality through collaboration with partners, managers, and senior stakeholders. Ideal Candidate: - Strong recent experience in audit within a UK professional services firm. - Solid understanding of ISAs, audit regulation, and root cause analysis methodology. - Practical experience conducting audit file reviews is highly desirable. - Strong communication skills and ability to build trust with senior stakeholders. - Organised, detail-oriented, and …
Holiday Days + your local bank holidays 1 Birthday day - it only happens once a year! 3 So Giving Days - spend these days giving back to your chosen cause Religious Celebrations Leave Mental Healthcare - Sessions with Unmind Enhanced Family Leave Values-driven culture - we're really proud of our culture. So Energy Who we are So Energy was created in … monthly/quarterly reports for senior leadership, including trend analyses, month-over-month comparisons, and variance explanations. • Support strategic initiatives and lead or participate in special projects, providing data analysis, model validation, and performance tracking. • Conduct "deep dives" into specific issues and coordinate cross-departmental follow-up actions. KPI Monitoring & Analysis • Help define, monitor, and report on key … performance indicators (KPIs) related to payments • Perform root-cause analysis for KPI deviations, pinpointing underlying issues such as bottlenecks in the collections workflow • Recommend corrective actions to improve KPIs, working closely with Collections Operations and Finance teams. Trend Identification & Forecasting • Analyse historical payment and collections trends to forecast cash flow, projected delinquency levels, and potential bad-debt …
London, South East, England, United Kingdom Hybrid / WFH Options
Lorien
and management of Incidents in order to return service as quickly as possible while managing key stakeholder communications and expectations. Prevent incidents from materialising into Major Incidents through remediating root causes and proactive management of incidents that may cause a Major Incident or other event with substantial impact to our business. To ensure the effective identification and resolution of … trends and root causes that could cause a negative impact to service stability Key Accountabilities: Act as escalation path for major and critical incidents to ensure service is restored. Direct and coordinate the Major Incident process, coordinating activities of resolver teams including specialist support groups/third-party suppliers. Monitor and in some circumstances manage escalating or potential … Ensure accurate, timely, proactive communications with key stakeholders through Incident/Major Incident lifecycle. Ensure third-party suppliers fulfil their contractual obligations, especially with regard to SLAs for incidents, root cause analysis, monitoring trends and problem resolution. Plan, execute and document appropriate follow-up activities relating to Major Incidents ensuring that a Problem Record is created and …
and ever-changing landscape. The business analyst will be assigned to a number of initiatives within the Core Infrastructure Value Stream. Primarily: engagement of stakeholders, creating documentation, analysing processes, data analysis and data landscapes to enable the initiatives to drive value and deliver positive outcomes for the business. The role will require a flexible individual who can work on a … number of analysis tasks at once, that can be used by the team and presented to various levels of stakeholders. They will need to be able to work at pace in a demanding environment. Responsibilities Support the Product Owner to help inform prioritisation and delivery decisions Perform requirements elicitation, root cause analysis, as-is/to-be … mapping, gap analysis, business case development, backlog stories creation/maintenance Facilitate internal and external stakeholder workshops, building a valuable relationship with our business and technology community Consume and understand complex requirements and turn these into valuable product-driven outputs Analyse and document business processes, data flows, and system interactions Collaborate with engineers, testers, and other team members to …
infrastructure across AWS and GCP, ensuring resilience, cost-efficiency, and data security. Collaborate closely with infrastructure, architecture, and cybersecurity teams to meet internal risk, compliance, and governance requirements. Support live systems, perform root cause analysis, and implement solutions for incidents and performance bottlenecks. Qualifications and experience The ideal candidate for this role will have the below experience and qualifications: Bachelor …
checks to identify process defects Reporting Support the creation of routine reporting packs and dashboards for internal stakeholders, utilising and defining performance metrics - Service Level Agreements (SLAs) etc. Conduct analysis utilising tools such as Excel or Power BI, to identify trends and opportunities for both system optimisation and improvement in operational performance Continuous Improvement - Operations process optimisation Proactively identify opportunities … generating and maintaining a knowledgeable Problem Solving Critically assess and collaboratively work alongside the function's operations team, managed service vendors and enterprise IT team to identify/support root cause analysis and remediation of issues, incidents and escalation. Bridge the gap by translating business requirements to the Tech team and vice versa Vendor Management Maintain a …
to improve system reliability. Security & Compliance: Apply best practices for cloud security, IAM policies, and compliance frameworks (SOC2, ISO 27001, etc.). Incident Response & Performance Optimization: Troubleshoot issues, perform root cause analysis, and implement fixes to optimize performance. Infrastructure as Code (IaC): Utilize Terraform, Ansible, or similar tools to automate infrastructure provisioning and configuration management. Collaboration & Knowledge …
Oversee technology issues management and risk acceptance processes. Lead on the 2LoD review of material Technology Incidents and Risk Events, ensuring that actual/potential losses, fix details and root cause analysis are reported in a timely and accurate manner within risk governance. Strategic challenge of 1LoD identification and evaluation of risks associated with technology regulatory change … of mitigation strategies. Escalate material technology risks and issues within the Chief Risk Office and to wider risk governance and recommend appropriate mitigation. Provide insightful, data-driven technology risk analysis to support risk-based decision-making. Report emerging technology risks within risk governance as part of integrated risk reporting. Provide subject matter expertise on emerging technology risks, including cloud security … as ITIL, COBIT, NIST, ISO. Demonstrable extensive relevant experience of technology and change/operational risk in either a 1LoD or 2LoD capacity (2LoD preferable). Experience in scenario analysis and resilience impact assessments would be advantageous. Core skills and competencies A strong working knowledge of Microsoft products including Excel and Word, strong analytical skills and ability to provide …
storage, backups, and Linux systems using tools such as Ansible, Terraform, and GitHub. Collaborate with cross-functional teams to align infrastructure delivery with DevOps best practices. Lead incident response, root cause analysis, and ongoing support for critical infrastructure services. Define and implement infrastructure administration standards and procedures. Champion Infrastructure as Code and continuous improvement across the hosting …
Identify and prioritise test cases suitable for automation, aligned with both functional and non-functional needs. Continuously refine automation frameworks and testing processes to boost efficiency and quality. Conduct root cause analysis of defects and collaborate with development teams to ensure prompt resolution. Actively participate in Agile ceremonies including sprint planning, daily stand-ups, and retrospectives. Create …
data) to understand real-world needs and ship tools that directly support program delivery in the field. Debug and resolve production issues across our stack, with a focus on root cause analysis and long-term fixes. Advocate for sustainable engineering practices, including testing, documentation, and monitoring. Help shape our tech roadmap with an eye toward scale, maintainability …
following areas: Application Performance Monitoring Anomaly detection and alerting Synthetic monitoring and log monitoring Real User Monitoring across web and mobile Dynatrace Query Language (DQL) and Grail for data analysis API integration with external systems Use of Davis AI for root cause analysis and predictive insights Additional Skills: Ability to manage competing priorities in a fast …
London, South East, England, United Kingdom Hybrid / WFH Options
Pontoon
following areas: Application Performance Monitoring Anomaly detection and alerting Synthetic monitoring and log monitoring Real User Monitoring across web and mobile Dynatrace Query Language (DQL) and Grail for data analysis API integration with external systems Use of Davis AI for root cause analysis and predictive insights Additional Skills: Ability to manage competing priorities in a fast …
performance. The Role: Maintaining and monitoring real-time and batch data pipelines using Flink, Kafka, Python, and AWS Act as an escalation point for critical data incidents and lead root cause analysis Optimising system performance, define SLIs/SLOs, and drive reliability Working closely with various other departments and teams to architect scalable, fault-tolerant data solutions …
City of London, London, United Kingdom Hybrid / WFH Options
Rise Technical Recruitment Limited
performance. The Role: *Maintaining and monitoring real-time and batch data pipelines using Flink, Kafka, Python, and AWS *Act as an escalation point for critical data incidents and lead root cause analysis *Optimising system performance, define SLIs/SLOs, and drive reliability *Working closely with various other departments and teams to architect scalable, fault-tolerant data solutions …
London, South East, England, United Kingdom Hybrid / WFH Options
Rise Technical Recruitment Limited
performance. The Role: *Maintaining and monitoring real-time and batch data pipelines using Flink, Kafka, Python, and AWS *Act as an escalation point for critical data incidents and lead root cause analysis *Optimising system performance, define SLIs/SLOs, and drive reliability *Working closely with various other departments and teams to architect scalable, fault-tolerant data solutions …
Build scalable, testable, and reliable systems with a strong focus on performance Collaborate with global development and business teams to design and implement effective technical solutions Provide technical support, root cause analysis, and issue resolution Job requirements: 3 - 7 years experience working with C++ (C++11 or later; C++20 preferred) Understanding of Linux-based development environments Excellent problem …
understanding of Agile development and the role of testing throughout the sprint lifecycle. Comfort with writing clear, testable acceptance criteria using Gherkin or similar syntax. Excellent debugging, investigation, and root-cause analysis skills. A collaborative, detail-oriented mindset and strong communication across teams. BSc in a related field such as Computer Science, Computer Engineering, or other software …