matter expert for complex, escalated incidents across core Supply Chain technologies, including WMS, G2P, TMS, SCADA, and related logistics systems. As part of a collaborative Level 3 support and analysis team, you will lead root cause investigations, interpret logs and code traces, identify monitoring gaps, and partner closely with Build and L2 teams to strengthen long-term … strong network of peers and stakeholders who can provide expertise, collaboration, and support. MAJOR RESPONSIBILITIES Lead deep-dive investigations into high-impact incidents, collaborating with internal teams to identify root cause and corrective actions. Analyze system logs, code traces, and service behavior across multiple supply chain platforms. Partner with Build teams and developers to address defects and visibility … gaps in infrastructure and upcoming enhancements. Handle escalations from and engage in knowledge transfer to L2 support. Create and maintain technical documentation, Root Cause Analysis reports, and proactive stability recommendations. Contribute to the design and evolution of monitoring and alerting strategies. Support the development of new team members and junior analysts. Maintain strong working knowledge of system …
reliability engineer develops and implements solutions to prevent them, ultimately enhancing the reliability of systems, equipment, and processes. Responsibilities: Analyzing equipment failure data to detect patterns and trends. Conducting root cause analysis to identify the underlying causes of issues. Creating and implementing new maintenance procedures. Designing and establishing new protocols for monitoring and testing equipment. Exploring new … is incorporated into all areas of the organization. System Reliability: Design and implement strategies to improve the availability, reliability, and performance of critical systems and applications. Incident Management: Lead root cause analysis for major incidents, identify systemic issues, and implement long-term solutions to prevent recurrences. Monitoring and Alerting: Develop and maintain robust monitoring systems to detect … issues proactively and optimize alerting mechanisms to ensure timely response. Capacity Planning: Analyze system usage patterns to predict future growth, optimize capacity, and ensure scalability. Failure Analysis: Conduct thorough failure analysis and implement fault-tolerant systems to minimize the impact of potential failures. Collaboration: Work closely with software engineering, DevOps, and infrastructure teams to design reliable architecture and …
support for IBM Workload Scheduler (IWS) (formerly known as Tivoli Workload Scheduler). High-level responsibilities may include systems engineering, administration, integrations, security, IWS internals, architecture, APIs, upgrades, and root cause analysis. Responsibilities IBM Workload Scheduler (Tivoli): The level 3 engineer for IWS will work within a SAFe development ART (Agile Release Train) in collaboration with a small … automated solutions to enhance batch and platform performance Fine-tune WebSphere/Liberty application server for performance Develop customized REST API interfaces for application teams Other responsibilities may include: Root Cause Analysis: Performing root cause analyses (RCA) for recurring or high-impact issues HA/DR: Design and develop High Availability and Disaster Recovery (DR … in IBM Workload Scheduler (IWS) internals (including workload modeling, calendars, job streams, and dependencies) Strong knowledge of IWS's latest scheduling concepts for complex scheduling requirements, forecasting, and predictive analysis capabilities IBM Workload Automation (IWS) continued Hands-on experience with Dynamic Workload Console (DWC), Dynamic Agents, and Dynamic Broker Server Proficiency in configuring advanced job types (e.g., WebService, REST …
to-end tests on code commits and pull-requests. • Monitor pipeline health and test results; collaborate with DevOps to optimize build times, parallelize tests, and reduce pipeline flakiness. Result Analysis & Root Cause • Analyze test outputs, system logs, and metrics (e.g., via ELK Stack or Prometheus/Grafana) to pinpoint failures and performance regressions. • Lead root-cause … testing activity efficiently. An ISTQB Foundation Certification is a strong asset and shows your commitment to professional testing standards. A key part of this role involves problem investigation and root cause analysis, so strong analytical and communication skills are a must. You’ll enjoy working as part of a collaborative team, contributing your insights to improve outcomes …
/P4), ensuring swift recovery of services, clear communication, and a consistent customer-first focus. In addition, you will drive the Problem Management process, delivering Post Major Incident Reports, root cause analysis, and Service Improvement Plans to reduce incident volumes and improve overall system stability. This role is critical to protecting Evri from the financial and operational … technical and business stakeholders, including executive management. Driving the Incident Reduction Plan, targeting a decrease in major service-impacting incidents. Producing Post Major Incident Reports with comprehensive detail for root cause analysis. Performing regular reviews with resolver groups to address breached or aged tickets and improve performance. End-to-end ownership of the Problem Management lifecycle logging, assessment … investigation, resolution, closure, and Known Error tracking. Identifying incident trends and working with resolver and development teams to proactively address root causes. Leading Continuous Service Improvement (CSI) and Service Improvement Plans for services with high incident/Problem volumes or technical debt. Ensuring accurate data capture and reporting for Problem Management KPIs, CSFs, and objectives. Still interested? Great News …
failures and all other problems that adversely affect plant operations. These problems include capacity, quality, cost or regulatory compliance issues. To fulfill this responsibility, the reliability engineer applies: Data analysis techniques that can include: Statistical process control Reliability modeling and prediction Fault tree analysis (FMEA) Root cause analysis (RCA) and root cause failure … analysis (RCFA) Failure reporting, analysis and corrective action system Key Competencies Ability to develop and manage good working relationships with internal departments (production, sales, logistics, accounting), contractors, suppliers, inspectors and customers. Strong mechanical and electrical knowledge and aptitude. Strong written and verbal communication skills. Strong knowledge of preventive maintenance programs and the tools associated with failure detection (i.e. … vibration analysis, oil monitoring, thermography) as well as the software associated with them. Facilitator/Communicator. Deal with gray areas where responsibilities are shared by two or more groups. Find and implement solutions while avoiding finger-pointing. Required Qualifications A minimum of 7 years of maintenance/reliability engineering experience is preferred. Experience required working for an operating company …
adopted and consumed by internal users. Coordinate the operation and management of certificate authorities, including succession planning and execution. Recommend and coordinate solution design processes to improve certificate platforms. Root Cause Analysis: Conduct detailed root cause analysis for critical issues using scripts and technical expertise. User Mentorship: Guide users on how they can implement …
technical issues with applications, connectivity, and other IT-related concerns. • Cybersecurity Expertise: Conduct environmental vulnerability scans, remediate vulnerabilities, and perform system hardening to ensure the security of our systems. • Root Cause Analysis: Perform in-depth analysis to resolve technical issues in Linux environments, identifying and addressing the root causes of problems. • System Performance Optimization: Analyze … and streamline IT processes. Basic Qualifications: • Top Secret Clearance Required • Ability to lift 30 pounds • Must possess a CompTIA Security+ or equivalent DoD 8572 qualified certificate • Experience performing successful root cause analysis for problem resolution in Unix and Linux environments, with significant experience in: - Installing operating routine server/workstation patches, system security patches and firmware updates …
Stoke-On-Trent, Staffordshire, West Midlands, United Kingdom
Rapiscan Systems
development of new imaging technologies through hardware testing, integration, and validation in our X-ray Imaging Laboratory. You'll contribute to: - Hardware and software setup, operation, and data capture - Data analysis and reporting - Product quality assurance and root cause analysis - Training, demonstrations, and marketing support Key Responsibilities - Assist in testing X-ray imaging subcomponents, including setup, data … capture, and analysis - Assemble detector units for prototype testing - Support QA and root cause analysis of returned or prototype units - Contribute to engineering R&D for detector array solutions - Ensure safety systems are operational and exclusion zones are properly established - Maintain up-to-date software and firmware in the lab - Track and document testing results using …
pivotal role and will be responsible for developing and implementing technology solutions to support Counterparty Credit risk management activities. Responsibilities: Works directly with our business users on requirement elicitation, analysis, user acceptance testing and post-production SME help. Works closely with our business users as well as cross functionally to understand root cause analysis and solutions … requirements and QA for QA functional testing. Ability to master complex tasks with minimal supervision and communicate ideas effectively. Strong analytical and problem-solving skills, with ability to conduct root cause analysis and provide viable/creative solutions. Ability to analyze business process and make recommendations for improvements and translate business needs into IT requirements. Coordinates with … plans. Ability to master complex tasks with minimal supervision and communicate ideas effectively. Creates and maintains documentation for various ongoing projects and business processes. Strong ability to conduct gap analysis and provide current state vs future state analysis. Ability to work as a lead is desirable. Requirements: In-depth experience with Counterparty Credit Risk including regulations like SR …
Wales, Yorkshire, United Kingdom Hybrid / WFH Options
IQUW Group
response calls, driving quick resolution while minimising business impact. Ensure timely and clear communications to stakeholders, including senior leadership, throughout the incident lifecycle. Conduct post-incident reviews (PIRs), identifying root causes and ensuring follow-up actions are tracked and implemented. Work closely with Risk, Governance, and Compliance teams to align incident and problem management activities with broader risk management … and develop the Problem Management process, driving a culture of proactive issue resolution. Identify trends, recurring issues, and systemic problems, implementing corrective actions. Work with technical teams to conduct root cause analysis (RCA) and ensure long-term fixes are implemented. Maintain a problem register, tracking known errors and ensuring effective resolution. Develop and maintain Problem Management reports … tracking trends, root causes, and recurring issues. Use data-driven reporting to highlight areas for improvement, measure the impact of problem resolution efforts, and support risk mitigation. Provide regular insights and analysis on major incidents, problems, and known errors to senior management and key stakeholders. Ensure that incident and problem resolution actions are followed through, holding teams accountable …
Swansea Vale, Swansea, West Glamorgan, Wales, United Kingdom Hybrid / WFH Options
ERS
response calls, driving quick resolution while minimising business impact. Ensure timely and clear communications to stakeholders, including senior leadership, throughout the incident lifecycle. Conduct post-incident reviews (PIRs), identifying root causes and ensuring follow-up actions are tracked and implemented. Work closely with Risk, Governance, and Compliance teams to align incident and problem management activities with broader risk management … and develop the Problem Management process, driving a culture of proactive issue resolution. Identify trends, recurring issues, and systemic problems, implementing corrective actions. Work with technical teams to conduct root cause analysis (RCA) and ensure long-term fixes are implemented. Maintain a problem register, tracking known errors and ensuring effective resolution. Develop and maintain Problem Management reports … tracking trends, root causes, and recurring issues. Use data-driven reporting to highlight areas for improvement, measure the impact of problem resolution efforts, and support risk mitigation. Provide regular insights and analysis on major incidents, problems, and known errors to senior management and key stakeholders. Ensure that incident and problem resolution actions are followed through, holding teams accountable …
Engineer to join our growing team. The Process Improvement Engineer will support the Navy's Problem Solving and Process Improvement Office (PSO) in enhancing operational performance through data-driven analysis, root cause identification, and the implementation of best-in-class methodologies. The position focuses on developing and sustaining effective problem-solving and performance improvement systems across various … commands, leveraging modern tools, technologies, and practices to align with CNO readiness and performance goals. Key Responsibilities: Problem Solving & Process Improvement Support: Collaborate with cross-functional teams to conduct root cause analysis and develop corrective action plans for performance gaps. Assist in creating driver trees, dashboards, and other visual tools to define and track Tier 1 (outcome … metrics) and lower-tier driver metrics. Recommend best-in-class methodologies and tools for performance analysis and improvement. Training & Competency Development: Support the development and deployment of standardized training materials, such as instructor guides, playbooks, and case studies. Provide on-the-job training to Navy personnel to advance organizational buy-in and process improvement expertise. Maintain a continuous feedback …
City of London, London, United Kingdom Hybrid / WFH Options
REC SOLUTIONS LIMITED
with development, networks, ops and product teams on strategic IT initiatives. Assist with planning, management and resource allocation of inter-departmental projects alongside the PM team. Oversee incident management, root cause analysis, and rapid resolution of system outages or performance degradation. Ensure compliance of procedures such as change management, patch management and security and audit processes. Assist … understanding of cybersecurity principles and experience implementing security measures in a regulated environment. Ability to coach, mentor, and upskill staff; develop career paths and ensure team resilience. Experience undertaking root cause analysis including prevention-orientated solution reporting. Working experience with deployment tools (e.g. GitLab pipelines) and rollback strategies. Proficiency in managing bare-metal servers, virtualization platforms such …
networking and distributed systems. Strong communication and documentation skills are essential, as you'll be collaborating closely with global teams. We value candidates who solve complex technical challenges through root cause analysis, can adapt to a fast-paced environment, and effectively communicate technical concepts. You'll make an impact by deploying and managing critical infrastructure, creating automated … support hybrid infrastructure environments - System Design: Architect and implement secure, scalable solutions while considering system interdependencies and limitations - Problem Solving: Analyze complex technical issues and develop effective solutions through root cause analysis - Documentation & Training: Create and maintain technical documentation, develop training materials, and support team knowledge sharing - Collaboration: Work effectively with global teams, provide technical consultation, and …
Birmingham, West Midlands, England, United Kingdom
TXP
Hybrid (Birmingham) We are seeking a dynamic and experienced Senior Application Support Developer to support our clients. The ideal candidate will have an outstanding ability to troubleshoot system problems, perform root cause analysis, provide insight into system improvements, and deliver solutions based on their findings. Key Responsibilities To ensure Incident SLAs are met by the support … Applying standards: Good in application of relevant industry and process standards to all tasks undertaken. Strong interpersonal skills to interact with customers and other team members Experience with investigative root cause analysis and incident management Benefits: 25 days annual leave (plus bank holidays) An additional day of paid leave for your birthday (or Christmas Eve) 4% Matched …
Strong hands-on experience with monitoring and observability tools (such as Azure Monitor, KQL, Application Insights, Power Platform Admin Center, Log Analytics, etc.). Solid understanding of incident management, root cause analysis, and developing effective resolution workflows. Proficiency in scripting and automation for operational tasks (Python, Azure CLI, or similar) Microsoft Azure Full Stack experience, including Azure … Power Platform components. Develop and configure effective monitoring and alerting systems to proactively identify performance issues, outages, or abnormal system behaviors. Respond promptly to incidents and service interruptions, executing root-cause analysis and troubleshooting to restore services efficiently. Collaborate closely with development, infrastructure, and business teams to continuously improve operational processes and system resilience.
relevant teams when required Troubleshoot Microsoft applications, Teams Rooms, and office AV/meeting room technology Provide support across hardware, software, and network-related issues Ensure effective ticket management, root cause analysis, and documentation of remedies Contribute to continuous improvement by updating knowledge bases and encouraging self-service Deliver a consistent high-quality experience when supporting senior … Citrix, VPN clients) What Success Looks Like - High rate of first-time ticket resolution with minimal escalation Clear communication and proactive updates for end users Demonstrated fault-finding and root cause analysis Positive feedback from C-Suite and end users alike Continuous improvement of Service Desk efficiency and knowledge base
test plans (ITPs), ensuring proper verification and validation of hardware components and systems against design requirements and customer specifications. Develop quality metrics and dashboards to monitor process health, drive root cause analysis, and support strategic quality planning. Collaborate with Engineering and Production to evaluate design feasibility, manufacturability, and adherence to quality standards during reviews and milestone phases. … Skills: Demonstrated experience with ISO 9001 QMS, along with working knowledge of additional standards such as ISO 45001, 14001, 27001, 31000, and ITIL. Strong analytical skills with experience in root cause analysis, corrective actions, and statistical process control (SPC) Proven leadership in guiding teams to adhere to internal procedures and government/industry regulations. Experience writing, revising …
Consultant with 6+ years of experience to support ongoing operations of our Oracle EBS 12.2.3 platform. The consultant will provide L2/L3 support for finance modules, drive root cause analysis (RCA), ensure issue resolution, and participate in change control and documentation efforts. Deep expertise in both functional and technical aspects of Oracle Finance is essential. … for Oracle EBS Finance modules: AP, AR, GL, FA, EBTax, Vertex, OneSource, and Cash Management. • Troubleshoot and resolve incidents and defects across L2/L3 within SLA timelines. • Perform root cause analysis (RCA) for recurring and critical incidents. • Create and maintain functional and technical documentation, SOPs, and support guides. • Participate in CAB (Change Advisory Board) meetings and …
Chester, Cheshire West and Chester, Cheshire, United Kingdom
Ascendion
We are looking for an experienced Application Production Services Specialist to provide production support. The role involves incident identification, resolution, problem management, and root cause analysis, ensuring service stability and performance. You will collaborate with development teams, infrastructure teams, and business stakeholders to support system upgrades, audits, and operational enhancements. Key Responsibilities: 1. Provide L1/L2 …/L3 production support, including incident management, root cause analysis, and problem resolution. 2. Manage incident and problem tickets using enterprise ITSM tools. 3. Perform capacity and performance management to ensure system stability. 4. Work closely with development teams for onboarding new applications and upgrades. 5. Support internal and external audits, ensuring compliance and documentation. 6. Collaborate …
management; Horizon VDI configuration and management; managing Active Directory; SAN management, including FC Zoning and LUN and Volume management Experience of infrastructure management including change management, system testing and root cause analysis; backup, recovery and business continuity Desirable Experience in managing a cloud environment, including Microsoft Azure Skills, abilities, and attributes Essential Customer focus; able to identify … develop and manage relationships with both internal and external stakeholders Excellent troubleshooting and Root Cause Analysis skills, within complex infrastructure stacks Excellent written and verbal communication skills Desirable Ability to plan and design high quality training materials and packages for a variety of training interventions including e-learning and face to face training to small and …
maintain project plans, schedules, and budgets. • Manage & control the project costs & financial performance, including approval of timesheets. • Facilitate stakeholder meetings to align project goals and address concerns proactively. • Conduct root cause analysis for issues and propose corrective actions. • Oversee project scope, risks, and changes, ensuring alignment with project objectives. • Prepare and present status reports to clients and … feedback constructively. Required Skills • Strong leadership and problem-solving abilities. • Excellent communication and interpersonal skills. • Proficiency in project management tools and techniques. • Ability to work independently with some oversight. • Root cause analysis and continuous improvement mindset. Required Education • A recognised project management certification, such as CAPM, Prince2, or APM, is preferred. • Demonstrable understanding of both Agile and …