health, performance, and availability of infrastructure components and applications. Configure alerting mechanisms to notify teams of potential issues and proactively address them before they impact users. Incident Response and Root Cause Analysis: Participate in incident response activities to identify, troubleshoot, and resolve incidents. Communicate incident status and updates to ensure both internal and external customers are fully … informed. Conduct root cause analysis to determine the underlying causes of incidents and implement preventive measures to avoid recurrence. Performance & Cost Optimization: Analyze system performance metrics and identify opportunities for optimization. Tune infrastructure components, optimize configurations, and implement performance enhancements to ensure optimal performance and resource utilization. Security and Compliance: Implement security controls, and respond to security
Infrastructure Observability Engineer to lead the design, implementation, and continuous improvement of our client's enterprise observability platform. This role focuses on delivering comprehensive monitoring, event correlation, and impact analysis, demonstrating AIOps capabilities and tools such as BMC Helix Operations Manager. The ideal candidate will be passionate about improving access to infrastructure performance, automating operational intelligence, and reducing mean … time to resolution (MTTR) through intelligent alerting and root cause analysis. Key Responsibilities Own and evolve the enterprise observability strategy across all infrastructure tracks Design, implement, and support event management and impact analysis workflows using platforms such as BMC Helix Operations Manager Integrate and correlate data from multiple sources (e.g., 20+ monitoring systems) into a unified monitoring … Stack (ELK) Hands-on experience with BMC Helix Operations Manager, TrueSight, or similar enterprise monitoring platforms Solid understanding of AIOps concepts, including event correlation, noise reduction, anomaly detection, and root cause analysis Strong proficiency with scripting (e.g., Python, PowerShell, Bash) for automation and data handling Solid understanding of networking fundamentals Excellent problem-solving skills with the ability
Milton Keynes, Buckinghamshire, England, United Kingdom Hybrid / WFH Options
Human Capital Ventures
to enhance their support capabilities. You will provide support and resolution to high-level 2nd line support escalations, then provide major incident support and problem management activities to perform root cause analysis and implement preventative measures to manage incidents through the ServiceNow ITSM. The successful candidate will be supporting the 3rd party provider and employees during core … tasks to appropriate teams as needed. Escalate wider-impacting support issues to the Service Desk Team Lead and Head of IT Support when necessary. Conduct advanced network troubleshooting and root cause analysis. Provide advanced support for mobile devices (Apple & Android) and Mac OS X. Administer Active Directory Users and Computers through ADMP and CoreView. Deliver expert-level support
hybrid cloud, and developing and proposing solutions *Analysing business problems resulting from the migration and adoption of hybrid cloud, and developing and proposing solutions *Responsible for facilitating strategic business analysis activities to determine strategic customer requirements to drive Business Improvement, Business Change and technical Development. *Scope and evaluate requests for BA support within the programme. *Applies a wide range … of business analysis skills and techniques to create as-is and to-be use cases to support business and technical change. *Investigate and solve a variety of complex problems, including Business Process Analysis, presenting options to stakeholders inside and outside the programme. *Process design; creation of guardrails documents to support migration and outcome delivery. *Conducting various business analyst … the programme requirements and priorities. Required qualification: Use of relevant methodologies and professional qualification such as CBAP, CCBA, ECBA, PMI-PBA Required experience: *6+ years' experience working in Business Analysis *Experience using PESTLE analysis, business process modelling, root cause analysis, use case modelling Essential skills: *Initiates and facilitates communication with a diverse set of senior
Portsmouth, Hampshire, United Kingdom Hybrid / WFH Options
Checkatrade
such as Slack, Zoom, and MS Teams. Act as a subject matter expert on Google Workspace and Okta, which are central to our IT environment. Troubleshoot technical issues, perform root cause analysis, and implement long-term fixes. Manage hardware and asset lifecycles, ensuring accurate inventory and smooth onboarding/offboarding processes. Provide professional, customer-focused support … our modern, cloud-based infrastructure. Background in fast-paced tech scale-ups or high-growth digital businesses. ITIL Foundation certification. Strong diagnostic and problem-solving skills, including experience with root cause analysis. Excellent communication and stakeholder management skills; confident supporting users at all levels. Experience supporting digital transformation projects and rapid adoption of new tools and technologies. A
You'll Do: Investigate Product Returns: You'll play a crucial role in ensuring the highest quality standards by conducting thorough investigations into customer-returned products, meticulously identifying the root cause of any issues. (This is not a customer service role; you won't be handling customer calls!) Uncover Trends: You'll utilise our advanced computer systems to … effective communication skills. Attention to Detail: Ability to follow procedures and accurately document findings. Analytical Mindset: A keen eye for detail and the ability to analyse information to find root causes. Qualifications: Secondary education, or equivalent qualification, completed in Maths and English. Additional Requirements: Experience (Beneficial): laboratory best practice, problem solving, and root cause analysis is
London, South East, England, United Kingdom Hybrid / WFH Options
Lorien
and management of Incidents in order to return service as quickly as possible while managing key stakeholder communications and expectations. Prevent incidents from materialising into Major incidents through remediating root causes, proactive management of incidents that may cause a Major Incident or other event with substantial impact to our business. To ensure the effective identification and resolution of … trends and root causes that could cause a negative impact to service stability Key Accountabilities: Act as escalation path for major and critical incidents to ensure service is restored. Direct and coordinate the Major Incident process, coordinating activities of resolver teams including specialist support groups/third-party suppliers. Monitor and in some circumstances manage escalating or potential … Ensure accurate, timely, proactive communications with key stakeholders through Incident/Major Incident lifecycle. Ensure 3rd party suppliers fulfil their contractual obligations, especially with regard to SLAs for incidents, root cause analysis, monitoring trends and problem resolution. Plan, execute and document appropriate follow up activities relating to Major Incidents ensuring that a Problem Record is created and
Proactively identify areas for improvement and implement preventive measures. Service Improvement: Continuously assess the IT service delivery process and implement improvements that enhance efficiency, effectiveness, and customer satisfaction. Lead root cause analysis for service delivery issues and define corrective actions. Change Management: Ensure that changes to the IT environment are implemented smoothly with minimal disruption to service.
Woking, Surrey, United Kingdom Hybrid / WFH Options
Arrow McLaren IndyCar
be key, by using data, analytics, and machine learning to deliver world championship reliability tools. Role Dimensions: The Software & Data Science group in McLaren F1 is responsible for the analysis, design, and delivery of software tools and methodologies which improve the team and car's performance. We are a cross-functional group, bringing together data science, machine learning, software … engineering, and DevOps to deliver performance-focused platforms and solutions. In reliability engineering, you will understand issue tracking and management, root cause analysis, integrating with other systems through APIs, and will have experience in building complex user interfaces that can present and manage large amounts of data. As a Senior Specialist Software Engineer, your role will … combine elements of technical leadership, agile/lean project delivery, and stakeholder management. You'll be involved in all stages of the development life cycle from initial analysis through deployment, monitoring, and support. You will own systems architecture for the software you deliver, integrating with the wider McLaren F1 racing platform, and will balance the requirements of reliability engineering
Southampton, Hampshire, South East, United Kingdom
Spectrum It Recruitment Limited
Doing Providing advanced (Tier 4) support for complex technical issues escalated by Tier 2 and 3 teams Troubleshooting production and test system issues via logs, traces, telemetry, and SQL analysis Scripting solutions and automating tasks using PowerShell and similar tools Simulating customer issues in local/test environments for detailed root cause analysis Collaborating with the … times through tooling and knowledge-sharing Supporting the UK customer base with occasional flexibility to liaise with US counterparts What We're Looking For Strong SQL skills for data analysis and report creation Cloud support experience in a technical role Working knowledge of PowerShell scripting (report automation, system tasks) Understanding of C# (not necessarily coding, but enough to troubleshoot
Portsmouth, Hampshire, South East England, United Kingdom
Spectrum IT Recruitment
Doing Providing advanced (Tier 4) support for complex technical issues escalated by Tier 2 and 3 teams Troubleshooting production and test system issues via logs, traces, telemetry, and SQL analysis Scripting solutions and automating tasks using PowerShell and similar tools Simulating customer issues in local/test environments for detailed root cause analysis Collaborating with the … base with occasional flexibility to liaise with US counterparts What We're Looking For 5+ years of cloud support experience in a technical role Strong SQL skills for data analysis and report creation Working knowledge of PowerShell scripting (report automation, system tasks) Understanding of C# (not necessarily coding, but enough to troubleshoot and advise) Hands-on experience with Microsoft
to £70,000 (dependent on experience) Working Arrangement: Hybrid (~2 days on-site per week) Office Location: Central London Responsibilities: Problem Management: Support and facilitate Problem Management activities, driving root cause analysis and producing insightful reports to enhance service performance. Service Transition/Service Introduction: Collaborate with Transformation and Technical Teams to ensure robust service transition practices
Azure PaaS infrastructure with Terraform and ARM templates or equivalent technologies At least one high-level programming language, e.g., C# Application Performance Monitoring Incident Management, including incident response, root cause analysis and post-mortem processes Proficient with: PowerShell, Azure CLI, GIT JavaScript, JSON, XML, YAML Experience with: Distributed architectures Container technologies Maintaining and deploying highly available
Maidenhead, Berkshire, United Kingdom Hybrid / WFH Options
dynaTrace software GmbH
as the global point of contact for Billings & Collections process matters across business units and regions Identify and implement process improvements to enhance efficiency, accuracy, and user experience Drive root cause analysis and resolution of billing disputes and collection delays Collaborate with cross-functional teams within the business to streamline workflows Technology & Systems Oversight: Ensure optimal configuration
Milton Keynes, Buckinghamshire, United Kingdom Hybrid / WFH Options
Triad
Identify and prioritise test cases suitable for automation, aligned with both functional and non-functional needs. Continuously refine automation frameworks and testing processes to boost efficiency and quality. Conduct root cause analysis of defects and collaborate with development teams to ensure prompt resolution. Actively participate in Agile ceremonies including sprint planning, daily stand-ups, and retrospectives. Create
Oxfordshire, South East, United Kingdom Hybrid / WFH Options
Network IT
and critical platform services Develop and manage automation scripts and workflows using Ansible, Terraform, or PowerShell Collaborate with engineering teams to support infrastructure upgrades and issue resolution Contribute to root cause analysis and implement preventative measures Document support procedures and maintain a comprehensive knowledge base Participate in on-call rotations and incident response efforts as needed Critical
Eastbourne, East Sussex, South East, United Kingdom
Nextech Group Limited
You'll Do Provide expert-level support across Microsoft technologies (M365, Azure, Windows Server, AD, Exchange, Intune, etc.) Take ownership of critical 3rd line incidents, ensuring swift resolution and root cause analysis Lead infrastructure and cloud migration projects end-to-end Support security and compliance initiatives across Microsoft environments Collaborate closely with internal stakeholders and clients to
Farnborough, Hampshire, South East, United Kingdom Hybrid / WFH Options
Queen Square Recruitment Limited
part of a microservices architecture. Collaborate with cross-functional teams to refine requirements, define technical solutions, and complete code reviews. Support production systems by diagnosing and resolving issues, conducting root cause analysis, and delivering timely fixes. Contribute to Agile ceremonies and work independently from user stories or specifications. Create and maintain documentation in JIRA and Confluence. Work
and sustainability. Develop and support continuous service improvement plans to align IT infrastructure with evolving business needs. Facilitate proactive and reactive problem resolution throughout the information system lifecycle, including root cause analysis and implementation of preventive measures. Install, configure, and troubleshoot applications that support IT services. Act as a third-level resource for collaboration applications, collaborating with
and orchestration tools to improve transparency and governance. Conduct code reviews, testing, and documentation to ensure quality and robustness of analytics outputs. Take ownership of incident resolution and lead root cause analysis to ensure sustainable remediation. Enforce best practices for change control, release management, rollback planning, and deployment testing. Maintain high levels of internal client satisfaction by
Hemel Hempstead, Hertfordshire, South East, United Kingdom Hybrid / WFH Options
Southern Communications Ltd
Play a critical role in the governance of the MVC and ASP.NET framework estate, aligning with compliance, security, and change control processes. Take ownership of incidents and issues, ensuring root cause analysis and robust long-term solutions. Maintain a backlog of enhancement requests, prioritising changes in line with stakeholder needs and product direction. Help establish and grow
Basingstoke, Hampshire, South East, United Kingdom Hybrid / WFH Options
Southern Communications Ltd
Play a critical role in the governance of the MVC and ASP.NET framework estate, aligning with compliance, security, and change control processes. Take ownership of incidents and issues, ensuring root cause analysis and robust long-term solutions. Maintain a backlog of enhancement requests, prioritising changes in line with stakeholder needs and product direction. Help establish and grow
New Milton, Hampshire, United Kingdom Hybrid / WFH Options
Appello Careline Limited
reports on service health. Maintain effective alerting and escalation processes. Service Stability & Outage Mitigation Act as escalation point for major incidents. Coordinate cross-functional responses to service issues. Drive root cause analysis and preventative actions. Service Level Management Define and manage SLAs and OLAs. Monitor performance and lead service reviews. Identify and implement service improvements. Availability Management
Reading, Berkshire, United Kingdom Hybrid / WFH Options
Thames Water Utilities Limited
teams to mitigate risks and enforce governance policies. Optimise performance and scalability of Salesforce infrastructure. Establish monitoring and logging solutions for proactive issue detection. Respond to production incidents, performing root cause analysis and implementing preventive measures. Collaborate with platform teams to resolve performance bottlenecks. Identify opportunities to automate manual processes within the Salesforce estate. Lead efforts to