South East London, England, United Kingdom Hybrid / WFH Options
Cognitive Group | Part of the Focus Cloud Group
Cleared or Eligible for SC Clearance. Your responsibilities: Deploy, configure, and monitor AWS services ensuring high availability, scalability, and security. Respond to and resolve infrastructure and service incidents with root cause analysis and preventive measures. Handle change requests, track recurring issues, and work on long-term fixes to improve system stability. Implement and maintain observability solutions using … configuration and deployment management experience with CI/CD. Desirable skills: Hands-on experience with Terraform or CloudFormation for infrastructure provisioning and automation. Strong knowledge of Splunk for log analysis and troubleshooting. Strong problem-solving skills and analytical thinking.
to-end tests on code commits and pull-requests. • Monitor pipeline health and test results; collaborate with DevOps to optimize build times, parallelize tests, and reduce pipeline flakiness. Result Analysis & Root Cause • Analyze test outputs, system logs, and metrics (e.g., via ELK Stack or Prometheus/Grafana) to pinpoint failures and performance regressions. • Lead root-cause … testing activity efficiently. An ISTQB Foundation Certification is a strong asset and shows your commitment to professional testing standards. A key part of this role involves problem investigation and root cause analysis, so strong analytical and communication skills are a must. You'll enjoy working as part of a collaborative team, contributing your insights to improve outcomes …
ITSM) processes including asset, change, incident, request, problem, and project management to meet service levels. Provide on-site IT support and assist in resolving broader technical issues. Contribute to root cause analysis and long-term problem management. Act as a key point of contact between IT and users, promoting standards, improving user satisfaction, and sharing best practices.
Bletchley, Buckinghamshire, United Kingdom Hybrid / WFH Options
In Technology Group
with IT and development teams to ensure secure system architecture and application development. Maintain and enhance incident response procedures and disaster recovery plans. Investigate and document security breaches, providing root cause analysis and remediation plans. Conduct security awareness training for staff and ensure compliance with internal policies and regulatory requirements (e.g., FCA, GDPR, ISO 27001). Stay …
Milton Keynes, Buckinghamshire, South East, United Kingdom Hybrid / WFH Options
In Technology Group Limited
with IT and development teams to ensure secure system architecture and application development. Maintain and enhance incident response procedures and disaster recovery plans. Investigate and document security breaches, providing root cause analysis and remediation plans. Conduct security awareness training for staff and ensure compliance with internal policies and regulatory requirements (e.g., FCA, GDPR, ISO 27001). Stay …
New Milton, Hampshire, United Kingdom Hybrid / WFH Options
Appello
infrastructure and cloud services. Deep understanding of SIP, VoIP, VoLTE, STUN, and firewall bridging. Proficiency in Node.js application support and server diagnostics. Hands-on experience using tools for SIP analysis, such as Wireshark, SIP Traces, or packet analysers. Excellent problem-solving and communication skills. Bachelor's degree in Computer Science, Information Technology, or a related field (or equivalent experience … Azure Solutions Architect, or AWS equivalent. ITIL Foundation certification. THE ROLE Key Responsibilities: Advanced Technical Support: Resolve complex hardware, software, and network issues escalated from lower-tier support. Conduct root cause analysis and implement long-term solutions. Manage high-impact incidents to ensure minimal business disruption. Server & Application Support: Troubleshoot server issues across cloud (AWS), on-premise …
and compliance requirements. • Act as the primary point of contact for internal business units (including Operations, Compliance & Transactional Banking), IT and external vendors, regarding service performance and enhancements. • Lead root cause analysis and resolution of major incidents. Drive problem management to reduce recurring issues and improve service stability. • Manage projects involving any future enhancements or regulatory changes …
technical level to install cyber security product technologies and systems, such as firewalls, end point protection, encryption, VPN, SIEM, PAM, VM etc. Support the Cyber Security Teams to lead root cause analysis of cyber security related incidents to ensure prompt action is taken to prevent incident reoccurrence and strengthen relevant cyber security controls. Provide technical guidance and …
Woking, England, United Kingdom Hybrid / WFH Options
Pyramid Recruitment Ltd
customer support. Key Responsibilities: Customer Support & Issue Resolution – Investigate and resolve escalated support tickets, meeting SLA targets. Communicate effectively with customers via email, phone, and support portal. Technical Troubleshooting & Analysis – Perform root cause analysis using SQL, application logs, and API integrations to identify and resolve system issues. User Acceptance Testing (UAT) – Support customers during UAT cycles … Cloud & SaaS Technologies (Microsoft Azure, Office 365, Azure AD, MFA). PowerShell, Bash scripting, and basic Python/JavaScript for automation. Experience with ITSM tools (Jira), ITIL framework, and root cause analysis. What's on Offer: Competitive Salary – Up to £35,000 Hybrid Working 25 days holiday (rising to 30) + Birthday Day + Bank Holidays
Southampton, Hampshire, United Kingdom Hybrid / WFH Options
Aztec
Oversee technology issues management and risk acceptance processes. Lead on the 2LoD review of material Technology Incidents and Risk Events, ensuring that actual/potential losses, fix details and root cause analysis are reported in a timely and accurate manner within risk governance. Strategic challenge of 1LoD identification and evaluation of risks associated with technology regulatory change … of mitigation strategies. Escalate material technology risks and issues within the Chief Risk Office and to wider risk governance and recommend appropriate mitigation. Provide insightful data-driven technology risk analysis to support risk-based decision-making. Report emerging technology risks within risk governance as part of integrated risk reporting. Provide subject matter expertise on emerging technology risks, including cloud … as ITIL, COBIT, NIST, ISO. Demonstrable extensive relevant experience of technology and change/operational risk in either a 1LoD or 2LoD capacity (2LoD preferable). Experience in scenario analysis and resilience impact assessments would be advantageous. Core skills and competencies: A strong working knowledge of Microsoft products including Excel and Word, strong analytical skills and ability to provide …
on uptime, resilience, and cost-efficiency. Perform performance tuning, system optimisation, and capacity planning to ensure infrastructure reliability and maintainability. Engage in Major Incident and Problem Management, including conducting Root Cause Analysis (RCA) and implementing long-term solutions. Conduct regular reviews across client estates, proactively identifying and addressing potential risks or inefficiencies. Develop and maintain comprehensive system …
with ERP systems and their process integration. Good overall knowledge and experience of the SC/OtC business processes. Excellent understanding of EDI systems. Capable in problem solving and root cause analysis. Continuous Improvement mindset. Data analytics and reporting skills (Analytical). Strong written and verbal communication. Excellent MS Office skills, specifically Excel. Ability to Lead and embrace …
and scale Kubernetes clusters hosting critical microservices. Design and enhance observability, alerting, and incident response processes. Collaborate closely with engineers to ensure systems are reliable, secure, and performant. Lead root cause analysis for production incidents and help prevent recurrence. Build tooling to automate repetitive tasks and improve deployment pipelines (CI/CD). Participate in on-call rotation …
South East London, England, United Kingdom Hybrid / WFH Options
Explore Group
and scale Kubernetes clusters hosting critical microservices. Design and enhance observability, alerting, and incident response processes. Collaborate closely with engineers to ensure systems are reliable, secure, and performant. Lead root cause analysis for production incidents and help prevent recurrence. Build tooling to automate repetitive tasks and improve deployment pipelines (CI/CD). Participate in on-call rotation …
prem environments. What You’ll Be Doing: Managing and supporting Solace PubSub+ appliances and software brokers across cloud and on-prem platforms. Responding to production incidents and working on root cause analysis and long-term fixes. Monitoring system health and performance with Prometheus, Grafana, and custom dashboards. Optimising Solace across WAN environments for secure, low-latency message …
operational performance, and security compliance. Facilitate effective communication between IT teams and business units. Problem Solving and Incident Management: Manage and resolve high-priority incidents and critical issues. Conduct root cause analysis and implement corrective actions to prevent recurrence. Develop and maintain incident response plans and procedures. Requirements: Proven experience as a Digital Operations Manager, IT Manager …
Portsmouth, Hampshire, South East, United Kingdom Hybrid / WFH Options
Sopra Steria Limited
not limited to Cisco Routing, Switching, Security, SDN, Unified Communications and Wireless technologies). Identify and explore opportunities for enhancing efficiency, leveraging orchestration technologies to streamline and automate. Lead 'Root Cause Analysis' investigations into network faults, security and performance issues. Support the Principal NetOps Engineer and Architects with project implementation. Liaise with third party service providers for …
Document as you go, to help colleagues follow what's been done and why. Drive tasks forward with energy and enthusiasm. Create proactive monitoring solutions using standard tooling. Conduct RCA (Root Cause Analysis) for incidents. Develop and maintain self-managing infrastructure services and dashboards. Define the metrics of success and report on progress. Implement Infrastructure as Code for … automation tools (Puppet, Ansible, Git), pipelines (Azure DevOps) and test automation. Experience with CI/CD tooling (Azure DevOps). Comfortable with Elasticsearch log standardisation, Kibana dashboard creation and data analysis. AWS hands-on: CloudFormation, Route53, S3, DynamoDB, CloudWatch, Lambda, Security, and troubleshooting; Azure experience also useful. Certificate management and automation. Strong troubleshooting and diagnosis skills. Able …
Reading, Berkshire, United Kingdom Hybrid / WFH Options
DCL
escalations. Conduct advanced threat hunting using the Microsoft Security Stack. Build, optimise and maintain workbooks, rules, analytics etc. Correlate data across Microsoft 365 Defender, Azure Defender and Sentinel. Perform root cause analysis and post-incident reporting. Aid in mentoring and upskilling Level 1 and 2 SOC analysts. Required Skills & Experience: The ability to achieve UK Security Clearance …
a hands-on leadership role - you won’t just guide others, you’ll be the go-to expert when systems are under pressure. You'll lead incident response, own root cause analysis, and solve performance issues like memory leaks, outages, and flaky services. Your focus will include: Leading incident management, post-mortems, and blameless RCAs. Building scalable …
improvement projects as per company's programs and needs. In this context, the QA Engineer is involved in the qualification process, responsible for quality improvement actions, supporting with data analysis and reporting. To succeed in this mission, the Quality Engineer needs to build collaborative links internally with Engineering, Supply Chain, Quality and Production departments. Key Roles & Responsibilities: Contribute to … level determined for internal KPIs, plan and monitor appropriate actions and propose improvements in the processes. Ensure effective implementation of corrective and preventive actions internally by supporting root cause analysis and effectiveness checks of corrective actions. Provide the necessary data to enable continuous improvement actions as needed. Ensure proper quality procedures are implemented (proper controls … frequency and tools) and monitor results. Manage projects and key indicators by creating suitable data analysis and reporting findings based on statistical evidence and trending. Qualifications: At least 2 years of experience in manufacturing. Experience in Quality Assurance and Quality audits. Experience in the Smart Card industry will be a plus. Education: Bachelor's Degree in an …
as well as training users in these systems and the use of reports. This position is responsible for extracting data from multiple sources, manipulating and validating data, conducting root cause analysis, and presenting analytic findings. They play an essential role in presenting operational solutions and recommendations to leadership. This involves gathering requirements, drawing insights … collaboratively in a cross-functional team, learns from colleagues, and provides routine updates on calls related to projects. What we are looking for: Required Skills: • Experience with systems functional analysis, technology business analysis, and basic understanding of the different technical platforms, databases, and related technologies • Advanced knowledge of MS SQL Server, Tableau, MS Excel (functions and formulas) and …
Aylesbury, Buckinghamshire, South East, United Kingdom
McCormick UK Limited
and implement KPIs and performance measures leveraging best practice benchmarking with a focus on continuous improvement. Collaborate with IT on system implementation/enhancement initiatives. Ensure adequate and timely root cause analysis to understand problem drivers and implementation of necessary corrections and/or changes. Collaborate with various stakeholders to ensure global alignment of priorities working in …
Southampton, Hampshire, United Kingdom Hybrid / WFH Options
Carnival Corporation & plc
to ensure required outcomes are achieved. Take ownership of specific activities, as directed by the Server & Storage Engineering Lead, including technical delivery, improvement initiatives, Incident and Problem investigation, and root cause analysis. This role is positioned at CUK09 level within our organisation and is available on a full-time, permanent basis. We offer hybrid work including up …
South East London, England, United Kingdom Hybrid / WFH Options
Owen Thomas | Pending B Corp™
and efficiency. Automate configuration, provisioning, and deployment to reduce manual effort and streamline operations. Implement and uphold security standards, including encryption, access control, and compliance. Lead incident response and root cause analysis, applying preventive measures to avoid recurrence. Collaborate across teams (QA, DevOps, IT) to troubleshoot and enhance system performance. Maintain clear documentation for configurations, procedures, and … with a focus on Python. Skilled in TDD and BDD, primarily using Python. Deep understanding of distributed systems, networking, storage, and compute management. Strong troubleshooting skills, with experience in root cause analysis and timely resolution. Knowledge of security standards (ISO27001, NIST, GDPR) and infrastructure security best practices. Experienced with monitoring/logging tools like Splunk, Grafana, and …