London, England, United Kingdom Hybrid / WFH Options
Howden Group Holdings
General responsibilities: Clients: Develop strong relationships, support IT system use, and provide excellent customer service. Administration: Maintain accurate records and follow policies. Support: Provide escalation support, update documentation, conduct root cause analysis, and manage incidents. Compliance: Ensure adherence to policies, legal, and regulatory requirements, and maintain proper records. What we offer: A career aligned with your values
Salisbury, Wiltshire, United Kingdom Hybrid / WFH Options
Sopra Steria Group
not limited to Cisco Routing, Switching, Security, SDN, Unified Communications and Wireless technologies). Identify and explore opportunities for enhancing efficiency, leveraging orchestration technologies to streamline and automate. Lead 'Root Cause Analysis' investigations into network faults, security and performance issues. Support the Principal NetOps Engineer and Architects with project implementation. Liaise with third party service providers for
Portsmouth, Hampshire, South East, United Kingdom Hybrid / WFH Options
Sopra Steria Limited
not limited to Cisco Routing, Switching, Security, SDN, Unified Communications and Wireless technologies). Identify and explore opportunities for enhancing efficiency, leveraging orchestration technologies to streamline and automate. Lead 'Root Cause Analysis' investigations into network faults, security and performance issues. Support the Principal NetOps Engineer and Architects with project implementation. Liaise with third party service providers for
Reigate, England, United Kingdom Hybrid / WFH Options
Client Server
Other responsibilities will encompass proactive monitoring of production environments, design and implementation of automation and processes to improve efficiency and effectiveness, taking a lead in incident response, troubleshooting and root cause analysis activities to mitigate future issues. You'll collaborate with senior business stakeholders to gather requirements, address concerns and provide updates on projects and systems status
London, England, United Kingdom Hybrid / WFH Options
Keyrock
stack to enhance system reliability. Security & Compliance: Apply best practices for cloud security, IAM policies, and compliance standards (SOC2, ISO 27001). Incident Response & Performance Optimization: Troubleshoot issues, perform root cause analysis, and optimize system performance. Infrastructure as Code (IaC): Use Terraform, Ansible, or similar tools for automation. Collaboration & Knowledge Sharing: Work closely with development, security, and
Birmingham, England, United Kingdom Hybrid / WFH Options
Free-Work UK
System Management: Lead daily operations, maintenance, and upgrades of critical business systems, ensuring 99%+ availability and optimal performance. Incident Management: Oversee incident processes (not hands-on), including communication, root cause analysis, corrective actions, and service restoration. Vendor Management: Manage third-party suppliers and partners to meet SLAs, drive system improvements, and conduct service reviews. SLA Implementation
Glasgow, Scotland, United Kingdom Hybrid / WFH Options
Integrated Environmental Solutions Ltd
growing staff in North America and will hold core hours to facilitate that requirement. Investigate and resolve problems in systems and services to minimise downtime, and carry out subsequent root cause analysis. Provide technical input to IT projects where appropriate, including collaboration in design, implementation, change and support handover. Collaborating with other Infrastructure and Departmental teams to
Kettering, Northamptonshire, East Midlands, United Kingdom Hybrid / WFH Options
E.surv Limited
effective escalation path for the Junior IT Infrastructure and Desktop Support team members. Ensuring IT assets are maintained in the IT Asset register, as per requirement. Lead in the root cause analysis and resolution of Problems. To play an active role in identifying improvement opportunities, taking ownership of implementation, and ensuring that these are communicated to stakeholders.
Catonsville, Maryland, United States Hybrid / WFH Options
RELI Group, Inc
to support audit, compliance, and FISMA reporting requirements. Partner with DevOps teams to integrate data pipelines into CI/CD workflows using GitHub, Databricks Repos, and related tools. Perform root cause analysis and resolution of data issues across staging, integration, and production environments. Qualifications Bachelor's degree in Computer Science, Information Systems, Engineering, or related technical field
London, England, United Kingdom Hybrid / WFH Options
Deutsche Bank
volunteering leave per year Your key responsibilities Acting as the first line of defence for app trading systems, resolving production incidents promptly to minimise downtime and financial impact Conducting root cause analysis and blameless post mortems, implementing or driving the solutioning of preventative measures for recurring issues Providing clear communications to business and senior technologists on incidents
with Developers, DevOps, QA, and Product teams in Agile ceremonies and story refinements. Provide early testability feedback and contribute to design and acceptance criteria definition. Participate in defect triage, root cause analysis, and ensure timely issue resolution. Contribute to release readiness activities, ensuring automation health and reliability. Innovation & Continuous Improvement Implement and monitor automation KPIs (e.g., test
features. Authors and maintains comprehensive technical documentation including detailed system configurations, governance models, and operational procedures. Acts as a senior escalation point for Level 3/4 support, performing root cause analysis and driving long-term resolution of complex issues. Manages the technical scope, delivery timelines, and risk mitigation strategies for cloud engineering initiatives. Tracks and reports
London, England, United Kingdom Hybrid / WFH Options
Womble Bond Dickinson (UK) LLP
features. Authors and maintains comprehensive technical documentation including detailed system configurations, governance models, and operational procedures. Acts as a senior escalation point for Level 3/4 support, performing root cause analysis and driving long-term resolution of complex issues. Manages the technical scope, delivery timelines, and risk mitigation strategies for cloud engineering initiatives. Tracks and reports
London, England, United Kingdom Hybrid / WFH Options
Amazon
AWS Support customers, as well as internal stakeholders Work to improve important metrics such as ‘mean time to engagement’ and ‘mean time to communication’ for all incident types Facilitate Root Cause Analysis and Post Event Reviews after each event to minimise recurrence Work with key stakeholders across AWS as advocates on behalf of customers to drive improvements
Livingston, Scotland, United Kingdom Hybrid / WFH Options
Sky
Analysts and QAs to ensure process designs are clearly understood and implemented. Identify improvements to the team’s internal processes and uphold standards including peer review. Incident management, including root cause analysis, clean-up and prevention of future occurrences. Contribute to Sky’s RPA strategy and assess new automation opportunities. Expand and maintain our RPA infrastructure. What
London, England, United Kingdom Hybrid / WFH Options
Citigroup Inc
managers, ensuring outstanding issues are tracked to closure, particularly long-term strategic fixes. Performs controlled resolution of incidents and problems including prioritization and escalation to relevant groups when appropriate, root cause analysis of all problems with follow-through to resolution. Consults with the primary clients of the application in conjunction with development managers in order to understand
London, England, United Kingdom Hybrid / WFH Options
OpenAsset - Axomic Ltd
Mentor team members through code reviews, coaching, and fostering a culture of growth and learning. Strategic Problem Solving Identify and resolve infrastructure bottlenecks, scalability challenges, and performance issues. Drive root cause analysis and long-term solutions for complex production incidents. Security & Compliance Implement and enforce security best practices across systems, environments, and pipelines. Ensure compliance with industry
SRE Center of Excellence Manage cross-functional requirements working with Engineering, Product, Services, and other departments Be a mentor of quality for design reviews, code, test cases, automation, observability, root cause analysis, and self-healing Influence architectural design, implementation, consolidation, and simplification for global scale Focuses on expanding own skills and looking at improving their teammates' skills
San Diego, California, United States Hybrid / WFH Options
Qualcomm
requests through a help desk ticketing system, ensuring timely communication, accurate documentation, and high-quality customer service. Troubleshoot and debug moderately complex issues across networks, systems, and applications; perform root cause analysis and escalate when necessary. Support IT security practices by applying updates, monitoring for vulnerabilities, and assisting with compliance-related tasks. Monitor system and application performance
SRE Center of Excellence Manage cross-functional requirements working with Engineering, Product, Services, and other departments Be a mentor of quality for design reviews, code, test cases, automation, observability, root cause analysis, and self-healing Influence architectural design, implementation, consolidation, and simplification for global scale Focuses on expanding own skills and looking at improving their teammates' skills
London, England, United Kingdom Hybrid / WFH Options
GiveDirectly
data) to understand real-world needs and ship tools that directly support program delivery in the field. Debug and resolve production issues across our stack, with a focus on root cause analysis and long-term fixes. Advocate for sustainable engineering practices, including testing, documentation, and monitoring. Help shape our tech roadmap with an eye toward scale, maintainability
Caerphilly, Wales, United Kingdom Hybrid / WFH Options
Sadler Recruitment
technical support whilst working to high standards in the ITIL areas of Incident, Problem, Change and Service Request. Working as a technical lead on major incidents, problem management and root cause analysis. Managing and optimising enterprise end-user computer environments, including Windows, MacOS, mobile devices, VDIs and collaboration tools. Managing and supporting Microsoft 365, Intune, Active Directory and
Raleigh, North Carolina, United States Hybrid / WFH Options
Hyperdrive Recruiting
monitoring tools. Define and track Service Level Indicators (SLIs) and Service Level Objectives (SLOs) for system performance benchmarks. Respond to escalated incidents, troubleshoot system and application problems, and conduct root cause analyses. Stay updated with industry trends and emerging technologies to increase the quality and velocity of development. Lead the design and architecture of scalable, distributed, and fault
Manchester, England, United Kingdom Hybrid / WFH Options
Playtech
will... Take ownership of escalated customer support issues from our Service Desk, investigate incidents and provide resolution within contractually specified SLAs. Inform customers of all actions taken, and document root cause analysis. Maintain incident tickets in Jira Service Management, ensuring actions, solutions and ticket status are accurate and updated. Reproduce complex incidents while retrieving log files, working in