implement observability solutions that provide real-time insight into the health, performance, and reliability of Anglian Water's digital platforms and products. Your work will enable proactive incident detection, root cause analysis, and continuous improvement, embedding observability as a core engineering discipline across our organisation. What will you be doing as an Observability Engineer? Design and implement … aligned to user and business needs. Integrate observability tooling into CI/CD pipelines and infrastructure-as-code. Standardise tooling across teams and support automation of alert responses and root cause analysis. Collaborate with development, operations, and platform teams to define SLIs, SLOs, and error budgets. Conduct root cause analysis and post-incident reviews to …
Deploy, configure, and optimize Wiz for continuous cloud security monitoring and compliance management. Identify vulnerabilities, misconfigurations, and risks across AWS, Azure, and GCP environments, and drive remediation efforts. Lead root cause analysis (RCA) for security incidents and coordinate escalations as needed. Partner with software engineering and infrastructure teams to integrate security best practices into CI/CD … engineering using Wiz, AWS, Azure, and GCP. Strong understanding of cybersecurity principles, risk and controls, and internal control frameworks. Proficiency in incident response, security issue escalation, and root cause analysis. Hands-on experience with security automation, DevSecOps tools, and infrastructure as code (e.g., Terraform, CloudFormation). Excellent problem-solving skills and ability to think …
processes better. What You’ll Be Doing Owning and coordinating day-to-day application support Becoming the go-to expert for core business applications (high level, not coding) Driving root-cause analysis to eliminate recurring issues Raising and presenting improvement initiatives for approval Supporting and coordinating the Change Advisory Board (CAB) Working closely with projects to minimise … days on-site per week) Permanent Up to £37,000 + £2,000 performance bonus What We’re Looking For Must-have: Background in application support Someone who enjoys root-cause analysis and problem ownership Confident communicating across teams and taking the lead Nice-to-have: ITIL exposure or certification Experience with Azure DevOps (or similar) Why …
Partner Management: Manage relationships with key infrastructure and cloud service providers (e.g., Microsoft, AWS, VMware, hardware vendors), including license, support, and performance management. Incident & Problem Management: Oversee critical incidents, root cause analysis, and service restoration activities, ensuring timely communication and resolution. Change & Release Oversight: Provide technical validation and risk assessment for infrastructure and platform changes. Continuous Improvement … value and accountability. Technical Depth: Strong expertise in hybrid cloud architectures, automation, virtualisation, storage, networking, and security integration. Problem Solving: Analytical and pragmatic in resolving service issues and driving root cause elimination. Communication: Clear and confident communicator with both technical and non-technical stakeholders. Continuous Improvement Mindset: Looks for opportunities to enhance efficiency, cost control, and service quality …
and shape the future of Problem Management at scale. What You’ll Do: • Take full ownership of the problem lifecycle, from identification to resolution, preventing recurring service issues. • Lead Root Cause Analysis (RCA) and trend analysis to uncover systemic problems and deliver actionable solutions. • Maintain and optimise the Known Error Database (KEDB) and ensure key information … excellence. • Exceptional analytical, problem-solving, and critical thinking skills. • Confident stakeholder management and influencing skills, able to engage both technical and non-technical audiences. • Experience with RCA methodologies, trend analysis, and embedding CSI initiatives. • Background in utilities, energy, or critical national infrastructure is a strong advantage. • Resilient, proactive, and thrives in a fast-paced, complex environment. Join a team …
Sheffield, Yorkshire, United Kingdom Hybrid/Remote Options
N Consulting Limited
across GCP, ensuring resilience, cost-efficiency, and data security. • Collaborate closely with infrastructure, architecture, and cybersecurity teams to meet internal risk, compliance, and governance requirements. • Support live systems, perform root cause analysis, and implement solutions for incidents and performance bottlenecks. Qualifications and experience: The ideal candidate for this role will have the below experience and qualifications: • Bachelor …
the Team Lead, other engineers, and the wider Cloud Services Group to ensure consistent service delivery and technical excellence. You'll take ownership of complex incidents and requests, drive root cause analysis, and help improve the quality and reliability of managed services. The role supports the Team Lead by taking ownership of escalations, service improvement initiatives, and …
England, Beckwith, North Yorkshire, United Kingdom Hybrid/Remote Options
The Bridge IT Recruitment
the Team Lead, other engineers, and the wider Cloud Services Group to ensure consistent service delivery and technical excellence. You’ll take ownership of complex incidents and requests, drive root cause analysis, and help improve the quality and reliability of managed services. The role supports the Team Lead by taking ownership of escalations, service improvement initiatives, and …
with cross-functional teams to define, design, and deliver high-quality solutions. (Optional) Contribute to UI development using React.js or Blazor for customer-facing applications. Troubleshoot production issues, perform root cause analysis, and drive continuous improvements. …
Leeds, West Yorkshire, Yorkshire and the Humber, United Kingdom Hybrid/Remote Options
Pharmacy2U
related tools. Driving backup, disaster recovery and high-availability strategies across critical workloads. Providing authoritative technical guidance to Service Operations, Cyber Security and wider Technology stakeholders. Leading incident investigations, root-cause analysis, and proactive remediation. Liaising with third-party partners and vendors to ensure reliable and secure service delivery. Who are we looking for? Self-motivated, confident …
Wigan, Lancashire, England, United Kingdom Hybrid/Remote Options
Searchability
Hands-on experience with AWS or another major cloud platform Knowledge of Kubernetes, Terraform, and Infrastructure as Code Strong networking fundamentals and distributed systems knowledge Proven incident response and root cause analysis experience Excellent collaboration and communication skills TO BE CONSIDERED: Please either apply through this advert or email me directly. For further information please …
Wigan, Greater Manchester, United Kingdom Hybrid/Remote Options
Searchability (UK) Ltd
Hands-on experience with AWS or another major cloud platform Knowledge of Kubernetes, Terraform, and Infrastructure as Code Strong networking fundamentals and distributed systems knowledge Proven incident response and root cause analysis experience Excellent collaboration and communication skills TO BE CONSIDERED: Please either apply through this advert or email me directly via (url removed). For further …
Automate recurring administrative tasks using scripting languages and configuration management tools. Ensure all OpenShift environments adhere to required security, regulatory, and operational standards. Provide support for incidents, troubleshooting, and root cause analysis related to OpenShift platforms. Required Skills/Experience: Proven experience in installing, configuring, and administering OpenShift clusters. Strong understanding of Kubernetes, container orchestration, and Linux …
configurations comply with the bank's security standards and regulatory frameworks. Manage encryption key lifecycle operations (generation, rotation, revocation). Respond to and resolve IKP-related incidents promptly. Conduct root cause analysis and implement preventive measures. Work closely with IKP SMEs, Build Engineers, OpenShift teams, and ITSO for secure integration. Support audits and compliance reviews. Maintain operational …
of AWS, Terraform, and Ansible. Technical Skills Linux system administration & shell scripting. Networking fundamentals, containerization, and infrastructure security best practices. Version control experience (e.g., Git). Strong troubleshooting and root cause analysis skills. Desirable Skills Experience with Kubernetes and/or other cloud platforms. Familiarity with Nagios, Datadog, or similar monitoring tools. Exposure to CI/CD …
their fulfilment status. · Ensure that all service requests comply with established policies and procedures. Problem Management: · Assist with the identification and resolution of recurring incidents and problems. · Contribute to root cause analysis and the development of permanent solutions. · Maintain documentation of known issues and solutions for future reference. Customer Service: · Provide excellent customer service to all end …
the determination of equipment criticality across all instrumentation asset classes and business functions, acting as the technical authority for whole-life asset care strategies. Apply Failure Mode and Effects Analysis (FMEA) to develop predictive and preventative maintenance strategies for critical assets, and to propose efficient spare parts holding strategies. Use field feedback and performance data to conduct regular Preventative … Maintenance Optimisation (PMO), ensuring maintenance activities remain effective and efficient. Perform detailed reliability, asset health, and performance analysis, prioritising high-risk and high-cost assets, to initiate strategy reviews and quantify improvements such as Mean Time Between Failures (MTBF), cost savings, and resource efficiency. Facilitate defect elimination studies and solutions by collating and analysing submitted root cause analysis (RCA) conclusions, carried out locally by Maintenance Specialists, and using the company's ‘bad actor’ report to identify repeat cause assets which require investigation. Provide technical support and content for job plans, work instructions, and internal training courses; collaborate with operational and maintenance teams to improve reliability of operational equipment, and compliance with statutory and regulatory requirements …
bank's security standards and regulatory frameworks. Manage encryption key life cycle operations (generation, rotation, and revocation). Incident Management: Respond to and resolve IKP-related incidents promptly. Conduct root cause analysis and implement preventive measures. Collaboration: Work closely with IKP SMEs, build engineers, OpenShift teams, and ITSO for secure integration. Support audits and compliance reviews. Documentation …
of collaboration, innovation, and continuous improvement. Assist with the design, implementation, and maintenance of systems to ensure high availability, scalability, and performance. Develop and implement strategies for incident response, root cause analysis, and post-mortem reviews to prevent future incidents. Work closely with business and technology teams to understand their needs and ensure alignment with reliability and …
Employment Type: Permanent
Salary: £90000 - £100000/annum To £140,000 package
of collaboration, innovation, and continuous improvement. Assist with the design, implementation, and maintenance of systems to ensure high availability, scalability, and performance. Develop and implement strategies for incident response, root cause analysis, and post-mortem reviews to prevent future incidents. Work closely with business and technology teams to understand their needs and ensure alignment with reliability and …
investigation through to workaround or resolution, always keeping the customer informed of progress. Investigate, analyse and solve problems using SQL and other diagnostic tools and methodologies. Conduct root cause analysis on identified problems and investigate potential permanent fixes. Provide input to knowledge base documentation based on support ticket trends and issues. Make recommendations for best …
Manchester, Lancashire, United Kingdom Hybrid/Remote Options
Smart DCC
refreshes, ensuring new solutions are secure by design and aligned with DCC's technology strategy. Incident & Problem Response: Lead the Technology Office representation in post-incident reviews, ensuring credible root cause analysis (RCA) and delivery of corrective actions. Cross Domain Collaboration: Drive alignment and coherence across domain architectures to ensure performance, security, and operational integrity. Compliance & Governance …
with other members of the team or directly with business users to understand and document business requirements. Undertake/support the monitoring of BAU processes as directed, including undertaking root cause analysis, advising remediation options and, if required, delivering a solution, including any early lifecycle support as needed. Ensure that all work is carried through the …
Employment Type: Permanent
Salary: £70000 - £80000/annum Hybrid working 10% pension
Didsbury, Manchester, Lancashire, England, United Kingdom
Great Places Housing Association
and the project management team to deliver new applications and system projects. Work with colleagues, engineers and consultants to resolve complex ongoing technical issues and provide input into root cause analysis reports as requested. Assist with the development and implementation of disaster recovery (DR) plans including testing and documentation of the network infrastructure. Familiarise yourself with …