Swansea Vale, Swansea, West Glamorgan, Wales, United Kingdom Hybrid / WFH Options
ERS
Major Incident & Problem Manager Grade: 4 Reporting to: Head of IT Service Management Location: Swansea About us IQUW is a speciality (re)insurer at Lloyd’s (Syndicate 1856) underwriting a diverse range of Property, Commercial and Speciality (re)insurance products from Cargo and Marine to Political Violence, Terror and War. We combine data, intelligent automation and human expertise … help get under the skin of the most difficult insurance risks, helping build products to meet their customer’s needs. The role We are seeking a proactive and experienced Incident and Problem Manager to take ownership of our Incident, Major Incident and Problem Management processes. This role is critical in reducing operational disruption, improving service reliability … and driving continuous improvement across IT services. The ideal candidate will have a strong process ownership and communication mindset, ensuring that incidents are managed effectively while also implementing problem management strategies to prevent recurrence. We currently operate a hybrid working model. This entails 3 days per week collaborating with colleagues in the office, and 2 days working from home.
Cardiff, South Glamorgan, United Kingdom Hybrid / WFH Options
Monzo
decisions and careful controls that distinguish us in both banking and technology. Across all of the business areas you support, you will lead the way in real-time risk management by ensuring key risk indicators are well-defined and understood, incidents are managed, MI and reporting packs are created, and risk & control assessments are completed. We are looking for … troubleshooting to help resolve incidents, issues and/or regulatory requests quickly and within the correct boundaries. You can demonstrate experience in leading Risk & Control Assessments, Control Development, Assurance, Incident Management and Issue & Action Management. You have experience in data and technology risk areas including a good understanding of GDPR, PECR and evolving considerations for AI and …
ll design scalable infrastructure, automate operations, and embed SRE principles to improve reliability and reduce toil. This is a highly influential role where you'll guide engineering standards, support incident management, and mentor others in building robust, cloud-native systems using modern DevOps practices. What You'll Bring: Strong experience supporting complex web applications and distributed systems, including … DevOps, GitHub Actions). Solid grasp of cloud infrastructure (Azure or GCP), networking, and security best practices for web platforms. Knowledge of SRE frameworks including SLOs, SLIs, error budgets, and incident response. Familiarity with testing tools such as Playwright, Vitest, and Jest. Understanding of infrastructure-as-code (Terraform) and DevSecOps is a plus. Why You Should Apply: You'll join …
intelligence and service assurance. You will be responsible for designing, implementing, and supporting monitoring solutions across a range of technologies and platforms, ensuring service stability, performance insight, and proactive incident management. Key Responsibilities * Translate high-level monitoring non-functional requirements (NFRs) into actionable configurations across tools such as Splunk, Dynatrace, and AppDynamics. * Deliver full-stack observability solutions, including application … aware network performance monitoring (NPM), synthetics, log analytics, and infrastructure metrics. * Provide live support for monitoring technologies and assist with live service support, including key business events (KBEs) and incident response (some KBEs may be out of hours). * Collaborate with architects and project teams to integrate monitoring into solution designs and test strategies. * Maintain and enhance dashboards, alerts …