Newcastle Upon Tyne, Tyne And Wear, United Kingdom
Bede Gaming
finalise implementation plans for new work. Support feature teams during execution with activities such as prototyping, pair coding, reference code, pull request reviews, and ensuring test coverage. Engage in incident response and diagnostics, particularly where player engagement services are affected. Continuous Improvement & Team Development: Maintain and oversee the quality of technical documentation produced by your stream. Share knowledge …
The role involves building frameworks for intelligent alerts to help Service Delivery teams quickly triage incidents and enable automated runbooks. Additionally, you will identify and deploy tools to automate incident detection, notifications, triage, and resolution. Key Responsibilities: Pipeline Approach: Adopt a pipeline approach to enable observability of services deployed across multiple environments, balancing monitoring, logging, and tracing based on … extending to trigger automated execution of runbooks with clear audit logs. Collaboration: Work closely with DevOps, Service Reliability, and Service Delivery teams to identify and deploy tools that automate incident detection, notifications, triage, and resolution. What We're Looking For: Skills: Leadership and Collaboration: Strong leadership skills with the ability to mentor, coach, and develop high-performing teams. Excellent … and Observability: Experience in creating and maintaining dashboards for proactive monitoring of services. Ability to design and build intelligent alerts using pipelines, enabling early detection of issues and automated incident response. Knowledge of the latest technology trends in the monitoring landscape, such as OpenTelemetry. Contract Management: Experience in managing third-party provider contracts, including negotiating terms, monitoring performance, and …
Information Technology, Enterprise Resource Planning (ERP), and Engineering consulting, with the aim of becoming an internationally renowned Systems Integration Company. Job Description: We are currently seeking an IT Major Incident/Problem Manager for a contract position based in Crawley, England. The role involves managing major incidents and problems, ensuring root causes are identified, and implementing process improvements. The … successful candidate will report to the IT Operations Manager and be responsible for coordinating incident responses, conducting RCA reports, and analyzing incident trends to prevent recurrence. Responsibilities: Manage major incident and problem management processes across services, suppliers, and customers. Coordinate rapid response to incidents, minimizing system downtime. Provide technical skills and gap analysis to improve incident and problem management. Analyze incident data to propose resolutions and prevent future incidents. Requirements: Excellent communication and organizational skills. Proven experience in Incident and Problem Management. Self-motivated with a focus on customer service. CRB Security Check clearance. Qualifications and Experience: Knowledge of IT infrastructure components such as hardware, databases, and networks. Understanding of IT concepts and …
Berkhamsted, Hertfordshire, United Kingdom Hybrid / WFH Options
Digital Preservation Coalition
cybersecurity tools to conduct proactive vulnerability scans across the Archive's network, devices, and systems; prioritize and address vulnerabilities; and generate progress reports. Monitor security events using detection and response solutions; respond to, manage, escalate, and report potential security incidents following established Incident Response Procedures. Lead cybersecurity analysis, improvements, monitoring, and incident response efforts, collaborating …
behind the curtain, ensuring our critical systems are always reliable, available, and performing like a dream. We're talking about implementing smart automation, sharp monitoring, and super-speedy incident response strategies to keep everything running smoothly. You'll be working hand-in-hand with our dev, infra, and security teams, making sure we balance exciting new features … be the guardian of our uptime, making sure our critical systems are always available and hitting those all-important SLAs. You'll also be leading the charge on incident management, getting to the bottom of any issues and making sure we learn from them. Monitoring & Alerting Maestro: Setting up and maintaining top-notch monitoring systems (like Dynatrace) will … craft alerting systems that give us a heads-up before problems even get a chance to impact our players, and you'll define key metrics to measure system health. Incident Response Ace: When things get a bit wobbly, you'll be on the front lines, resolving incidents fast to minimize downtime. After the dust settles, you'll lead …
Collaborate with engineering teams to support platform reliability and enable delivery. Maintain visibility and awareness through monitoring and logging tools such as Datadog, Azure Monitor, App Insights, etc. Support incident resolution and participate in an on-call rota to help maintain service uptime. Qualifications: The Requirements: Essential Experience: Proven experience in a Platform, Infrastructure, or DevOps engineering role. Hands … tools such as ArgoCD or Flux. Familiarity with Configuration as Code tools like Ansible or Puppet. Exposure to large-scale distributed systems or high-volume web APIs. Awareness of incident response processes and platform reliability best practices. Equal Opportunity Employer: At WTW, we believe difference makes us stronger. We want our workforce to reflect the different and varied …
West Midlands, United Kingdom Hybrid / WFH Options
Kind Consultancy Limited
enable the Head of Supply Chain and Procurement to design and embed a supplier/procurement risk management framework. Operating as a technical legal expert for roles tasked with incident response planning, business continuity, operational resilience, cyber incident scenario simulations, overseeing framework materials to ensure they are current and responsive to changing risk scenarios and regulatory appetite. …
Gloucestershire, South West, United Kingdom Hybrid / WFH Options
Data Careers
/3 days per week and WFH) Salary: £35,526 + Excellent pensions and other Employee Benefits. Key Skills: 1st/2nd line Systems/Applications support, ITIL awareness, incident response/responding to tickets, excellent customer skills, basic knowledge of MS modern management tools such as SQL Server Admin, SCCM/Intune, etc. Why Apply: These are …
resolutions are within SLA. Build and nurture strong relationships both internally and externally to enhance service delivery for our customers. Complete and document Root Cause Analyses (RCAs) and Post Incident Reviews (PIRs), recommending improvements where necessary. Contribute to ITSM-driven initiatives, collaborating as a chapter to implement positive changes. Create and maintain Knowledge Base articles for team sustainability and … API testing tools. Experience in unit testing with a focus on continual improvement in API monitoring and performance. A mindset geared towards optimisation and automation, especially in alerting and incident response processes. Strong documentation skills to ensure key processes and learnings are shared across the team. Solid understanding of ITIL v4 (certification required). Exposure to Agile methodologies. A …
Manchester, Lancashire, England, United Kingdom Hybrid / WFH Options
Pontoon
resolutions are within SLA. Build and nurture strong relationships both internally and externally to enhance service delivery for our customers. Complete and document Root Cause Analyses (RCAs) and Post Incident Reviews (PIRs), recommending improvements where necessary. Contribute to ITSM-driven initiatives, collaborating as a chapter to implement positive changes. Create and maintain Knowledge Base articles for team sustainability and … API testing tools. Experience in unit testing with a focus on continual improvement in API monitoring and performance. A mindset geared towards optimisation and automation, especially in alerting and incident response processes. Strong documentation skills to ensure key processes and learnings are shared across the team. Solid understanding of ITIL v4 (certification required). Exposure to Agile methodologies. A …
South West London, London, England, United Kingdom
Oscar Technology
support cloud-native infrastructure evolution. Build and optimise CI/CD pipelines (GitHub Actions, Azure DevOps, Jenkins). Implement robust monitoring and alerting solutions (CloudWatch, Azure Monitor, Grafana, ELK). Own incident response processes, ensuring high availability and rapid resolution. Collaborate with stakeholders to communicate solutions and technical trade-offs clearly. Ideal Experience: 3-5 years SRE or DevOps experience …
Leeds, Yorkshire, United Kingdom Hybrid / WFH Options
William Hill PLC
through software delivery pipelines using Infrastructure as Code. Maintaining and Developing Infrastructure - Curate the Container Orchestration, Monitoring, Messaging and Data Storage Platforms, developing any necessary integration. Supporting Incidents - Assist Incident Management in Production all the way through impact assessment, service restoration and post-mortems, including being part of the SRE on-call rotation. Sharing Knowledge - Enabling development teams within … gambling, and we are looking for people who can support our ethos. To apply to this post, you will have: A base in Leeds with working experience of an incident response model and fluency with observability and monitoring (Prometheus, Grafana). Experience defining alerts and implementing dashboards from existing monitoring and logging data. Relentless focus on customer experience with …
for-purpose Cyber Resilience Framework embedded across the business. Work closely with Governance, Risk & Compliance (GRC) teams and run the workstream responsible for outlining and validating disaster recovery and incident response plans. Drive cross-functional collaboration with technology, legal, data privacy, crisis management, disaster recovery, and operational continuity teams. Deliver and maintain practical recovery processes across a complex …
Edinburgh, Midlothian, Scotland, United Kingdom Hybrid / WFH Options
G2 Legal Limited
team, you will: Lead complex, multi-jurisdictional advisory and contract work. Advise on UK GDPR/EU GDPR, international data transfers, PECR, AI/data ethics, cyber regulation and incident response. Prepare, negotiate and review a broad range of commercial and tech-related contracts. Drive delivery for strategic client projects and compliance programmes. Work cross-functionally with disputes …
Lincoln, Lincolnshire, East Midlands, United Kingdom Hybrid / WFH Options
ITSS Recruitment Ltd
technical solutions and resource plans. * Serve as the technical voice in executive discussions and strategic planning. * Ensure all systems and software meet internal standards and external compliance requirements. * Oversee incident response, vulnerability management, and disaster recovery plans. As a visionary and strategic technology leader, the Director of Software Engineering is responsible for shaping and executing the software development …
both written and spoken. Demonstrable experience as a Security Architect or similar role. Strong knowledge of security standards, protocols, and best practices. Experience with threat modelling, risk assessment, and incident response. Familiarity with security tools (e.g., Snyk, OWASP ZAP). Excellent communication and collaboration skills. Self-learner and ability to execute tasks without supervision. Ability to maintain the highest …
maintain high-quality, clean, and testable code. Implement and uphold automated testing practices (unit, integration, E2E, load, and penetration testing). Ensure production environments run smoothly and assist in incident response when needed (including out-of-hours support on occasion). Contribute to platform architecture decisions and standardisation efforts across teams. Work closely with the Principal Software Engineer …
Newcastle Upon Tyne, Tyne And Wear, United Kingdom
Bede Gaming
leads to finalise solution designs and guide teams through delivery. Provide hands-on support during build phases: pair-programming, prototyping, reviewing code, and ensuring test coverage is solid. Support incident response when needed, partnering with Service Delivery to resolve critical issues efficiently. Knowledge Sharing & Documentation: Build deep expertise in our BI products and share that knowledge across the …
Leicester, Leicestershire, United Kingdom
Vacancy Filler (Integration)
and secure systems. In this role, you will collaborate closely with development, QA, and IT teams to streamline CI/CD pipelines, automate infrastructure, and ensure efficient monitoring and incident response. Ideal candidates have strong experience with cloud platforms (e.g., AWS, Azure, or GCP), containerization (e.g., Docker, Kubernetes), and infrastructure-as-code tools (e.g., Terraform, Ansible). …
is reliable, scalable, and secure. Ensure the reliability, availability, and scalability of the systems, platforms, and technology through the application of software engineering techniques, automation, and best practices in incident response. Accountabilities: Build Engineering: Development, delivery, and maintenance of high-quality infrastructure solutions to fulfil business requirements ensuring measurable reliability, performance, availability, and ease of use. Including the identification … of the appropriate technologies and solutions to meet business, optimisation, and resourcing requirements. Incident Management: Monitoring of IT infrastructure and system performance to measure, identify, address, and resolve any potential issues, vulnerabilities, or outages. Use of data to drive down mean time to resolution. Automation: Development and implementation of automated tasks and processes to improve efficiency and reduce manual …
track down the root cause. Communicate the impact of the problem to stakeholders in terms of business value, helping to set a priority for the resolution. Actively participate in incident responses. Engineering standards & frameworks - Maintain knowledge of Xero's current and emerging engineering standards and practices. Develop and deploy software that meets Xero's standards. Continuous improvement - Maintain knowledge …
Guildford, Surrey, United Kingdom Hybrid / WFH Options
EURAXESS Czech Republic
skills and experience: Full range of system administration skills including user management, building/deployment, installing scientific software packages, performance benchmarking, resource utilisation/performance and availability monitoring, and incident response. Automation of repetitive tasks in the form of developing and maintaining Ansible playbooks and roles, using Git for version control, change management and collaboration. Proactively and reactively …