a global basis, the resilience of operations has become a board-level issue. You will provide our clients with a full spectrum of services, covering proactive and reactive Cyber Incident Response (CIR) Services. The proactive arm of our business covers a breadth of propositions, including playbook development, wargaming, readiness assessments, post-breach assessments, managed threat hunting as well as … uplift their maturity and fundamentally enhance their preparedness to respond, via targeted capability uplift, C-Suite awareness campaigns and training. Our technical response team support our clients in live incident responses by working to identify root causes and evict threats. Our professionals apply their experience and imagination to find the most advanced threats, hiding in the darkest corners of … award-winning vendor relationships, we can do whatever it takes - from improving the security of a single component to delivering a holistic security and privacy program. As a Cyber Incident Response Advisory and Incident Management Senior Manager or Associate Director, you will focus on developing our business across both proactive and reactive services, whilst leading our advisory …
The Head of Incident & Problem is a key role within the D&T Service Management organisation, to elevate the service experience for Colleagues and Stores by optimising the Incident and Problem practices. This role is instrumental in continuously improving operational stability and is accountable for the performance and continuous improvement of the practices. As Practice Owner you … will define the strategic roadmap for each practice, focusing on Incident reduction opportunities, Incident prevention via root cause analysis, accelerating resolution, improving operational efficiency, communication and customer satisfaction. What You'll Do Your key accountabilities will include: Develop and define the strategic roadmap for Incident and Problem Management practices, focusing on incident reduction, incident prevention, reduced repeat incidents, increased availability, efficiency and accelerated resolution. Lead a team of Incident and Problem Performance Managers to continuously improve and optimise the effectiveness of the Incident and Problem Practices. Lead a Major Incident Management Lead to ensure that the 24/7 Major Incident Management Team is highly effective. Manage …
Manchester Area, United Kingdom Hybrid/Remote Options
Hamilton Barnes 🌳
We are working with a leading Managed Services Provider seeking a proactive individual to join their Operational Support Centre in Manchester. This key hybrid position focuses on major incident management (60-70%) and team leadership (30-40%) - you will be pivotal in overseeing critical customer incidents, guiding a team of service desk engineers, and shaping the culture of … an evolving service organisation. Key Details: Job Title: Major Incident Manager Location: Manchester Office Salary: Up to £40,000 (Dependent on Experience) Set up: Hybrid - 3 days in the office What's in it for you? Responsibility across major incidents that directly influence customer satisfaction Leadership and development opportunities to manage and mentor a growing team, shaping a team … weeks, with an impressively low call-out rate – with a £250 bonus payment for each week, you will boost your total annual earnings by £3,300. Key Responsibilities: Incident management: Act as the Major Incident Manager for all Priority 1 and Priority 2 incidents, leading coordination of response Communicate clearly and confidently with customers, executives, and internal …
with the junior SRE to develop their practical experience and technical confidence. Partner with developers, data scientists, and business users to resolve technical issues. Automate & Optimise Contribute to configuration management and automation improvements. Identify and document standard operating procedures. Implement proactive monitoring measures to detect and prevent issues. Monitor & Troubleshoot Troubleshoot system issues using logs, monitoring tools, and a … methodical approach. Oversee and enhance system monitoring with Nagios, with a transition to Datadog. Incident Management Support incident management processes, including post-mortems and follow-up actions. Communicate outcomes with customers clearly and effectively. What We’re Looking For: Experience Proven experience in an SRE, DevOps, or Operations Engineering role. Strong working knowledge of AWS, Terraform …
workload, and tasks are appropriately and fairly distributed throughout the team, with emphasis on making sure that staff are adequately skilled to respond and to monitor progression. Ensuring effective Incident Management is in place by monitoring against quality standards and checks. Identifying, escalating, and communicating in a timely manner high priority incidents, service requests, trends, and service issues … when dealing with demanding situations, taking ownership if required. Observe and influence the development of customer care skills within the team, leading by example and training as required. Line management of the team including day to day performance, conduct, and absence. Maintain a good personal awareness of industry standards and future developments. Promote good and efficient working practices in … organisational, NHS and legislative requirements and guidelines including IT Infrastructure Library (ITIL), General Data Protection Regulation (2016), Information Standards, Information Security, and compliance with NHS Information Governance. Ensure change management is applied within the Tier 2 team and follows implemented policies and procedures. Ensure all documentation relating to own area is complete and fit for purpose and all releases …
Wigan, Lancashire, England, United Kingdom Hybrid/Remote Options
Searchability
develop automation to improve uptime, and refine observability to provide real-time insight into platform health. You'll also play a key role in performance testing, system tuning and incident management to ensure smooth operation during critical events. SITE RELIABILITY ENGINEER ESSENTIAL SKILLS At least 2 years' experience working as an SRE Deep understanding of system reliability, scalability … automation and debugging Hands-on experience with AWS or another major cloud platform Knowledge of Kubernetes, Terraform, and Infrastructure as Code Strong networking fundamentals and distributed systems knowledge Proven incident response and root cause analysis experience Excellent collaboration and communication skills TO BE CONSIDERED: Please either apply through this advert or email me directly. For further information … conjunction with this vacancy only. KEY SKILLS SRE, Site Reliability Engineering, AWS, Kubernetes, Terraform, Grafana, Prometheus, OpenTelemetry, Go, .NET, Cloud Infrastructure, Observability, CI/CD, DevOps, Automation, Performance Tuning, Incident Management
Wigan, Greater Manchester, United Kingdom Hybrid/Remote Options
Searchability (UK) Ltd
develop automation to improve uptime, and refine observability to provide real-time insight into platform health. You'll also play a key role in performance testing, system tuning and incident management to ensure smooth operation during critical events. SITE RELIABILITY ENGINEER ESSENTIAL SKILLS At least 2 years' experience working as an SRE Deep understanding of system reliability, scalability … automation and debugging Hands-on experience with AWS or another major cloud platform Knowledge of Kubernetes, Terraform, and Infrastructure as Code Strong networking fundamentals and distributed systems knowledge Proven incident response and root cause analysis experience Excellent collaboration and communication skills TO BE CONSIDERED: Please either apply through this advert or email me directly via (url removed). For … conjunction with this vacancy only. KEY SKILLS SRE, Site Reliability Engineering, AWS, Kubernetes, Terraform, Grafana, Prometheus, OpenTelemetry, Go, .NET, Cloud Infrastructure, Observability, CI/CD, DevOps, Automation, Performance Tuning, Incident Management
Professional Certificate Knowledge Essential A good knowledge of computer architecture, Windows operating systems and office applications Data Protection and software licensing principles Directory Services/Administration Desirable ITIL Problem & Incident Management ITIL Configuration Management Desktop & application virtualisation Mobile Device Management Experience Essential Experience of supporting desktop technologies in a networked environment Experience of supporting the Windows … verbal communication skills Excellent analytical and problem solving skills Work proactively and without supervision Communicate clearly and confidently with colleagues at all levels of ability and seniority Good time management Disclosure and Barring Service Check This post is subject to the Rehabilitation of Offenders Act (Exceptions Order) 1975 and as such it will be necessary for a submission for …
ensure optimal setup, configuration, maintenance, and security of both the development and production environments, ensuring that IT systems and infrastructures are reliable, scalable, and secure. Key Responsibilities Leadership Environment Management: Deployment & Automation: Performance & Scalability: Security & Compliance: Collaboration & Stakeholder Management: Documentation & Reporting: Incident Management & Problem Resolution: Capacity Planning: Escalate issues as appropriate. Manage assigned risks and issues. … communication skills, with the ability to collaborate across technical and non-technical teams. Preferred Qualifications: Experience with container orchestration platforms (e.g., Kubernetes). Familiarity with agile methodologies and project management tools (e.g., Jira, Confluence).
Manchester, North West England, United Kingdom Hybrid/Remote Options
On the Beach
Monitoring and Observability, Developer Portal and Kubernetes (EKS) platforms. Architecture and Design: Architect and design scalable, reliable, and secure Platform solutions, ensuring they meet current and future needs. Platform Management: Lead the management and optimization of our AWS estate and Kubernetes clusters, ensuring high availability, performance, and efficient resource utilisation. Automation and CI/CD: Develop and maintain … automation scripts and CI/CD pipelines to streamline deployment processes, ensuring continuous integration and delivery. Monitoring and Incident Management: Implement comprehensive monitoring solutions and lead incident response efforts to maintain Platform product reliability and performance. Cross-Team Collaboration: Work closely with Product Engineering, Digital Workspace and InfoSec teams to ensure seamless integration and operation of Platform … Terragrunt, CloudFormation, or similar. Deep understanding of CI/CD tools and pipelines, such as GitLab CI, GitHub Actions, Jenkins or similar. Solid understanding of networking, security, and identity management within AWS and Kubernetes environments. Current or previous experience with a modern programming/scripting language such as C#, Go, Python, TypeScript, or Java. Advanced scripting skills in languages …
your in-depth knowledge and experience in your chosen field to effectively influence and develop our services and teams. What you will do Respond to incidents logged in the Incident Management system and provide end users with a technical solution. Provide a point of technical escalation and expertise. Resolving technical issues escalated from 2nd line support Investigating and … Providing solutions to critical technical problems Collaborating with other teams to resolve technical issues Maintain technical accreditations in line with catalogued services. Recommend and deploy changes via the change management process when required. Proactively maintain and develop knowledge, skills and experience through client contact, industry sources, formalised training and development plan. Work with consultants to better understand issues and … complete scheduled tasks when required. Raise potential service issues initially with Team Leader/Service Desk Delivery Manager/Service Delivery Managers. Escalate potential problem issues with Problem and Incident Management. What we expect of you 3-5 years of experience in a similar SLA-driven support role. Proven experience delivering projects and complex changes. Proven capability in the …
Manchester, England, United Kingdom Hybrid/Remote Options
Lorien
conditions. Working closely with Product, Platform, and Engineering teams, you'll ensure resilience principles are built into design, change, and support processes. You'll also take ownership of major incident and crisis management, third-party assurance, and continuous improvement initiatives. The Skill Requirements: We're looking for candidates with a blend of the following: Proven experience in operational … resilience, business continuity, or risk management within a regulated or complex environment Strong understanding of resilience frameworks including critical service mapping, dependency identification, and impact tolerance setting Knowledge of incident management, crisis response, disaster recovery, and third-party risk Experience applying resilience principles within agile, product-led, or DevOps environments Excellent communication skills with the ability to …
Manchester, Lancashire, England, United Kingdom Hybrid/Remote Options
Lorien
all conditions. Working closely with Product, Platform, and Engineering teams, you'll ensure resilience principles are built into design, change, and support processes. You'll also take ownership of major incident and crisis management, third-party assurance, and continuous improvement initiatives. The Skill Requirements: We're looking for candidates with a blend of the following: Proven experience in operational … resilience, business continuity, or risk management within a regulated or complex environment Strong understanding of resilience frameworks including critical service mapping, dependency identification, and impact tolerance setting Knowledge of incident management, crisis response, disaster recovery, and third-party risk Experience applying resilience principles within agile, product-led, or DevOps environments Excellent communication skills with the ability to …
to embed reliability into every layer of our technology stack. What You’ll Do Ensure the availability, scalability, and performance of systems through proactive monitoring and capacity planning. Lead incident response, root cause analysis, and implement preventive measures to avoid recurrence. Develop automation tools and scripts to reduce manual operations and improve system resilience. Optimize system performance and resource … performance tuning, high availability, and architecture. Strong scripting skills (e.g., PowerShell) and experience with automation/configuration tools like Ansible or Chef. Familiarity with observability tools, monitoring frameworks, and incident management practices. A mindset focused on eliminating TOIL, improving developer experience, and scaling operations through code. Excellent communication and collaboration skills. Bonus Points Experience with cloud platforms (Azure …
Manchester, Lancashire, England, United Kingdom Hybrid/Remote Options
QA
well as on-site. Interact with internal teams and 3rd party vendors as appropriate as part of the supply/delivery/support chain Handle escalated Service Desk tickets, incident management and service requests as appropriate Contribute to and resolve escalated customer, supplier, and vendor issues Develop and demonstrate an understanding of customer and business needs Participate and … assist in driving the knowledge management process Participate in IT related projects Assist with the creation, distribution, and analysis of operational, business and financial reporting Contribute to the production of IT support documentation as part of knowledge base Desirable skills: Good communication skills Excellent customer service skills Basic level of IT knowledge and general IT skills Excellent time/… task management skills Good problem-solving skills Can manage a varied and unpredictable workload Pro-active approach to tasks Can work as part of a team but also autonomously with minimal supervision when required Able to make decisions (with support when required) Ability to deal with difficult customers and provide a satisfactory outcome in an efficient and polite manner
Liverpool, Merseyside, England, United Kingdom Hybrid/Remote Options
QA
well as on-site. Interact with internal teams and 3rd party vendors as appropriate as part of the supply/delivery/support chain Handle escalated Service Desk tickets, incident management and service requests as appropriate Contribute to and resolve escalated customer, supplier, and vendor issues Develop and demonstrate an understanding of customer and business needs Participate and … assist in driving the knowledge management process Participate in IT related projects Assist with the creation, distribution, and analysis of operational, business and financial reporting Contribute to the production of IT support documentation as part of knowledge base Desirable skills: Good communication skills Excellent customer service skills Basic level of IT knowledge and general IT skills Excellent time/… task management skills Good problem-solving skills Can manage a varied and unpredictable workload Pro-active approach to tasks Can work as part of a team but also autonomously with minimal supervision when required Able to make decisions (with support when required) Ability to deal with difficult customers and provide a satisfactory outcome in an efficient and polite manner
and enterprise software platforms (Microsoft, VMware). Transferable Skills/Experience Proven experience designing and deploying virtualised datacentres from assured baseline solutions with supporting documentation. Experience in Service and Incident Management (BMC Remedy). Network background (Switches, Routers, Firewalls) is highly desired. Windows Server & Administration Tools (Active Directory, Group Policy, DNS, Certs, PKI), Windows 10 Experience in Microsoft …
Industry: Financial Services Salary: Up to £72,000 + discretionary bonus Key Responsibilities: Lead and develop a high-performing SRE team, driving collaboration and continuous improvement. Oversee system reliability, incident management, and root cause analysis. Define and implement automation, monitoring, and alerting strategies. Partner globally to align on uptime and resiliency goals. Promote SRE best practices and operational …
Liverpool, England, United Kingdom Hybrid/Remote Options
Ventula Consulting
Requirements Proven experience managing enterprise-level CCTV, EACS, and OT systems in a multi-site environment. Strong technical knowledge of system architecture, networking, and security best practice. Skilled in incident management, data reporting (API/SQL), and process optimisation. Experienced in system audits, configuration, and integration with AD/JML processes. Excellent communication and stakeholder management skills.
Manchester, Lancashire, United Kingdom Hybrid/Remote Options
Maxwell Bond
and drive continuous improvement for a new build in South Manchester. The Role As the Data Centre Site Manager, you will oversee all aspects of site operations - from infrastructure management and vendor coordination to safety, compliance, and team leadership. You'll be responsible for maintaining 100% operational availability while fostering a culture of reliability, accountability, and technical excellence. What … for you £450-500 per day Hybrid working Working with a cutting-edge, forward-thinking AI company Longevity - 12 months About You Proven experience in data centre operations, facilities management, or critical infrastructure environments. Strong knowledge of mechanical and electrical systems (HVAC, UPS, generators, switchgear, BMS, etc.). Excellent problem-solving and incident management capabilities. Working knowledge …
Greater Manchester, England, United Kingdom Hybrid/Remote Options
Maxwell Bond
and drive continuous improvement for a new build in South Manchester. The Role As the Data Centre Site Manager, you will oversee all aspects of site operations - from infrastructure management and vendor coordination to safety, compliance, and team leadership. You’ll be responsible for maintaining 100% operational availability while fostering a culture of reliability, accountability, and technical excellence. What … for you £450-500 per day Hybrid working Working with a cutting-edge, forward-thinking AI company Longevity – 12 months About You Proven experience in data centre operations, facilities management, or critical infrastructure environments. Strong knowledge of mechanical and electrical systems (HVAC, UPS, generators, switchgear, BMS, etc.). Excellent problem-solving and incident management capabilities. Working …
efficiently using data-driven approaches. Collaborate closely with engineers, architects, and product teams to deliver scalable, high-performing solutions aligned with business goals. Apply best practices in system performance, incident management, and security compliance. Stay current with emerging technologies and contribute to a culture of innovation and continuous improvement. Core Skills & Experience Proven experience as a DB2 Systems … and infrastructure optimisation. Desirable Skills Experience with hybrid or cloud-integrated mainframe systems. Exposure to modern DevOps practices and CI/CD automation within mainframe environments. Familiarity with risk management, governance, and compliance frameworks within large enterprises.
Manchester, England, United Kingdom Hybrid/Remote Options
Digital Waffle
Defender, Sentinel, Entra ID, Purview). Support and manage security vendor changes and onboarding of new solutions. Work cross-functionally to ensure secure design, delivery, and operations. Contribute to incident management and continuous security improvements. Maintain security policies, standards, and documentation. Skills & Experience Experience as a Cyber Security Manager, SOC Lead, or similar role. Strong hands-on knowledge …
the resilience and efficiency of their core infrastructure. Key Responsibilities Act as SME for infrastructure projects and upgrades. Lead technical resolution of major incidents and disaster recovery. Oversee lifecycle management of infrastructure components. Drive Agile practices and team development. Implement automation and service improvements. Ensure security and availability of systems. Build strong relationships with suppliers and vendors. About You … Proven leadership in infrastructure and team management. Strong technical expertise in networking, cloud, and on-prem systems. Experience with Agile methodologies and automation. Skilled in incident management, system design, and supplier engagement. Committed to continuous improvement and innovation. Infrastructure Squad Lead - Fixed Term Contract (6 months …
regulations including UNECE R.155 and China GB 44495, helping our client deliver secure and compliant vehicles to markets worldwide. You’ll report to the Functional Manager - Product Security Test & Incident Management, and work across test benches and vehicles to execute cybersecurity testing, support homologation, and contribute to the development lifecycle of secure automotive systems. What's on Offer …