enhance the observability, reliability, and performance of the production network. Enhance existing monitoring and observability frameworks, integrating intelligent alerting and self-remediation capabilities to reduce manual intervention and improve incident response. Define and measure service-level objectives (SLOs) to track infrastructure performance and reliability. Write software utilizing orchestration systems to automate tasks and interact with other systems. Provide mentorship
the Vorboss network. Day-to-day, you'll be balancing hands-on tasks with higher-level planning and automation of several functions within the systems infrastructure, as well as incident response. Key responsibilities: Maintenance, configuration, and reliable operation of hypervisor platforms, VMs, containers, bare-metal servers and applications hosted therein. Develop and maintain internal monitoring, automation scripts, and dashboards
software development and systems engineering. - A high bar for code and configuration quality and readability. - A good understanding of current observability and reliability practices. - Experienced and comfortable in running incident response. - Big-picture thinking - you can make trade-offs on technical work streams against business impact. - Fantastic communication skills. You're able to articulate what you're working on
job-specific technical skills. This role can be based in our Knutsford or Glasgow locations. Purpose of the role: To apply software engineering techniques, automation, and best practices in incident response, to ensure the reliability, availability, and scalability of the systems, platforms, and technology through them. Accountabilities: Availability, performance, and scalability of systems and services through proactive monitoring … maintenance, and capacity planning. Resolution, analysis and response to system outages and disruptions, and implement measures to prevent similar incidents from recurring. Development of tools and scripts to automate operational processes, reducing manual workload, increasing efficiency, and improving system resilience. Monitoring and optimisation of system performance and resource usage, identify and address bottlenecks, and implement best practices for performance
monitoring system performance or availability, and performing security upgrades. Must have strong communication skills and a solid understanding of IT security concepts, including vulnerability and patch management, security operations, incident management, and incident response. Experience integrating cybersecurity data using enterprise or custom data aggregation and analysis tools, including Splunk. Ability to provide support in an IT … operations and maintenance, including ticket work information updates, issue response, and remediation by understanding and analysing vulnerability scan results, system audits, and log events, and troubleshooting software issues. Strong knowledge of and experience with log monitoring and correlating events from multiple security tools such as log correlation engines, NetFlow, and host monitoring solutions. Excellent troubleshooting/problem-solving skills. Experience … of dealing with incident, problem and change management processes. Proven working experience of Windows and Linux operating systems. Solid understanding of networking technologies: switches, routers, firewalls, proxies, IDS, IPS. Role 2: As an experienced Nessus Engineer, responsible for maintaining the tool and remediating vulnerabilities across the bank. Primary Responsibilities: Installing and configuring Nessus, Nessus Manager, Nessus Agents
Define and implement observability standards, including logging, metrics, tracing, and alerting. Use tools like New Relic, Prometheus, and Grafana, alongside building custom instrumentation for key platform services. Drive incident readiness and operational resilience by enabling actionable monitoring and alerting. Drive cloud cost visibility and optimization efforts across engineering through dashboards, tagging standards, and automation. Partner with stakeholders to … platforms and enablement frameworks. Experience with cloud-native technologies, Kubernetes, and Infrastructure as Code (Terraform, Helm, etc.). Strong understanding of observability tooling (especially New Relic, Prometheus, Grafana) and incident response best practices. Familiarity with FinOps, platform cost tracking, and infrastructure efficiency techniques. Excellent communication, leadership, and stakeholder management skills. Attract, hire, and develop talented platform engineers with
authoring switching schedules Ensuring compliance with all site RAMS and H&S regulations Liaising with client representatives and reporting to senior management Taking ownership of critical plant performance and incident response Qualifications & Experience Required: Valid and in-date HV Authorised Person certification Minimum 3-5 years' experience in a supervisory or leadership role Data centre experience is essential Background
Expertise in SAP NetWeaver, HANA, and Unix/Linux environments. Strong knowledge of SAP ECC, BW, APO, PI, IBP, C4C, Cloud Connector, and Fiori. Familiarity with incident response and problem management. SAP Security experience (role management & access) is a plus! Apply now to speak with VIQU IT in confidence. Or reach out to Connor Smal
provided if not yet authorised) Escalating faults and carrying out fault diagnostics across essential systems Working alongside other engineers to ensure uptime and performance Supporting emergency call-outs and incident response where required Completing maintenance logs and compliance documentation Qualifications & Experience: Time-served Engineer with Level 3 qualification in Electrical or Mechanical Engineering 18th Edition (for Electrical bias
proactive performance tuning. Participate in network automation efforts using Python, Ansible, or equivalent tools. Document network topologies, device configurations, and change procedures. Provide L2/L3 on-call support, incident response, and root cause analysis. Skills & Experience Required 5+ years in networking, preferably in a hyperscaler or HPC environment Proficient with L2/L3 protocols, automation tools, and
deeply technical - you will be expected to move the needle through technical collaboration and direct contributions when required. Raise the Bar - Champion best practices: developer velocity, code quality, observability, incident response, user analytics, product engagement - so "excellent" becomes the default. Scale the Team - Attract, hire and retain top talent; run a fair, data-driven performance process; build clear
to raise the bar. Identify problems, investigate root causes, and implement pragmatic solutions with autonomy. Continuously improve how we work as a team: from CI/CD pipelines to incident response and everything in between. Stay curious about how AI can enhance our work, adopting tools and workflows that unlock better productivity and insight. About you You're
London, South East, England, United Kingdom Hybrid / WFH Options
LANCESOFT LTD
and cause crash sources. System generalists profiles. These engineers would be part of an existing team; duties include providing a 1st line diagnostic and resolution of problems with clear incident response. What does a strong candidate look like? Meets Common Job Requirements below and, in addition • Experience with C# • Experience with Unity C# • Experience with tools such as Phabricator
City of London, London, England, United Kingdom Hybrid / WFH Options
Atrium Workforce Solutions Ltd
stack. System generalists and product generalists profiles. These engineers would be part of an existing team; duties include providing a 1st line diagnostic and resolution of problems with clear incident response. Role Overview: Job Title: C#/Unity Application Development/Debugging Engineer Location: London/Hybrid 3 days onsite per week Contract Type: Contract Duration: 6 months initially
Digital Transformation, we are investing in IT services to enhance learning and research capabilities, fostering global collaboration on pressing issues. Your role involves supporting existing services and applications, managing incident responses, and delivering sustainable services that facilitate discovery, usability, management, and preservation of Special Collections metadata and digital collections. You will also support the Library's Digital Library repository
West Malling, Kent, United Kingdom Hybrid / WFH Options
Lumina Energy
evolving, and our team is at the heart of protecting critical infrastructure and data. As a Cyber Security Engineer, you'll help lead our proactive efforts in threat detection, response, and mitigation. This role is vital to safeguarding the confidentiality, integrity, and availability of systems and services. What you'll be doing Act on security alerts, incidents, and events … Darktrace to monitor and prevent threats. Analyse malware and respond to high-priority incidents. Support vulnerability management and threat analysis activities. Participate in our on-call rotation for cyber incident response. Contribute to documentation and certification processes such as Cyber Essentials. Engage with cross-functional teams to improve security practices and awareness. What we're looking for Must have
Information Technology, Enterprise Resource Planning (ERP), and Engineering consulting, with the aim of becoming an internationally renowned Systems Integration Company. Job Description We are currently seeking an IT Major Incident/Problem Manager for a contract position based in Crawley, England. The role involves managing major incidents and problems, ensuring root causes are identified, and implementing process improvements. The … successful candidate will report to the IT Operations Manager and be responsible for coordinating incident responses, conducting RCA reports, and analyzing incident trends to prevent recurrence. Responsibilities Manage major incident and problem management processes across services, suppliers, and customers. Coordinate rapid response to incidents, minimizing system downtime. Provide technical skills and gap analysis to improve incident and problem management. Analyze incident data to propose resolutions and prevent future incidents. Requirements Excellent communication and organizational skills. Proven experience in Incident and Problem Management. Self-motivated with a focus on customer service. CRB Security Check clearance. Qualifications and Experience Knowledge of IT infrastructure components such as hardware, databases, and networks. Understanding of IT concepts and
Manchester, Lancashire, England, United Kingdom Hybrid / WFH Options
Tenth Revolution Group
and availability of our security infrastructure. What You'll Be Doing * Managing Hardware Security Modules (HSMs) and cryptographic infrastructure * Creating, storing, and retiring encryption keys securely across multiple platforms * Supporting incident and change management processes * Collaborating with application, infrastructure, and support teams * Ensuring compliance with security standards and audit requirements * Contributing to project delivery and continuous improvement initiatives What We're Looking … work under pressure * Excellent communication and stakeholder management skills Nice to Have * ITIL Foundation certification * Security or project management certifications * Experience with tools like JIRA, Confluence, SharePoint * Background in incident response and risk management Benefits * Salary up to £41,000 depending on experience * Pension of 12% * Private medical * Discretionary bonus Please Note: This is a permanent role for UK residents
Macclesfield, Cheshire, England, United Kingdom Hybrid / WFH Options
Tenth Revolution Group
and availability of our security infrastructure. What You'll Be Doing * Managing Hardware Security Modules (HSMs) and cryptographic infrastructure * Creating, storing, and retiring encryption keys securely across multiple platforms * Supporting incident and change management processes * Collaborating with application, infrastructure, and support teams * Ensuring compliance with security standards and audit requirements * Contributing to project delivery and continuous improvement initiatives What We're Looking … work under pressure * Excellent communication and stakeholder management skills Nice to Have * ITIL Foundation certification * Security or project management certifications * Experience with tools like JIRA, Confluence, SharePoint * Background in incident response and risk management Benefits * Salary up to £41,000 depending on experience * Pension of 12% * Private medical * Discretionary bonus Please Note: This is a permanent role for UK residents
sponsored events in the UK and the wider region. Deliver the Region's Security Awareness and Education initiatives, including operational security, risk management, resilience, and brand protection. Manage security incident responses, investigations, and provide advice to personnel at all levels. Conduct Site Security Assurance audits and manage the Global Security Services contract within the area of responsibility, ensuring high … briefings for management teams and employees, along with other security education materials. Establish and maintain liaison with international and national intelligence and law enforcement agencies to support global security response activities, including providing a 24/7 security response service as necessary. Identify threats to the business units, assess risks, and provide proportionate advice to effectively manage these … risks. Candidate Requirements This role requires a strong understanding of corporate security procedures, risk management methodologies, investigations, incident management, crisis and continuity management, operational security threat management, security intelligence, and security technology. Excellent written and oral communication skills, along with strong presentation skills, are essential to support all levels of client leadership with reliable, meaningful information for fact-based