London (City of London), South East England, United Kingdom
Harnham
performance cloud infra for ML workloads Build and manage GPU clusters, storage systems, and distributed training environments Set up and optimise containerised workflows (Docker, Kubernetes, Terraform) Implement robust monitoring, incident response, and CI/CD practices Collaborate closely with researchers to integrate and scale experiments This person must have experience building ML Infrastructure and cloud architecture from scratch …
London (City of London), South East England, United Kingdom
PRS
BMS, CAFM, etc. Act as Senior Authorised Person (SAP) for High and Low Voltage systems. Manage the Permit to Work (PTW) system and review RAMS for all activities. Lead incident response, root cause analysis, and corrective action processes. Deliver robust Planned Preventative Maintenance (PPM) and reactive maintenance schedules. Monitor site KPIs, service levels, and operational risks, ensuring swift …
London (City of London), South East England, United Kingdom
McCabe & Barton
comprehensive ICT risk management frameworks Develop and maintain detailed project plans, including gap analysis, remediation activities, and testing schedules Ensure production of all required documentation including policies, procedures, and incident response protocols Manage third-party ICT service provider assessments and contractual reviews in line with DORA requirements Facilitate workshops and requirements gathering sessions with technical and business stakeholders … Monitor regulatory developments and adjust project scope accordingly Report progress to senior management and provide risk assessments on compliance readiness Coordinate ICT-related incident reporting processes and business continuity testing activities Track dependencies, manage issues, and ensure alignment with broader regulatory compliance programs Essential Experience & Skills DORA & Regulatory Expertise: MUST have proven experience in designing and implementing Operational Resilience … compliance assurance Deep understanding of the regulatory environment relating to operational resilience and business continuity Technical & Risk Management: Strong grasp of ICT Risk Management principles and practices Experience with incident reporting frameworks and processes Expertise in third-party supplier management and oversight, particularly for critical ICT service providers Track record of managing operational disruption incidents and crisis situations Stakeholder …
London (City of London), South East England, United Kingdom
Insignis Talent
or post-sales roles in managed services, or cybersecurity Ability to engage with both executive stakeholders and technical teams Knowledge of cybersecurity concepts such as SIEM, EDR, MDR, and incident response Strong communicator with the ability to translate technical detail into business value Track record with renewals and structured success plans This is your chance to shape our …
City of London, London, United Kingdom Hybrid / WFH Options
Oho Group Ltd
infrastructure A thoughtful, pragmatic engineering approach Curiosity about security and detection (no prior experience required) Bonus if you’ve worked with: Event-driven or distributed systems Security tooling or incident response workflows Why Join? Work on hard, meaningful problems in cybersecurity Be part of a fast, technical, remote-first team Competitive salary and meaningful equity Founding Engineer - London …
London (City of London), South East England, United Kingdom Hybrid / WFH Options
Oho Group Ltd
infrastructure A thoughtful, pragmatic engineering approach Curiosity about security and detection (no prior experience required) Bonus if you’ve worked with: Event-driven or distributed systems Security tooling or incident response workflows Why Join? Work on hard, meaningful problems in cybersecurity Be part of a fast, technical, remote-first team Competitive salary and meaningful equity Founding Engineer - London …
London (City of London), South East England, United Kingdom
Hunter Bond
low latency trading and research platform. Core responsibilities: Engineering work across Routing, Switching, Security, Proxies and many other areas - Lots of greenfield project work Designing scalable Network solutions Network incident response (L1/L2 escalation), hands-on troubleshooting Adopting automation and identifying areas for improvement Working to tight timelines in a fast-paced and dynamic environment Core skills …
Infrastructure as Code) Work with virtualisation (VMware/vSphere, etc.) Configure/manage SAN/storage, Fibre Channel, zoning, LUN provisioning Participate in vulnerability assessments, patches, security hardening, and incident response Required Skills & Experience NPPV3 clearance, either current or active within the last 12 months (non-negotiable) Strong track record with Windows 11 deployment (imaging, upgrade, Autopilot, Intune …
London (City of London), South East England, United Kingdom
Duffel
software development and systems engineering. A high bar for code and configuration quality and readability. A good understanding of current observability and reliability practices. Experienced and comfortable in running incident response. Big picture thinking - you can make trade-offs on technical work streams against business impact. Fantastic communication skills. You're able to articulate what you're working on …
is reliable, scalable, and secure. Ensure the reliability, availability, and scalability of the systems, platforms, and technology through the application of software engineering techniques, automation, and best practices in incident response. Accountabilities Build Engineering: Development, delivery, and maintenance of high-quality infrastructure solutions to fulfil business requirements ensuring measurable reliability, performance, availability, and ease of use. Including the identification … of the appropriate technologies and solutions to meet business, optimisation, and resourcing requirements. Incident Management: Monitoring of IT infrastructure and system performance to measure, identify, address, and resolve any potential issues, vulnerabilities, or outages. Use of data to drive down mean time to resolution. Automation: Development and implementation of automated tasks and processes to improve efficiency and reduce manual …