technical solutions and resource plans. Serve as the technical voice in executive discussions and strategic planning. Ensure all systems and software meet internal standards and external compliance requirements. Oversee incident response, vulnerability management, and disaster recovery plans. As a visionary and strategic technology leader, the Director of Software Engineering is responsible for shaping and executing the software development
in line with our Mid-Market technology roadmap. The Role: Technology & Systems Management. Oversee the ongoing maintenance and development of Mid-Market business applications and platforms. Lead fault resolution and incident response, and ensure timely ticket management. Ensure compliance with security policies and lead on risk remediation activities. Manage cloud development, architecture, and system integrations. Coordinate licensing, certificates, and
Reigate, Surrey, England, United Kingdom Hybrid / WFH Options
Client Server Ltd
expertise with the team. Other responsibilities will encompass proactive monitoring of production environments, design and implementation of automation and processes to improve efficiency and effectiveness, and taking the lead in incident response, troubleshooting, and root cause analysis to mitigate future issues. You'll collaborate with senior business stakeholders to gather requirements, address concerns and provide updates on projects
data accessibility. Implement data pipeline monitoring, alerting, and logging to detect failures and performance bottlenecks. Build automation to ensure data quality, lineage tracking, and schema evolution management. Participate in incident response, troubleshooting, and root cause analysis for data issues. Advocate for DataOps best practices, driving automation, reproducibility, and scalability. Document infrastructure, data workflows, and operational procedures. What are
Caldecotte, Milton Keynes, Buckinghamshire, England, United Kingdom
Connells Group HQ
Basic knowledge of monitoring, logging, and observability tools. Understanding of cloud cost management and resource optimisation principles. Comfort with troubleshooting and supporting development teams. Understanding of service reliability and incident response practices. Connells Group UK is an equal opportunities employer and positively encourages applications from suitably qualified and eligible candidates regardless of sex, race, disability, age, sexual orientation
Birmingham, West Midlands, England, United Kingdom Hybrid / WFH Options
Bullion By Post
e-commerce platform. Build and maintain deployment pipelines and infrastructure as code using Ansible. Monitor performance and system health using Prometheus and Grafana. Strengthen security, backups, and compliance. Lead incident response, root cause analysis, and post-mortems. Collaborate with development teams on CI/CD workflows and scalable architecture. Document internal systems and assist with onboarding and training
microservices. Collaborate with teams across our Platform to recommend and implement the right DevOps tooling and automation strategies aligned with our enterprise architecture. Continuously improve deployment processes, monitoring, and incident response to ensure high availability and performance. Provide mentorship and technical guidance to junior engineers, fostering a culture of continuous learning and innovation. Identify and drive improvements in
readiness and scalability through monitoring and forecasting. System Observability - Proactively detect issues, build alerting systems, and centralize health dashboards. Production Risk Management - Ensure safe software releases and drive infrastructure improvements. Incident Response - Lead or support fast, effective remediation during live incidents; build automation for common operational issues. What We're Looking For: We're seeking a leader who can
engineering, and security teams to implement DevOps best practices, define and enforce service-level objectives, and build a scalable monitoring and alerting platform. Key Responsibilities: Automate deployment, monitoring, and incident response processes using GCP-native tools and technologies. Develop capabilities that allow Platform Engineering teams in Onyx to operate with a DevOps ethos. Collaborate with development teams to
proactive refactoring and system improvements. Drive and approve high-impact technical decisions with long-term maintainability and scalability in mind. Monitor system performance and ensure strong observability, alerting, and incident response practices. Contribute to architecture documentation and facilitate system knowledge sharing. Partner with engineering and product leadership to influence long-term engineering strategy and technical roadmap. About You
to improve financial performance from external partnerships. Key Accountabilities. Own provider relationships: Act as the operational lead for key suppliers. Performance & SLA management: Define and track SLAs, KPIs, and incident response processes to ensure consistently high performance. Issue resolution: Coordinate with internal teams (Product, Tech, Finance, Legal) to quickly resolve provider-related disruptions (e.g. payment failures, ID verification
Real-time and batch model serving. Online/offline feature consistency via our feature store. Monitoring and alerting. Hold high standards for operational excellence, including testing, monitoring, maintainability, and incident response. Contribute to a strong ML engineering culture focused on scalability, collaboration, and continuous learning. Required Skills and Experience: Proven track record of building and deploying ML pipelines
s data strategy, enabling the intelligent use of mobility, behavioural, and payment data to unlock new product and commercial opportunities. Ensure platform reliability, performance, and scalability through robust observability, incident response processes, performance testing, and fault-tolerant architecture. Partner with Security, Compliance, and Infrastructure teams to meet regulatory and certification standards (e.g., PCI DSS, TISAX, ISO 27001), and
Nutfield, Redhill, Surrey, England, United Kingdom
Lynx Recruitment Ltd
line technical support via phone, email, and ticketing. Lead and support project work and customer onboarding. Maintain and secure cloud infrastructure (patching, updates, backups). Mentor junior staff and manage incident responses. Help evolve documentation and internal processes. Requirements: 3+ years in an MSP or similar IT environment. Driving licence. Strong grasp of Windows Server, Active Directory, VMware/Hyper
call responsibilities to address critical incidents and maintain system availability. Essential functions: Provide support and ensure the stability of Data Platform solutions. Participate in an on-call rotation for incident response. Manage cloud resources using IaC tools like CloudFormation and Terraform on AWS and GCP. Implement data security best practices in cloud applications. Apply cloud networking knowledge (VPCs, Route
process. Check that the configuration and implementation comply with defined technical security standards and product baselines. Problem resolution and support. Work together with other technical teams on 'operational incident responses'. As the process owner, initiate any configuration review/recertification process and work with the other stakeholders (business and technical) to periodically review product configurations and implementation
Security: Ensure compliance with regulations (GDPR, ISO 27001, etc.). Implement security protocols around data access, retention, and classification. Work with InfoSec and Legal to manage data risk and incident response. What Success Looks Like: A scalable, secure, and modern data platform is live. High-quality data reporting empowers business-wide decision-making. A respected leader is in place
engineers, shaping runtime environments, data pipelines, and workflow orchestration for large-scale model serving. Continuously improve reliability and cost efficiency through chaos testing, capacity planning, performance tuning, and proactive incident response. Requirements. You May Be a Good Fit if You Have: Core technical expertise in the following domains: Cloud Infrastructure Mastery (AWS, Azure): deep understanding of IAM, networking, storage
cancer as you are. What will be some of the main responsibilities? Strategic Leadership: Develop and execute CRUK's information security strategy, aligning with organisational goals and risk appetite. Incident Management: Oversee security incidents and investigations, ensuring effective response and remediation. Compliance and Governance: Ensure compliance with UK GDPR, the Data Protection Act 2018, PCI DSS v4.0, and other relevant … Data Privacy, Risk, and Audit teams. Security Operations: Implement and enhance security controls across various platforms (Microsoft 365/Azure, AWS, Salesforce, etc.). Manage threat intelligence, monitoring, and incident response. Policy Development: Develop and maintain information security policies, procedures, and guidance. Stakeholder Engagement: Communicate effectively with C-suite, trustees, regulators, and technical teams. Represent CRUK in external security
infrastructure deployments. Optimize and manage security configurations, including IAM policies, network access controls, and encryption protocols. Proactively monitor and respond to security incidents using AWS-native detection and response services. Conduct periodic security assessments, audits, and reviews to align with best practices and regulatory standards. Collaborate with internal stakeholders, including engineering and operations teams, to integrate security into … cloud-native development workflows. Produce detailed documentation and reports related to threat detection, incident response, and mitigation efforts. Provide mentorship and technical leadership, and establish best practices for cloud security implementation. Work closely with external partners or clients to understand their security needs and design tailored cloud protection strategies.
and video services. Oversee live event execution, SLA compliance, service bookings, and customer support. Act as the senior point of escalation for complex incidents (Tier 3 support). Drive incident response, root cause analysis, and proactive monitoring/reporting. Develop and implement TOC strategy, staffing models, and documentation standards. Participate in systems architecture, new tech evaluation, and vendor … a TOC, NOC, or MCR environment. Strong understanding of live broadcast workflows, encoding, transmission, and routing. Deep knowledge of TCP/IP networking (switching, routing, multicast). Excellent leadership, incident management, and performance development skills. Strong documentation and process optimisation experience. High-pressure decision-making and problem-solving capabilities. Proficiency with Excel/Google Sheets; adaptable across Windows, macOS