for cost savings and drive initiatives to optimise spend without compromising performance. Reliability & Performance - champion best practices for reliability, scalability, and observability. Champion the improvement of monitoring, alerting, and incident response processes to minimise downtime and ensure service continuity. Security & Compliance - apply an enterprise security mindset, incorporating zero-trust security principles where applicable. Leverage existing security best practices More ❯
Leicester, Leicestershire, United Kingdom
Vacancy Filler (Integration)
and secure systems. In this role, you will collaborate closely with development, QA, and IT teams to streamline CI/CD pipelines, automate infrastructure, and ensure efficient monitoring and incident response. Ideal candidates have strong experience with cloud platforms (e.g., AWS, Azure, or GCP), containerization (e.g., Docker, Kubernetes), and infrastructure-as-code tools (e.g., Terraform, Ansible). More ❯
Azure cloud platform. Knowledge of security frameworks and blueprints. Understanding of capacity management processes. Ability to document technical standards and processes. Nice to Have: Familiarity with security notifications and incident response. Experience with Azure batch services. Previous involvement in project management. Other Details: This is a support-focused role involved in the development of an Azure platform. Candidates should More ❯
process. Check the compliance of the configuration and implementation against defined technical security standards and product baselines. Problem resolution and support. Work together with other technical teams on 'operational incident responses'. As the process owner, initiate any configuration review/recertification process and work with the other stakeholders (business and technical) to periodically review product configurations and implementation More ❯
Manchester, North West, United Kingdom Hybrid / WFH Options
Robert Walters
and runbooks What you bring: The ideal candidate for this IT Service Continuity Lead role will bring a strong background in IT continuity planning, disaster recovery, risk management, and incident response. Your expertise in designing and testing robust IT service continuity plans that align with business priorities will be essential. You should have a thorough understanding of BIA and More ❯
Birmingham, West Midlands, United Kingdom Hybrid / WFH Options
Robert Walters
and runbooks What you bring: The ideal candidate for this IT Service Continuity Lead role will bring a strong background in IT continuity planning, disaster recovery, risk management, and incident response. Your expertise in designing and testing robust IT service continuity plans that align with business priorities will be essential. You should have a thorough understanding of BIA and More ❯
Employment Type: Contract, Work From Home
Rate: Outside IR35 Competitive Day Rate, Home Based
track down the root cause. Communicate the impact of the problem to stakeholders in terms of business value, helping to set a priority for the resolution. Actively participate in incident responses. Engineering standards & frameworks - Maintain knowledge of Xero's current and emerging engineering standards and practices. Develop and deploy software that meets Xero's standards. Continuous improvement - Maintain knowledge More ❯
Guildford, Surrey, United Kingdom Hybrid / WFH Options
EURAXESS Czech Republic
skills and experience Full range of system administration skills including user management, building/deployment, installing scientific software packages, performance benchmarking, resource utilisation/performance and availability monitoring and incident response. Automation of repetitive tasks in the form of developing and maintaining Ansible playbooks and roles using git for version control, change management and collaboration. Proactively and reactively More ❯
Change and Transformation • Professional Collaboration • Digital and Technology Communication • Business Acumen • Problem Solving Tools • Risk and Controls Distributed Systems. • To apply software engineering techniques, automation, and best practices in incident response More ❯
West Midlands, England, United Kingdom Hybrid / WFH Options
MYO Talent
Engineer Must have strong Dynatrace experience Strong reliability, performance, and availability of systems, leveraging Dynatrace for monitoring and troubleshooting Dynatrace delivery, support and implementation Installation and Configuration, Performance Analysis, Incident Response, Automation Experience with modern observability platforms, collecting infrastructure and application metrics Cloud/Azure More ❯
Solihull, West Midlands, United Kingdom Hybrid / WFH Options
MYO Talent
Engineer Must have strong Dynatrace experience Strong reliability, performance, and availability of systems, leveraging Dynatrace for monitoring and troubleshooting Dynatrace delivery, support and implementation Installation and Configuration, Performance Analysis, Incident Response, Automation Experience with modern observability platforms, collecting infrastructure and application metrics Cloud/Azure More ❯
scripting tools (e.g., PowerShell, Bash) Monitor infrastructure health and performance, ensuring high availability Collaborate with DevOps, security, and development teams on infrastructure projects Participate in on-call rotations and incident response Requirements: 3+ years of experience in IT infrastructure Strong knowledge of Windows Server and Active Directory Solid experience with Linux systems administration Good understanding of TCP/ More ❯
Manager and wider tech/data teams Tech Stack You’ll Use: AWS: Glue, S3, Redshift, Airflow Power BI: Deployments, troubleshooting, performance tuning ETL & Scripting: SQL, Python (desirable) Monitoring & incident response in a live production data environment What You'll Need ✅ Extensive experience in analytics engineering Strong hands-on experience with AWS data tools and Power BI Solid More ❯
driving automation and supporting the development teams with robust CI/CD infrastructure in a hands-on leadership role. KEY RESPONSIBILITIES - Oversee day-to-day cloud operations, including monitoring, incident response and troubleshooting. - Leading and managing short- and long-term project planning. - Developing and implementing cloud governance, security and compliance. - Leading automation and IaC improvements. - Providing mentorship More ❯
for network infrastructure, ensuring a secure, scalable, and resilient environment. Manage and mentor a team of network engineers, supporting their development and daily operations. Oversee network performance, uptime, and incident response in line with SLAs and business requirements. Lead or contribute to major network projects, including an ongoing SD-WAN transformation. Collaborate with internal teams and external More ❯
as the senior escalation point for London-based trading network issues Collaborate with global peers and vendors to manage implementation and upgrades Support and optimise monitoring, capacity planning, and incident response Recruitment process: Intro call with our Talent Acquisition team Technical interview Final meeting with the engineering and project leads Apply now: As an employer, MARGO offers equal More ❯
encourage you to apply now! Responsibilities: Supporting site migrations (Tues-Thurs), mostly remotely Working on infrastructure and change projects in the background Handling L3 support escalations, proactive monitoring, and incident response Supporting third parties rolling out Aruba EdgeConnect and Aruba Central Attending CAB calls, planning upgrades, and coordinating SD-WAN changes Troubleshooting ClearPass, switch templates, and Aruba Central More ❯
issues across environments Collaborate with cross-functional teams to ensure high availability and performance Implement best practices for container security, scalability, and observability Participate in on-call rotations and incident response efforts Document architecture, processes, and troubleshooting guides Other Responsibilities Ensure uptime, scalability, and performance of common 3rd party and internally developed eDiscovery applications Participate in a variety More ❯
big data" technologies, the management of data, and data pipelines Familiarity with functional programming concepts Has run production workloads of 1000s QPS Has been part of an "on call" incident response team (though this role does not involve an "on call" component) Job Benefits Hybrid working and the option to work from almost anywhere for up to More ❯
Manage and monitor the performance of internal tooling and fraud rules. Work cross-functionally across Operations, Engineering, Product, and Finance to mitigate areas of risk. Represent Trust & Safety during incident response and create mitigation processes. About You: 3+ years of experience in fraud or chargeback operations, investigations, or a related Trust & Safety vertical. Experience with user restriction systems More ❯
best practices while mentoring team members and driving innovation through network automation initiatives and cross-regional collaboration. - Own and manage critical network and system configurations AU-wide. - Spearhead incident response and business continuity efforts while managing critical infrastructure configurations and ensuring optimal performance metrics. - Collaborate on global technical initiatives like increasing remote resolution. BASIC QUALIFICATIONS - 3+ years More ❯
principles such as domain modelling and modular design Collaborate with team leads to guide delivery and implementation Provide hands-on support through code reviews, prototyping, and pair programming Support incident response efforts and assist with resolving critical technical issues Rapidly gain deep knowledge of internal BI tools and systems Create and maintain technical documentation, including integration guides and More ❯
improvement of our Azure-based infrastructure using Infrastructure-as-Code (Terraform). Assist in developing and maintaining CI/CD pipelines with Azure DevOps. Contribute to platform monitoring and incident response, with opportunities to grow skills in security and automation. Collaborate with developers and security teams to promote a secure software development lifecycle. Participate in the on-call More ❯