ensure the reliability, availability, and scalability of the systems, platforms, and technology through them. Accountabilities: Availability, performance, and scalability of systems and services through proactive monitoring, maintenance, and capacity planning. Resolution, analysis and response to system outages and disruptions, and implementation of measures to prevent similar incidents from recurring. … Development of tools and scripts to automate operational processes, reducing manual workload, increasing efficiency, and improving system resilience. Monitoring and optimisation of system performance and resource usage, identification and resolution of bottlenecks, and implementation of best practices for performance tuning. Collaboration with development teams to integrate best practices for reliability, scalability …
ensuring fast and reliable software delivery. · Manage containerized applications using Docker, Kubernetes, Amazon EKS, and Helm. · Administer and enhance observability using log aggregation and monitoring tools such as CloudWatch, Splunk, and Datadog. · Maintain and manage artifact repositories (e.g., JFrog Artifactory) and ensure effective dependency management. · Automate and streamline system … technical goals. · Advocate for and implement best practices in DevOps, Site Reliability Engineering (SRE), and Software Engineering. · Ensure infrastructure security, scalability, and resilience through proactive monitoring, patching, and maintenance. · Contribute to knowledge sharing and mentoring of junior team members on DevOps practices and tools. Requirements · software engineering experience … with AWS cloud services and infrastructure management. AWS certifications are advantageous. · Strong experience with Infrastructure as Code tools (Terraform, CloudFormation). · Familiarity with observability and monitoring tools (CloudWatch, Splunk, Datadog). · Experience managing CI/CD workflows, especially with GitHub Actions. · Strong knowledge of artifact repository management systems like JFrog.
Define and implement reliability standards. Develop and improve incident management processes in alignment with engineering support, ensuring effective resolution and root cause analysis. Drive proactive monitoring, alerting, and automation to minimize downtime and improve system reliability. Lead efforts to eliminate single points of failure. Collaborate with DevOps practice … in a leadership or managerial position. Strong knowledge of cloud platforms (AWS, GCP, Azure) and modern infrastructure technologies (Kubernetes, Docker, Terraform). Expertise in monitoring, logging, and observability tools (e.g., Prometheus, Grafana, Datadog, Splunk). Proficiency in at least one programming or scripting language (e.g., Python, Go, Bash).
London, England, United Kingdom Hybrid / WFH Options
Littlepay
Product, Engineering and Project teams to diagnose and resolve customer and partner issues. Lead cross-functional initiatives to optimize support processes, tools, and proactive monitoring to improve efficiency and reliability. Collaborate and advocate with Product and Engineering to proactively improve the platform based on user feedback. Create … and FAQs. Engage with users directly to gather information and provide clear, concise explanations. Work with Engineering to design, implement, and manage a comprehensive monitoring framework to proactively identify and address potential issues. Drive initiatives and implement strategies to ensure consistently high levels of customer satisfaction, exceeding service level …
Capita teams. Provide out-of-hours support via a 24/7 rota as required. Manage and resolve incidents and service requests raised via both proactive monitoring platforms and directly from customers within contractual SLA. About You: Technically skilled to Cisco CCNP/Juniper JCIP level and holds/… around problems, creating innovative solutions to ultimately improve customer performance. Able to identify, design, implement and document network improvements. Excellent knowledge of network monitoring systems, protocols and techniques. Excellent communication skills. Experience of managing major incidents through to resolution and providing root cause analysis. Able to be self …
London, England, United Kingdom Hybrid / WFH Options
BBC Group and Public Services
YOUR KEY RESPONSIBILITIES AND IMPACT Lead and Develop High-Performing DevOps Engineers - recruit and manage a multidisciplinary team responsible for Automation, Observability/Monitoring, Security & Compliance Automation, CI/CD Pipeline, Reliability/Resilience, FinOps, Root Cause/Incident Response, Dashboarding/Reporting and 24/7 Runbook … and consistency across centralised and federated engineering teams. Champion Reliability Engineering and Incident Response - build and embed best practices in site reliability engineering, including proactive monitoring, incident detection, root cause analysis, and continuous improvement to minimise downtime and user impact. Enhance Observability and Operational Visibility - oversee the design … implementation, and evolution of monitoring, alerting, dashboarding, and reporting capabilities that provide actionable insights and enable rapid response to issues. Embed Security, Compliance, and Cost Management - ensure security and compliance are integral to all DevOps practices. Collaborate with relevant teams to automate compliance checks and manage cloud/platform …
response. KEY RESPONSIBILITIES AND IMPACT Lead and develop high-performing DevOps Engineers - recruit and manage a multidisciplinary team responsible for automation, observability/monitoring, security & compliance automation, CI/CD pipelines, reliability/resilience, FinOps, root cause/incident response, dashboarding/reporting, and 24/7 runbook … velocity, reliability, and consistency across centralised and federated engineering teams. Champion reliability engineering and incident response - embed best practices in site reliability engineering, including proactive monitoring, incident detection, root cause analysis, and continuous improvement to minimize downtime and user impact. Enhance observability and operational visibility - oversee the design … implementation, and evolution of monitoring, alerting, dashboarding, and reporting capabilities that provide actionable insights and enable rapid response to issues. Embed security, compliance, and cost management - ensure security and compliance are integral to all DevOps practices. Collaborate with relevant teams to automate compliance checks and manage cloud/platform …
to Arista, ensuring minimal downtime in production environments. Collaborate with global teams to align network delivery with broader CI/CD pipelines, and implement proactive monitoring and remediation frameworks. Provide senior-level L3 support, mentor junior engineers, and drive best practices in operational excellence. Essential Skills & Experience Strong … Arista. Exposure to network security platforms (Palo Alto, Fortinet, F5). Familiarity with CI/CD pipelines and infrastructure-as-code principles. Experience with monitoring tools (SolarWinds, PRTG, LogicMonitor) and SNOW for incident/change management. Relevant certifications (CCNP R&S, CCNP DC, PCNSA, ACI Specialist, or CCIE …
Define and implement reliability standards. Develop and improve incident management processes in alignment with engineering support, ensuring effective resolution and root cause analysis. Drive proactive monitoring, alerting, and automation to minimize downtime and improve system reliability. Lead efforts to eliminate single points of failure. Collaborate with DevOps practice … in a mentorship or managerial position. Strong knowledge of cloud platforms (AWS, GCP, Azure) and modern infrastructure technologies (Kubernetes, Docker, Terraform). Expertise in monitoring, logging, and observability tools (e.g., Prometheus, Grafana, Datadog, Splunk). Proficiency in at least one programming or scripting language (e.g., Python, Go, Bash).
London, England, United Kingdom Hybrid / WFH Options
Genius Sports Group
presenting information to clients as directed. Prepare for all internal client meetings with relevant activation updates across assigned accounts. Pacing, Performance Reporting & Delivery Management - Proactive monitoring of performance. Work with Customer Success & Ad Ops to monitor and improve performance over time. Responsible for performance and pacing updates in … and Ad Serving technologies. Excellent communication and organisational skills. Strong analytical skills and uncompromising attention to detail. Ability to clearly prioritize work in a proactive manner and remain flexible in a changing environment. A keen interest in digital media and ad tech. This role is eligible for hybrid working …
will be notified whenever a new position that matches your criteria becomes available. Overview: Responsibilities: Ensure contractual service levels are met or exceeded, with proactive monitoring and issue management. Lead regular service reviews, providing performance reporting and driving actions for service improvement. Act as the primary operational contact … contract management, including cost management, budget forecasting, and profitability analysis. Communicate effectively with customers, internal teams, and stakeholders through clear reporting, service reviews, and proactive dialogue. Growth Mindset - Demonstrate a growth mindset by embracing feedback, seeking continuous improvement, and proactively developing new skills to adapt to evolving customer and …
role ensures customer satisfaction, SLA adherence, and alignment between service operations and business needs. Responsibilities: Ensure contractual service levels are met or exceeded, with proactive monitoring and issue management. Lead regular service reviews, providing performance reporting and driving actions for service improvement. Act as the primary operational contact … contract management, including cost management, budget forecasting, and profitability analysis. Communicate effectively with customers, internal teams, and stakeholders through clear reporting, service reviews, and proactive dialogue. Growth Mindset - Demonstrate a growth mindset by embracing feedback, seeking continuous improvement, and proactively developing new skills to adapt to evolving customer and …
feel free to reach out and apply today! Key Responsibilities: Oversee and maintain a low-latency, global network infrastructure. Provide high-level support and proactive monitoring of mission-critical systems. Take ownership of network improvement projects from design through to deployment. Collaborate closely with cross-functional teams to …
Purpose: Apply software engineering techniques, automation, and best practices to ensure system reliability, availability, and scalability. Responsibilities: Ensure system performance, scalability, and availability through proactive monitoring and capacity planning. Respond to system outages, analyze issues, and implement preventive measures. Develop automation tools and scripts to improve operational efficiency …
system integrations between ServiceNow and other applications using REST APIs. Strong analysis, problem-solving, and decision-making skills. Ability to debug issues and implement proactive monitoring for platform stability. Experience mentoring junior developers and fostering a collaborative team environment. Excellent communication skills to clarify requirements with stakeholders and …
platform. Unit testing, integration testing. Contributing to maintaining and constantly improving the CI/CD pipeline. Code reviews, design reviews. Code instrumentation, setting up proactive monitoring dashboards. Liaising with users and subject matter experts in order to gather requirements, analyse solutions and triage feedback and incidents. Understanding business …
London, England, United Kingdom Hybrid / WFH Options
BASE Media Cloud Limited
2nd and 3rd line engineering support in collaboration with internal teams and vendor partners. Analyze infrastructure and system logs, troubleshoot technical issues, and implement proactive monitoring strategies. Contribute to service documentation, including architecture diagrams and technical runbooks. Collaboration & Innovation: Assist software developers with CI/CD environments and …
London, England, United Kingdom Hybrid / WFH Options
PA Consulting
more of the Consulting teams. The role requires commercial awareness, ensuring timely invoicing and a good working relationship with Credit Control. Key elements include: • Proactive monitoring of the weekly WIP and debt position of the jobs they look after, and offering support when invoices need to be sent.
Bristol, England, United Kingdom Hybrid / WFH Options
Astro Studios, Inc
more of the Consulting teams. The role requires commercial awareness, ensuring timely invoicing and a good working relationship with Credit Control. Key elements include: • Proactive monitoring of the weekly WIP and debt position of the jobs they look after, and offering support when invoices need to be sent.
Wigan, England, United Kingdom Hybrid / WFH Options
Evolve BG Ltd
Works effectively with colleagues, shares knowledge, and fosters a positive team culture. Understanding of managed services, with the ability to articulate the value of proactive monitoring, maintenance and support. Cyber security awareness and an understanding of how MSPs can protect businesses. Cloud computing and network infrastructure knowledge. Why Evolve? At …
Portsmouth, England, United Kingdom Hybrid / WFH Options
BAE Systems (New)
succeed. IT Operations - Responsible for managing the ongoing BAU support of the platform in operational environments, including change and incident management, service requests, and proactive monitoring and maintenance. Security - Ensure security is considered early in the development lifecycle and constantly maintained. DevOps engineers must understand how to secure …