Observability Jobs in the UK

Site Reliability Engineer

London, England, United Kingdom

numi

Exciting: You’ll work in a Node.js-first environment where product and platform teams collaborate closely. You’ll own core infrastructure and DevOps processes, from CI/CD to observability. You’ll be part of a team that encourages experimentation, autonomy, and continuous improvement. You'll help shape the SRE function at a high-impact stage of growth. … Doing: Build and improve CI/CD pipelines (GitHub Actions) that keep development smooth and fast. Maintain and scale infrastructure on AWS, including ECS, S3, RDS, and CloudFront. Improve observability using tools like Datadog and CloudWatch, and act on what you find. Automate key workflows around deployment, testing, scaling, and failure recovery. Collaborate with engineers to build scalable, secure, and … For: Strong experience working in production Node.js environments. Hands-on with AWS services and container orchestration (ECS, Docker). Skilled at building and maintaining CI/CD pipelines. Experience with observability, monitoring, and incident management. Working knowledge of infrastructure-as-code (Terraform, CloudFormation). A collaborative, proactive mindset with strong communication skills. What You’ll Get: A collaborative, mission-driven culture that …

Posted: 2 days ago

Head of Infrastructure

London Area, United Kingdom
Hybrid / WFH Options

Intec Select

with organisational goals. Ensure all services are secure by design, working closely with the information security team to proactively manage risks. Drive service improvement and operational resilience through automation, observability, and DevOps best practices. Experience Required: Proven experience in leading platform/infrastructure and DevOps teams in a hands-on capacity. Strong technical foundation in both traditional infrastructure and modern … CI/CD, GitOps, IaC (e.g., Terraform, ARM), and automation scripting (e.g., PowerShell, Bash, Python). Cloud experience (ideally Azure) and hybrid infrastructure environments. Familiarity with monitoring, alerting, and observability platforms. Package: £100,000 - £120,000 basic salary, up to 25% bonus, 15% pension, remote working. Head of Platform & Infrastructure Engineering – Financial Services – London (Hybrid/Remote)

Posted: 3 days ago

Head of Infrastructure

City of London, London, United Kingdom
Hybrid / WFH Options

Intec Select

Posted: 3 days ago

Head of Platform & Infrastructure

London, England, United Kingdom
Hybrid / WFH Options

ZipRecruiter

Posted: 2 days ago

Principal Backend Engineer, Grafana (Remote, UK)

London, England, United Kingdom
Hybrid / WFH Options

Grafana Labs

required: Yes. Job Reference: 482f0f463755. Posted: 22.06.2025. Expiry Date: 06.08.2025. Job Description: What is Grafana Cloud? Grafana Cloud is our composable observability platform that integrates visualizations of metrics, logs, and traces with Grafana. It allows our customers to leverage the best open source observability software – including Prometheus, Mimir, Loki, and Tempo – without … the overhead of installing, maintaining and scaling their own observability stack. The Grafana team within engineering is responsible for Grafana, the highly successful open source project with over a million instances running in the wild, as well as our enterprise-ready Grafana Enterprise offering. Grafana is also the main frontend for Grafana Cloud, where users can visualize their telemetry data …

Posted: 4 days ago

Infrastructure Engineer

City of London, London, United Kingdom

Coram AI

provisioning and management across hundreds of thousands of connected IoT devices deployed in the field. Building CI/CD and automation pipelines for various parts of the stack. Building observability and telemetry. Helping maintain compliance with various security standards (SOC 2, HIPAA, ...). Maximising developer productivity by streamlining development workflows. This is an onsite position based in London, UK. Requirements and … Kubernetes (particularly EKS). 3+ years of experience with either Python or Go. Building CI/CD pipelines and automation of various parts of the stack. Self-hosting and maintaining observability tools such as Grafana/Prometheus. It would be great if you also have experience with one or more Edge/IoT infrastructure technologies (Yocto, IoT device provisioning, over-the-air …

Posted: Today

Infrastructure Engineer

London Area, United Kingdom

Coram AI

Posted: Today

Senior AWS Platform Engineer

London, England, United Kingdom
Hybrid / WFH Options

Identity E2E Ltd

at scale, leveraging AWS Organizations, Landing Zones, and multi-account best practices. Develop and maintain Infrastructure as Code solutions using Terraform, CloudFormation, and AWS CDK. Champion security, compliance, and observability by integrating services like AWS Security Hub, GuardDuty, and Inspector. Design CI/CD pipelines to enable seamless deployments and self-service models for customers. Innovate with AWS Networking, KMS … Proficiency in Python, Go, or similar languages for automation and scripting. Expert-level knowledge of AWS Networking, TLS, and security best practices. Experience with container orchestration (Kubernetes, EKS) and observability tools (Grafana, ELK). A passion for innovation, problem-solving, and delivering high-impact solutions. Why Work For Us? Competitive base salary £90,000-£100,000 25 days holiday + …

Posted: 2 days ago

Site Reliability Engineer II

Glasgow, Scotland, United Kingdom

ZipRecruiter

and resolving incidents, working with others to address root causes. Recognize toil within your role and proactively work towards eliminating it through systems engineering or application code updates. Understand observability patterns and strive to implement and improve service level indicators, objectives, monitoring, and alerting solutions for optimal transparency and analysis. Required qualifications, capabilities, and skills Formal training or certification in … experience. Proficiency in at least one programming language such as Python or Java. Experience maintaining a cloud-based infrastructure. Familiarity with site reliability principles, concepts, and practices. Knowledge of observability tools such as Grafana, Dynatrace, Prometheus, Datadog, Splunk, New Relic, CloudWatch, or AppDynamics. Familiarity with containers or common server operating systems like Linux and Windows. Emerging knowledge of software, applications …

Posted: 2 days ago

Director of Platform Engineering

London, United Kingdom

dunnhumby

fosters innovation, and delivers exceptional user interactions, delivering robust internal developer platform (IDP) capabilities, strengthening CI/CD pipelines, enabling on-demand environments, and scaling platform foundations such as observability, security, and FinOps, while adhering to best practices in DevOps and modern software delivery. What we expect from you: Drive the development of a comprehensive IDP (e.g., based on Backstage … on-demand environments for development, QA, and staging through Infrastructure-as-Code and container orchestration. Support multi-tenancy and environment rationalization to reduce duplication and inefficiency. Define and implement observability standards, including logging, metrics, tracing, and alerting. Use tools like New Relic, Prometheus, and Grafana, alongside building custom instrumentation for key platform services. Drive incident readiness and operational resilience … tools. Proven success in building and operating developer platforms and enablement frameworks. Experience with cloud-native technologies, Kubernetes, and Infrastructure as Code (Terraform, Helm, etc.). Strong understanding of observability tooling (especially New Relic, Prometheus, Grafana) and incident response best practices. Familiarity with FinOps, platform cost tracking, and infrastructure efficiency techniques. Excellent communication, leadership, and stakeholder management skills. Attract, hire …

Employment Type: Permanent

Salary: GBP Annual

Posted: 27 days ago

Global IT Network Senior Director

London, England, United Kingdom

The Boston Consulting Group GmbH

networking (SDN), and AI-driven automation. Ensure end-to-end network automation to improve operational efficiency, agility, and reliability. Drive zero-trust network security principles, ensuring compliance and proactive threat mitigation. Establish a global observability and telemetry framework for real-time network insights. Align network strategies with business growth, cloud-first initiatives, and digital transformation. Oversee global network architecture, spanning data centers, cloud environments, and enterprise connectivity. Lead network modernization efforts … Implement real-time incident detection and response using AI-driven network analytics. Ensure high availability, network resilience, and 24x7 operational support. Develop a follow-the-sun support model, ensuring global network performance optimization. Implement network observability and predictive analytics to proactively prevent outages. Security, Compliance & Risk Management: Drive zero-trust security frameworks, ensuring secure and resilient network access. Ensure adherence to ISO 27001, NIST, SOC 2, GDPR, and industry best practices. … a senior leadership role, managing large-scale global network environments. Deep expertise in cloud networking (AWS, Azure, GCP), SD-WAN, and network automation. Proven track record in end-to-end network automation, observability, and self-healing networks. Experience in AI-driven networking, predictive analytics, and network telemetry. Strong understanding of zero-trust networking, compliance frameworks, and security policies. Excellent leadership, communication, and stakeholder management skills.

Posted: Yesterday

AWS Senior Platform Engineer

Bristol, Gloucestershire, United Kingdom

CACI Limited

at scale, leveraging AWS Organizations, Landing Zones, and multi-account best practices. Develop and maintain Infrastructure as Code solutions using Terraform, CloudFormation, and AWS CDK. Champion security, compliance, and observability by integrating services like AWS Security Hub, GuardDuty, and Inspector. Design CI/CD pipelines to enable seamless deployments and self-service models for customers. Innovate with AWS Networking, KMS … Proficiency in Python, Go, or similar languages for automation and scripting. Expert-level knowledge of AWS Networking, TLS, and security best practices. Experience with container orchestration (Kubernetes, EKS) and observability tools (Grafana, ELK). A passion for innovation, problem-solving, and delivering high-impact solutions. Why Work For Us? 25 days holiday + bank holidays Up to 5% employer pension …

Employment Type: Permanent

Salary: GBP Annual

Posted: 13 days ago

Platform Engineer

Caldecotte, Milton Keynes, Buckinghamshire, England, United Kingdom

Connells Group HQ

mindset, working directly with development teams to understand their needs and deliver solutions. You will work across multiple technical domains including orchestration, automation, CI/CD pipelines, cloud services, observability, and security, developing deeper expertise in areas that align with platform priorities and your interests. Experience with Microsoft Azure is essential. You will play your part in operating the platform aligned … with Docker and basic Kubernetes concepts. Understanding of cloud networking concepts (VNets, subnets, NSGs). Awareness of cloud security best practices and compliance requirements. Basic knowledge of monitoring, logging, and observability tools. Understanding of cloud cost management and resource optimisation principles. Comfort with troubleshooting and supporting development teams. Understanding of service reliability and incident response practices. Connells Group UK is an …

Employment Type: Full-Time

Salary: Competitive salary

Posted: 26 days ago

Director of Platform Engineering

Manchester, Lancashire, United Kingdom

dunnhumby

Employment Type: Permanent

Salary: GBP Annual

Posted: 6 days ago

Director of Platform Engineering

Manchester, England, United Kingdom

dunnhumby

Posted: 2 days ago

Data Engineer London, Singapore

London, United Kingdom

GSR Markets Limited

Monitor, troubleshoot, and optimize data pipelines to ensure performance and cost efficiency. Implement data governance, access controls, and security measures in line with best practices and regulatory standards. Develop observability and anomaly detection tools to support Tier 1 systems. Work with engineers and business teams to gather requirements and translate them into technical solutions. Maintain documentation, follow coding standards, and … to work across technical and non-technical teams. Additional Strengths: Experience with orchestration tools like Apache Airflow. Knowledge of real-time data processing and event-driven architectures. Familiarity with observability tools and anomaly detection for production systems. Exposure to data visualization platforms such as Tableau or Looker. Relevant cloud or data engineering certifications. What we offer: A collaborative and transparent … ELT workflows with Apache Airflow (or similar) and integrating them into containerised CI/CD pipelines (Docker, GitHub Actions, Jenkins, etc.)? Which option best describes your experience building observability and automated anomaly detection tooling for data pipelines? What best describes your current location and working rights status?

Employment Type: Permanent

Salary: GBP Annual

Posted: 12 days ago

Linux & Storage Engineer – Trading - FinTech - £120,000-£220,000 + Bonus

City of London, London, United Kingdom
Hybrid / WFH Options

Hunter Bond

with either Chef or Ansible for configuration management. GPFS, NFS, Weka, etc. Some experience working in either Python, Go or Rust. Familiarity with CI/CD and Agile practices. Observability – ELK, Prometheus, Grafana. Degree in relevant subject highly desirable. Please apply ASAP for more information.

Posted: Today

Linux & Storage Engineer – Trading - FinTech - £120,000-£220,000 + Bonus

London Area, United Kingdom
Hybrid / WFH Options

Hunter Bond

Posted: Today

Systems Developer – E-Commerce Integrations (Cloud-Native, AI-Driven)

Liverpool, England, United Kingdom

Protein Works

and refine queue-based processing to support asynchronous workflows and event-driven architecture. Work collaboratively with cross-functional teams, including DevOps, Infrastructure, and Product, to deliver robust systems. Leverage observability tools to monitor, alert, and troubleshoot application and integration health. Stay current on AI-driven software development practices (e.g., GPT-assisted development, Agentic AI workflows) and suggest practical implementations. Participate … Prior experience building middleware for data sync, order processing, and internal APIs in a multi-system e-commerce environment. Understanding of architecture patterns: Microservices, SOA, Hexagonal, Modular Monolith. Monitoring & Observability: Solid grasp of AI trends in software development, particularly in using GPT tools and agentic systems. Education: Mathematics or Computer Science degree (or equivalent experience). Working knowledge of VB.NET. Exposure …

Posted: Today

DevOps Manager

Bath, England, United Kingdom
Hybrid / WFH Options

JR United Kingdom

availability, and throughput, aligning internal goals with platform performance. Promote and embed Site Reliability Engineering (SRE) practices to improve stability, monitoring, and response. Manage a growing toolset for orchestration, observability, and automation. Partner closely with Engineering, Delivery, and Architecture teams to ensure seamless integration and interoperability across the stack. Drive platform innovation by tracking emerging technologies, tools, and vendor solutions. … Boards, and Azure Repos. Knowledge of Azure infrastructure as code (IaC) tools like Terraform, ARM templates, or Azure CLI. Deep experience with platform tooling, including IaC, APIs, automation, and observability frameworks. Proven ability to design, build, and scale robust platform services across complex, regulated environments. Experienced in cloud/hybrid environments with a clear understanding of platform interconnectivity, scalability, and …

Posted: 2 days ago

DevOps Manager

Newport, Wales, United Kingdom
Hybrid / WFH Options

JR United Kingdom

Posted: 2 days ago

DevOps Manager

Cheltenham, England, United Kingdom
Hybrid / WFH Options

JR United Kingdom

Posted: 2 days ago

Site Reliability Engineer

Peterborough, England, United Kingdom
Hybrid / WFH Options

Compare the Market

the uptime and reliability of critical systems and applications, minimizing downtime and service disruptions.
• Automation - Develop and maintain automated processes for deployment, configuration, and scaling of infrastructure and applications.
• Observability - Ensure that teams and their services are making the most of our observability stack and that relevant information is accessible to them to effectively manage their estate.
• Incident Response - Respond …

Posted: 2 days ago

Senior Site Reliability Engineer

London, England, United Kingdom

platforms and services. We are looking for someone who thrives at the intersection of infrastructure and software development. This team will work very closely with the Compute, Traffic, and Observability infrastructure teams. They will own a suite of tools that allow engineers to understand their creations, based primarily on open-source solutions at scale. We’re active users of and … foundational Infrastructure and Platform services, which are used by Reddit engineering teams to build, deploy, and operate Reddit. Deliver software to improve the availability, scalability, latency, and efficiency of observability components. Identify and engineer away risk across Reddit’s systems. Automate: Take repetitive, manual, or risky tasks and automate them out of existence. Build tools and integrate systems to support …

Posted: 2 days ago

Lead Infrastructure Architect

London, England, United Kingdom

ZipRecruiter

domains. With 20+ years of proven expertise, the ideal candidate will shape the strategy, design, and transformation of complex infrastructure landscapes, including Wintel, Linux, Network, Voice, Collaboration, Mobility, Observability, End-User Computing, End-User Services, and Service Desk. This role acts as a key advisor to senior leadership and ensures that infrastructure investments align with organizational goals, operational resilience … domains: Wintel & Linux platforms; Network (LAN/WAN/SD-WAN, Wireless, Firewalls); Unified Communication/Voice/Collaboration (Cisco, MS Teams); Mobility & Endpoint Management (Intune, MDM/UEM); Observability and Monitoring (ELK, Prometheus, AppDynamics, etc.); End-User Computing (VDI, physical endpoints, OS lifecycle); End-User Services and Service Desk (ITSM, automation, FCR, CSAT). Serve as a trusted advisor to …

Posted: 2 days ago

Salary Guide

Observability

10th Percentile: £57,500
25th Percentile: £65,000
Median: £80,000
75th Percentile: £97,500
90th Percentile: £117,875

926 to 950 of 2,538 Observability Jobs in the UK