Observability Jobs in the UK

901 to 925 of 2,526 Observability Jobs in the UK

Senior Principal Network Architect

London, England, United Kingdom
Hybrid / WFH Options
Equinix
… networking technologies and ecosystems, such as Routing Daemons (FRR, Bird, GoBGP), Linux Networking (eBPF, VPP, XDP), and SONiC, or other Linux-based open Network Operating Systems. Involvement with modern observability platforms (Prometheus/PromQL, Grafana, gNMI, etc.). Experience with network flow export (NetFlow, IPFIX, sFlow) and analysis. Solid understanding of the full networking stack (routing, switching and optical networking), including key …

Staff and Team Lead, Onyx Application Engineering

London, England, United Kingdom
Hybrid / WFH Options
GlaxoSmithKline
The Onyx Research Data Tech organization represents a major investment by GSK R&D and Digital & Tech, designed to deliver a step-change in our ability to leverage data, knowledge, and prediction to …

Head of Site Reliability Engineering & Platform

London, England, United Kingdom
Hybrid / WFH Options
DeepL

Senior Solution Architect

London, England, United Kingdom
EDB

DevOps Specialist

Royal Leamington Spa, England, United Kingdom
Hybrid / WFH Options
Tata Consultancy Services
If you need support in completing the application or if you require a different format of this document, please get in touch at UKI.recruitment@tcs.com or call the TCS London Office on 02031552100/+44 204 520 2575 with the …

Observability Specialist – Grafana / Golang

London, England, United Kingdom
Hybrid / WFH Options
JR United Kingdom
Location: London, United Kingdom. Job Category: Other. EU work permit required: Yes. Posted: 26.06.2025. Expiry Date: 10.08.2025.
Job Description: A financial markets firm is implementing observability across their infrastructure estate and has an opportunity for a Grafana specialist to play a key role in building high-quality dashboards and visualisations. … infrastructure as code pipelines into key metrics and actionable insights. This will involve working with engineers on requirements, producing iterative designs, and developing supporting tools. Requirements: 5+ years of observability experience in automated infrastructure environments. Strong query experience in PromQL, VictoriaMetrics, or VictoriaLogs. Tool development experience in Golang or Python. Understanding of infrastructure as code outputs and tools (Terraform). Linux …

Senior Site Reliability Engineer - Monitoring and Observability

London, England, United Kingdom
Macquarie Group
Our team is dedicated to running and uplifting the current environment to the NextGen IT Monitoring and Observability stage. We run and maintain enterprise-wide log analytics, monitoring, and observability services, ensuring optimal performance and customer satisfaction. At Macquarie, our advantage is bringing together diverse people and empowering them to shape all kinds of possibilities. We are a global financial services group operating … you'll be part of a friendly and supportive team where everyone - no matter what role - contributes ideas and drives outcomes. What role will you play? As a Monitoring and Observability Engineer, you will run and maintain enterprise-wide log analytics, monitoring, and observability services. You will be responsible for improving the value provided by the log analytics platform to drive …

Senior Site Reliability Engineer - Monitoring and Observability | London, UK

London, England, United Kingdom
Macquarie Group
Our team is dedicated to running and uplifting the current environment to the NextGen IT Monitoring and Observability stage. We run and maintain enterprise-wide log analytics, monitoring, and observability services, ensuring optimal performance and customer satisfaction. At Macquarie, our advantage is bringing together diverse people and empowering them to shape all kinds … you'll be part of a friendly and supportive team where everyone - no matter what role - contributes ideas and drives outcomes. What role will you play? As a Monitoring and Observability Engineer, you will run and maintain enterprise-wide log analytics, monitoring, and observability services. You will be responsible for improving the value provided by the log analytics platform to drive …

Site Reliability Engineer Lead

London Area, United Kingdom
Hybrid / WFH Options
Cpl
Site Reliability Engineer (SRE) Lead – Observability. Rate: £450-£475 per day (Inside IR35). Location: London (Hybrid, 2 days on site per week). Contract. Role Overview: Join a high-impact team where you'll lead and shape the SRE and Observability function for a major transformation programme. This role goes beyond traditional SRE – you’ll champion best practices across product teams, drive observability strategy, and work hands-on with cutting-edge tools like Datadog and AWS. Key Responsibilities: Lead the SRE function and promote observability-first thinking across development and operations teams. Define and implement the observability roadmap across product domains in collaboration with the client. Be hands-on with Datadog for infrastructure and application-level monitoring. Guide and review daily operations and improvements across observability platforms. Partner with engineering squads to deliver on observability requirements in an agile, demand-led way. Core Skills & Experience: Proven experience as a hands-on SRE Engineer. Deep understanding of observability and monitoring practices. Practical experience with Datadog (or similar observability platforms). Strong DevOps toolchain knowledge: GitHub, GitHub Actions, Jenkins, CodeQL, Nexus, CloudFormation, Terraform.

Site Reliability Engineer Lead

City of London, London, United Kingdom
Hybrid / WFH Options
Cpl
Site Reliability Engineer (SRE) Lead – Observability. Rate: £450-£475 per day (Inside IR35). Location: London (Hybrid, 2 days on site per week). Contract. Role Overview: Join a high-impact team where you'll lead and shape the SRE and Observability function for a major transformation programme. This role goes beyond traditional SRE – you’ll champion best practices across product teams, drive observability strategy, and work hands-on with cutting-edge tools like Datadog and AWS. Key Responsibilities: Lead the SRE function and promote observability-first thinking across development and operations teams. Define and implement the observability roadmap across product domains in collaboration with the client. Be hands-on with Datadog for infrastructure and application-level monitoring. Guide and review daily operations and improvements across observability platforms. Partner with engineering squads to deliver on observability requirements in an agile, demand-led way. Core Skills & Experience: Proven experience as a hands-on SRE Engineer. Deep understanding of observability and monitoring practices. Practical experience with Datadog (or similar observability platforms). Strong DevOps toolchain knowledge: GitHub, GitHub Actions, Jenkins, CodeQL, Nexus, CloudFormation, Terraform.

Site Reliability Engineer Lead

City of London, England, United Kingdom
Hybrid / WFH Options
JR United Kingdom
Job Category: Other. EU work permit required: Yes. Posted: 16.06.2025. Expiry Date: 31.07.2025.
Job Description: Site Reliability Engineer (SRE) Lead – Observability. Location: London (Hybrid, 2 days on site per week). Contract. Role Overview: Join a high-impact team where you'll lead and shape the SRE and Observability function for a major transformation programme. This role goes beyond traditional SRE – you’ll champion best practices across product teams, drive observability strategy, and work hands-on with cutting-edge tools like Datadog and AWS. Key Responsibilities: Lead the SRE function and promote observability-first thinking across development and operations teams. Define and implement the observability roadmap across product domains in collaboration with the client. Be hands-on with Datadog for infrastructure and application-level monitoring. Guide and review daily operations and improvements across observability platforms. Partner with engineering squads to deliver on observability requirements in an agile, demand-led way. Core Skills & Experience: Proven experience as a hands-on SRE Engineer. Deep understanding of observability and monitoring practices. Practical experience with Datadog …

Kubernetes Lead Engineer – HPC Infrastructure

City of London, London, United Kingdom
Alexander Ash Consulting
… robust container orchestration platform to support large-scale, compute-intensive workloads in a high-performance trading environment. You'll lead a team of engineers, define best practices, and ensure observability, scalability, and performance across the platform. Key Responsibilities: Lead the design and operation of Kubernetes platforms (on-prem & cloud-native). Manage HPC infrastructure to support trading workloads and scientific compute … Guide a team of engineers across distributed environments. Define and enforce best practices for infrastructure scalability, performance, and monitoring. Implement observability tooling and ensure high platform availability. Collaborate with other engineering teams to drive automation and operational efficiency. Requirements: 8+ years in infrastructure/platform engineering roles. Deep expertise in Kubernetes (both on-premises and cloud-native). Strong Linux (preferably … RHEL) systems administration skills. Proven experience with HPC workloads or scientific computing clusters. Hands-on experience with observability tools: Prometheus, Grafana, Loki. Infrastructure as Code (IaC) using Terraform, Ansible. CI/CD experience with GitOps tools (e.g., ArgoCD, Flux). Prior experience leading engineering teams in distributed environments.

Kubernetes Lead Engineer – HPC Infrastructure

London Area, United Kingdom
Alexander Ash Consulting
… robust container orchestration platform to support large-scale, compute-intensive workloads in a high-performance trading environment. You'll lead a team of engineers, define best practices, and ensure observability, scalability, and performance across the platform. Key Responsibilities: Lead the design and operation of Kubernetes platforms (on-prem & cloud-native). Manage HPC infrastructure to support trading workloads and scientific compute … Guide a team of engineers across distributed environments. Define and enforce best practices for infrastructure scalability, performance, and monitoring. Implement observability tooling and ensure high platform availability. Collaborate with other engineering teams to drive automation and operational efficiency. Requirements: 8+ years in infrastructure/platform engineering roles. Deep expertise in Kubernetes (both on-premises and cloud-native). Strong Linux (preferably … RHEL) systems administration skills. Proven experience with HPC workloads or scientific computing clusters. Hands-on experience with observability tools: Prometheus, Grafana, Loki. Infrastructure as Code (IaC) using Terraform, Ansible. CI/CD experience with GitOps tools (e.g., ArgoCD, Flux). Prior experience leading engineering teams in distributed environments.

Solution Engineer

London, England, United Kingdom
Coralogix, inc
Coralogix is a modern, full-stack observability platform transforming how businesses process and understand their data. Our unique architecture powers in-stream analytics without reliance on expensive indexing or hot storage. We specialize in comprehensive monitoring of logs, metrics, traces and security events with features such as APM, RUM, SIEM, Kubernetes monitoring and more, all enhancing operational efficiency and reducing … observability spend by up to 70%. Solution Engineers at Coralogix are key in meeting our customers’ expectations and helping them utilize their observability and security data. We are looking for hard-working, sharp, and humble professionals with proven technical customer-facing experience. Our Solution Engineers are trusted advisors who consult with our customers on their monitoring, security & observability journey. Solution … Value. Understand and communicate customer needs to the product teams for future product enhancements. Build solutions to fill gaps and enhance the core product. Know the Log Management/Observability markets well and be able to help customers choose the right solutions for them. Requirements: 5+ years in a customer-facing pre-sales, technical architecture, or consulting. Strong communication and …

Kubernetes Lead Engineer – HPC Infrastructure

London, England, United Kingdom
JR United Kingdom
… robust container orchestration platform to support large-scale, compute-intensive workloads in a high-performance trading environment. You'll lead a team of engineers, define best practices, and ensure observability, scalability, and performance across the platform. Key Responsibilities: Lead the design and operation of Kubernetes platforms (on-prem & cloud-native). Manage HPC infrastructure to support trading workloads and scientific compute … Guide a team of engineers across distributed environments. Define and enforce best practices for infrastructure scalability, performance, and monitoring. Implement observability tooling and ensure high platform availability. Collaborate with other engineering teams to drive automation and operational efficiency. Requirements: 8+ years in infrastructure/platform engineering roles. Deep expertise in Kubernetes (both on-premises and cloud-native). Strong Linux (preferably … RHEL) systems administration skills. Proven experience with HPC workloads or scientific computing clusters. Hands-on experience with observability tools: Prometheus, Grafana, Loki. Infrastructure as Code (IaC) using Terraform, Ansible. CI/CD experience with GitOps tools (e.g., ArgoCD, Flux). Prior experience leading engineering teams in distributed environments.

Pre-Sales DevOps Engineer

London, United Kingdom
Clearwater People Solutions
… teams to senior executives. Design and manage Proofs of Concept (POCs) and Proofs of Value (POVs). Act as a technical advisor, helping customers select and implement the right observability and monitoring solutions. Communicate customer needs and product feedback to internal product and engineering teams. Create custom solutions to bridge gaps and maximize value for each client. Key skills for … Strong coding ability in at least one high-level programming language (e.g. Java, Go, Python). Deep technical knowledge of Kubernetes, AWS, Azure, GCP or Docker. Solid understanding of observability tools, log management, APM, and SIEM. Experience in DevOps or engineering roles is a strong advantage. Background in technical sales or customer engagement within observability or security platforms is a …
Employment Type: Permanent
Salary: £70,000/annum; £100,000 - £140,000 OTE

Pre-Sales DevOps Engineer (German speaking)

London, United Kingdom
Clearwater People Solutions
… teams to senior executives. Design and manage Proofs of Concept (POCs) and Proofs of Value (POVs). Act as a technical advisor, helping customers select and implement the right observability and monitoring solutions. Communicate customer needs and product feedback to internal product and engineering teams. Create custom solutions to bridge gaps and maximize value for each client. Key skills for … Strong coding ability in at least one high-level programming language (e.g. Java, Go, Python). Deep technical knowledge of Kubernetes, AWS, Azure, GCP or Docker. Solid understanding of observability tools, log management, APM, and SIEM. Experience in DevOps or engineering roles is a strong advantage. Background in technical sales or customer engagement within observability or security platforms is a …
Employment Type: Permanent
Salary: £70,000/annum; £100,000 - £140,000 OTE

Senior Linux System Administrator

London Area, United Kingdom
NineTech
Administer GitLab infrastructure for CI/CD processes. Operate and maintain Kafka clusters for real-time data pipelines. Diagnose and resolve issues across systems, networks, containers, and applications. Use observability tools (Grafana, Prometheus, Kibana, Elasticsearch) to monitor system health. Automate system management tasks using Ansible. Participate in an on-call rotation to support global operations. Required Skills & Experience: Strong hands … system optimization. Production-level experience managing Kubernetes clusters. Proficiency with GitLab for version control and CI/CD workflows. Solid understanding of Kafka in high-throughput environments. Experience with observability tools such as Grafana, Prometheus, Kibana, and Elasticsearch. Expertise in Ansible for automation and configuration management. Strong problem-solving skills across infrastructure layers (compute, network, OS, containers).

Senior Linux System Administrator

City of London, London, United Kingdom
NineTech
Administer GitLab infrastructure for CI/CD processes. Operate and maintain Kafka clusters for real-time data pipelines. Diagnose and resolve issues across systems, networks, containers, and applications. Use observability tools (Grafana, Prometheus, Kibana, Elasticsearch) to monitor system health. Automate system management tasks using Ansible. Participate in an on-call rotation to support global operations. Required Skills & Experience: Strong hands … system optimization. Production-level experience managing Kubernetes clusters. Proficiency with GitLab for version control and CI/CD workflows. Solid understanding of Kafka in high-throughput environments. Experience with observability tools such as Grafana, Prometheus, Kibana, and Elasticsearch. Expertise in Ansible for automation and configuration management. Strong problem-solving skills across infrastructure layers (compute, network, OS, containers).

Kubernetes Lead Engineer – HPC Infrastructure

London, England, United Kingdom
Alexander Ash Consulting
… robust container orchestration platform to support large-scale, compute-intensive workloads in a high-performance trading environment. You'll lead a team of engineers, define best practices, and ensure observability, scalability, and performance across the platform. Key Responsibilities: Lead the design and operation of Kubernetes platforms (on-prem & cloud-native). Manage HPC infrastructure to support trading workloads and scientific compute … Guide a team of engineers across distributed environments. Define and enforce best practices for infrastructure scalability, performance, and monitoring. Implement observability tooling and ensure high platform availability. Collaborate with other engineering teams to drive automation and operational efficiency. Requirements: 8+ years in infrastructure/platform engineering roles. Deep expertise in Kubernetes (both on-premises and cloud-native). Strong Linux (preferably … RHEL) systems administration skills. Proven experience with HPC workloads or scientific computing clusters. Hands-on experience with observability tools: Prometheus, Grafana, Loki. Infrastructure as Code (IaC) using Terraform, Ansible. CI/CD experience with GitOps tools (e.g., ArgoCD, Flux). Prior experience leading engineering teams in distributed environments. Seniority level: Director. Employment type: Contract.

Global Head of Technical Account Management (TAM)

London, United Kingdom
Coralogix, inc
… success across all regions. Partner closely with R&D, Customer Success, Product, Sales, and Support to drive holistic customer outcomes. Hands-On Technical Expertise: Maintain hands-on fluency in observability tooling, logging infrastructure, and cloud environments. Act as a senior technical escalation point for complex deployments or architectural challenges. Provide in-depth technical guidance on customer environments, use cases, and … performance analytics. Collaborate on the development of tools and dashboards to ensure visibility and impact tracking. Requirements - Technical Experience: 10+ years of technical experience in Cloud DevOps, SaaS, or observability, with 5+ years in leadership roles. Strong hands-on experience with AWS, GCP, Azure, K8S, Terraform and observability tools: Prometheus, Grafana, OpenTelemetry, ELK, Splunk, Datadog, and similar. Proficiency with metrics … team members are encouraged to challenge the status quo and contribute to our shared mission. If you thrive in dynamic environments and are eager to shape the future of observability solutions, we'd love to hear from you. Coralogix is an equal opportunity employer and encourages applicants from all backgrounds to apply.
Employment Type: Permanent
Salary: GBP Annual

Software Engineer

England, United Kingdom
numi
… Exciting: You’ll work in a Node.js-first environment where product and platform teams collaborate closely. You’ll own core infrastructure and DevOps processes, from CI/CD to observability. You’ll be part of a team that encourages experimentation, autonomy, and continuous improvement. You'll help shape the SRE function at a high-impact stage of growth. … Doing: Build and improve CI/CD pipelines (GitHub Actions) that keep development smooth and fast. Maintain and scale infrastructure on AWS, including ECS, S3, RDS, and CloudFront. Improve observability using tools like Datadog and CloudWatch, and act on what you find. Automate key workflows around deployment, testing, scaling, and failure recovery. Collaborate with engineers to build scalable, secure, and … For: Strong experience working in production Node.js environments. Hands-on with AWS services and container orchestration (ECS, Docker). Skilled at building and maintaining CI/CD pipelines. Experience with observability, monitoring, and incident management. Working knowledge of infrastructure-as-code (Terraform, CloudFormation). A collaborative, proactive mindset with strong communication skills. 🎁 What You’ll Get: A collaborative, mission-driven culture that …

Senior Engineering Manager - Data Platform, 6-Month FTC

London, England, United Kingdom
Hybrid / WFH Options
DEPOP
… responsible for Depop's data platform. You'll oversee the teams building our foundational data platform - from experimentation and event ingestion pipelines to our data lake, governance frameworks, data observability, and real-time analytics capabilities. You will work closely with data scientists, machine learning engineers, backend teams, and product leaders to ensure that our data systems are scalable, secure, observable … Data, and Engineering to translate strategic goals into platform capabilities. Lead and mentor 1-2 squads responsible for experimentation, analytics event logging, batch data platform, real-time infrastructure, data observability and governance. Collaborate closely with stakeholders to define and drive the technical roadmap for Depop's modern data platform, enabling reliable and scalable analytics and ML pipelines. Advocate for engineering … excellence, balancing velocity with long-term maintainability, privacy, and performance. Champion data observability, governance, and privacy-by-design principles across the organisation. Hire, mentor, and develop high-performing engineers. Qualifications: Proven experience managing data infrastructure or platform teams at scale, ideally in a consumer or marketplace environment. Deep understanding of distributed systems and modern data ecosystems - including experience with …

Site Reliability Engineering Manager

London, England, United Kingdom
Hybrid / WFH Options
TECEZE
… teams to drive the design, development, and delivery of high-performing, scalable, and reliable infrastructure and services. You’ll be responsible for building robust systems, automating operations, and enhancing observability and deployment pipelines for modern cloud-native applications. Key Responsibilities: System Reliability & Performance: Maintain and scale critical services and infrastructure. Identify performance bottlenecks and work closely with product engineers to … using Terraform and automate deployments across public, private, or hybrid clouds (mainly AWS). Build and improve robust CI/CD pipelines to support fast and safe deployment cycles. Observability & Monitoring: Implement code-based instrumentation and telemetry. Ensure systems are observable with tools for logging, metrics, and alerting. Automation & Scripting: Write tooling and automation scripts in Python, Go, or Rust … Deep understanding of Linux internals, standard networking protocols, and distributed systems architecture. Hands-on experience with automation and performance optimisation. Strong knowledge of SRE principles and methodologies. Experience with observability tools and telemetry systems. Exposure to Google Cloud Platform (GCP). Familiarity with hybrid or multi-cloud architecture. Experience with service meshes or edge proxies (e.g., Envoy, Istio). Working …

Global Head of Technical Account Management (TAM)

London, England, United Kingdom
Coralogix
… success across all regions. Partner closely with R&D, Customer Success, Product, Sales, and Support to drive holistic customer outcomes. Hands-On Technical Expertise: Maintain hands-on fluency in observability tooling, logging infrastructure, and cloud environments. Act as a senior technical escalation point for complex deployments or architectural challenges. Provide in-depth technical guidance on customer environments, use cases, and … performance analytics. Collaborate on the development of tools and dashboards to ensure visibility and impact tracking. Requirements - Technical Experience: 10+ years of technical experience in Cloud DevOps, SaaS, or observability, with 5+ years in leadership roles. Strong hands-on experience with AWS, GCP, Azure, K8S, Terraform and observability tools: Prometheus, Grafana, OpenTelemetry, ELK, Splunk, Datadog, and similar. Proficiency with metrics … team members are encouraged to challenge the status quo and contribute to our shared mission. If you thrive in dynamic environments and are eager to shape the future of observability solutions, we’d love to hear from you. Coralogix is an equal opportunity employer and encourages applicants from all backgrounds to apply. Seniority level: Executive.

Observability
10th Percentile: £57,500
25th Percentile: £65,000
Median: £80,000
75th Percentile: £94,688
90th Percentile: £117,500