Observability Job Vacancies

1 to 25 of 2,196 Observability Jobs

DevSecOps Engineer

London, England, United Kingdom
Hybrid / WFH Options
Tes
microservices design patterns and deployment strategies in a cloud-native environment. Security Best Practices: Strong understanding of security frameworks and compliance standards for cloud infrastructure and DevOps processes. Monitoring & Observability: Understanding of monitoring, logging, and alerting tools (e.g., Prometheus, Grafana, ELK) to ensure system performance and issue tracking. Skills CI/CD Tools: Hands-on experience with Jenkins, GitLab CI More ❯

Senior DevOps Engineer - Monitoring & Observability

London, England, United Kingdom
Lumenalta
As a Senior DevOps Engineer at Lumenalta, you will be pivotal in architecting and managing cloud-based systems on AWS, implementing CI/CD pipelines, and automating infrastructure deployment using tools like Terraform and AWS CDK. You will … to automate application builds, testing, and deployments. Infrastructure as Code (IaC): Use Terraform, AWS CDK, or CloudFormation to automate cloud resource provisioning, enabling consistent and repeatable infrastructure deployments. Monitoring & Observability: Implement monitoring, logging, and alerting solutions using tools like Prometheus, Grafana, Loki, Datadog, or CloudWatch to ensure system health and performance. Security & Compliance: Implement security best practices for cloud infrastructure More ❯

Mid-Senior DevOps / Site Reliability Engineer (m/f/*)

London, England, United Kingdom
Hybrid / WFH Options
Quaisr Limited
such as Kubernetes, Docker Swarm, or HashiCorp Nomad. Excellent problem-solving, communication, and collaboration skills. Nice to have: Experience managing distributed systems, microservices, and event-driven architectures. Knowledge of observability tools such as Prometheus, Grafana, ELK Stack, or Datadog. Experience with security best practices, monitoring, and incident response. Familiarity with DevSecOps and compliance frameworks (ISO 27001, SOC 2, GDPR). More ❯

DevOps/Site Reliability Engineer, Junior/Mid/Senior (m/f/*)

United Kingdom
Hybrid / WFH Options
Crane Venture Partners
such as Kubernetes, Docker Swarm, or HashiCorp Nomad. Excellent problem-solving, communication, and collaboration skills. Nice to have: Experience managing distributed systems, microservices, and event-driven architectures. Knowledge of observability tools such as Prometheus, Grafana, ELK Stack, or Datadog. Experience with security best practices, monitoring, and incident response. Familiarity with DevSecOps and compliance frameworks (ISO 27001, SOC 2, GDPR). More ❯
Employment Type: Permanent
Salary: GBP Annual

Senior DevOps Engineer

London, England, United Kingdom
Darktrace
ArgoCD and Helm). Experience in migrating monolithic applications into microservices architectures. In-depth Linux/Unix experience, emphasizing system performance tuning and automation. Familiarity with monitoring, logging, and observability tools (e.g., Prometheus, Grafana, Loki, OTel, ELK stack) to ensure system reliability and performance. Experience in developing and working with backend applications technologies (e.g. Express, Django). Benefits we offer More ❯

Senior DevOps Engineer

Liverpool, Lancashire, United Kingdom
Hybrid / WFH Options
The Acorn Group
with GitOps tools (e.g., ArgoCD, Flux). CI/CD - Skilled in building and managing pipelines using Azure DevOps, GitHub Actions, etc. Monitoring - Experience with Prometheus, Grafana, and other observability tools. Application Stack - Familiarity with .NET, Node.js, React, and web server technologies like Nginx. Relevant certifications or the ability to demonstrate equivalent experience, such as: Terraform Associate About Acorn Insurance More ❯
Employment Type: Permanent
Salary: GBP Annual

Monitoring & Observability Engineer

South East London, London, United Kingdom
COMPUTACENTER (UK) LIMITED
GPS). Our teams operate across the UK, Germany, France, and India, delivering complex, enterprise-grade IT solutions and consultancy across infrastructure, cloud, and modern operations. As a Monitoring & Observability Engineer, you'll work in high-impact delivery teams that support some of the world's most well-known organisations. You'll play a key role in helping our customers achieve greater … visibility, performance, and reliability across their IT estates, contributing to their operational success through proactive insight and incident prevention. What you'll do Design, implement, and manage observability solutions using industry-leading tools such as Dynatrace (primary), Grafana, and Splunk Collect and analyse telemetry data (metrics, logs, traces, events) to diagnose and resolve system and application performance issues Integrate monitoring platforms … with ITSM tools (e.g. ServiceNow) and CI/CD pipelines to enable proactive alerting and resolution workflows Act as a Monitoring & Observability SME within customer delivery teams Support incident response activities and postmortems by identifying patterns, root causes, and optimisation opportunities Work collaboratively with cross-functional teams to define and implement best practices in observability and monitoring Attend customer and More ❯
Employment Type: Permanent

Monitoring & Observability Engineer

London, United Kingdom
Computacenter AG & Co. oHG
Life on the team At Computacenter, you'll be joining a world-class team of over 1,000 skilled professionals within Group Professional Services (GPS). Our teams operate across the UK, Germany, France, and India, delivering complex, enterprise-grade IT solutions and consultancy across infrastructure, cloud, and … modern operations. As a Monitoring & Observability Engineer, you'll work in high-impact delivery teams that support some of the world's most well-known organisations. You'll play a key role in helping our customers achieve greater visibility, performance, and reliability across their IT estates, contributing to their operational success through proactive insight and incident prevention. What you'll … do Design, implement, and manage observability solutions using industry-leading tools such as Dynatrace (primary), Grafana, and Splunk Collect and analyse telemetry data (metrics, logs, traces, events) to diagnose and resolve system and application performance issues Integrate monitoring platforms with ITSM tools (e.g. ServiceNow) and CI/CD pipelines to enable proactive alerting and resolution workflows Act as a Monitoring More ❯
Employment Type: Permanent
Salary: GBP Annual

Monitoring & Observability Engineer

Lakenheath, Suffolk, United Kingdom
Computacenter AG & Co. oHG
Life on the team At Computacenter, you'll be joining a world-class team of over 1,000 skilled professionals within Group Professional Services (GPS). Our teams operate across the UK, Germany, France, and India, delivering complex, enterprise-grade IT solutions and consultancy across infrastructure, cloud, and … modern operations. As a Monitoring & Observability Engineer, you'll work in high-impact delivery teams that support some of the world's most well-known organisations. You'll play a key role in helping our customers achieve greater visibility, performance, and reliability across their IT estates, contributing to their operational success through proactive insight and incident prevention. What you'll … do Design, implement, and manage observability solutions using industry-leading tools such as Dynatrace (primary), Grafana, and Splunk Collect and analyse telemetry data (metrics, logs, traces, events) to diagnose and resolve system and application performance issues Integrate monitoring platforms with ITSM tools (e.g. ServiceNow) and CI/CD pipelines to enable proactive alerting and resolution workflows Act as a Monitoring More ❯
Employment Type: Permanent
Salary: GBP Annual

Delivery Engineer

United Kingdom
Hybrid / WFH Options
Sportserve
Python (or other language), Bash/Shell, YAML including any Development frameworks Extensive experience and in-depth knowledge of the Linux operating system for effective troubleshooting activities Experience with Observability tools like Grafana, Prometheus, ELK, OCI Observability We highly value ownership and initiative with capabilities to drive projects independently Dealing with changes on a daily basis in a very dynamic More ❯
Employment Type: Permanent
Salary: GBP Annual

Principal DevOps Engineer - AWS

London, England, United Kingdom
NTT DATA
to architect secure, performant, and highly available cloud solutions. Proficiency with monitoring and log analytics tools such as AWS CloudWatch, ELK Stack, Prometheus, Datadog, or New Relic, to maintain observability and ensure operational excellence. Demonstrated leadership skills in managing complex, high-pressure situations and guiding teams through incident resolution. Exceptional communication and presentation skills, with proven experience engaging with senior More ❯

GCP Technical Lead

London Area, United Kingdom
TEKsystems
Storage, Compute gcloud CLI, VPC, IAM, GCE, GCS, GKE, Pub/Sub, Cloud Run, Cloud SQL, BigQuery, Dataflow, Bigtable, Firestore GCP – Networking, Security tool/Best Practices Observability - Operations suite, Logging, Monitoring, Alerting. Additional Skills: Good understanding of Linux OS. Bash, Scripting, Automation, Ansible, Networking, Security. Hands-on experience with DevOps Principles and Tools. Hands-on with Terraform More ❯

GCP Technical Lead

City of London, London, United Kingdom
TEKsystems
Storage, Compute gcloud CLI, VPC, IAM, GCE, GCS, GKE, Pub/Sub, Cloud Run, Cloud SQL, BigQuery, Dataflow, Bigtable, Firestore GCP – Networking, Security tool/Best Practices Observability - Operations suite, Logging, Monitoring, Alerting. Additional Skills: Good understanding of Linux OS. Bash, Scripting, Automation, Ansible, Networking, Security. Hands-on experience with DevOps Principles and Tools. Hands-on with Terraform More ❯

Senior Site Reliability Engineer

London, United Kingdom
Hybrid / WFH Options
Stratospherec Ltd
one or more public cloud providers such as Azure, AWS or GCP Proficiency using Infrastructure as Code (IaC) tools such as Terraform (preferred), Ansible, or CloudFormation. Experience with monitoring, observability and logging tools such as DataDog, Prometheus, Grafana, or similar. Proven track record of maintaining highly-available and performant production environments. Ability to identify and implement effective mitigation strategies and More ❯
Employment Type: Permanent
Salary: £85,000 - £90,000/annum, excellent benefits package

Senior DevOps Engineer, Clinical Software

United Kingdom
Waters Corporation
to maintain a CI build environment capable of running automation tests for effective feedback. Assist in designing, developing and implementing automation test frameworks. Develop and improve our monitoring and observability tooling. Coach and mentor teammates to improve their own DevOps skills and experience Research emerging tools, trends and methodologies Assist in managing checked in source code from check-in through to More ❯
Employment Type: Permanent
Salary: GBP Annual

Site Reliability Engineer

London, England, United Kingdom
Hybrid / WFH Options
Global Screening Services
impact! About The Role This is an exciting opportunity to join our growing Operations team managing Kubernetes clusters in Production and, through a DevOps culture, empower development teams with observability insights they can use to innovate faster. We are looking for a Site Reliability Engineer, or production experienced DevOps Engineer, who has working experience building observability for cloud native SaaS … products and driving operational excellence. You will be responsible for delivering our monitoring infrastructure, shaping observability, and responding to incidents as well as ensuring the platform is performant and reliable. You will be a key member of the team, liaising with product teams, embedding SRE principles and building the observability platform for the next stage of growth at GSS. You … new features are maintainable, have well defined SLIs, achievable SLOs, are properly monitored, and evaluated for failure scenarios Enabling development teams through DevOps culture and the effective use of observability tools. Promote best practice, present KT sessions, help troubleshoot and resolve business affecting issues Building on our existing monitoring tools to deliver a comprehensive, optimised observability platform for logging, metrics More ❯

Site Reliability Engineer

Southampton, Hampshire, United Kingdom
Hybrid / WFH Options
Spectrum IT Recruitment
level goals You'll Stand Out If You Have: Practical experience managing large-scale Kubernetes clusters; certifications in Kubernetes are a strong bonus Hands-on familiarity with the Grafana Observability Suite, including tools like Loki, Mimir, and Tempo Background in administering or developing with popular monitoring and automation tools such as Splunk, Datadog, PagerDuty, or Rundeck Experience using configuration management … principles and hands-on experience with tools such as Jenkins, GitLab CI/CD, or CircleCI Strong understanding of containerisation (e.g., Docker, Kubernetes) and microservices architecture Skilled in using observability and monitoring tools such as Prometheus, Grafana, ELK stack, or AWS CloudWatch Excellent analytical and troubleshooting abilities, especially within complex distributed systems Proven experience handling incident management and conducting blameless More ❯
Employment Type: Permanent

Site Reliability Engineer

Hampshire, England, United Kingdom
Hybrid / WFH Options
Spectrum IT Recruitment
level goals You'll Stand Out If You Have: Practical experience managing large-scale Kubernetes clusters; certifications in Kubernetes are a strong bonus Hands-on familiarity with the Grafana Observability Suite, including tools like Loki, Mimir, and Tempo Background in administering or developing with popular monitoring and automation tools such as Splunk, Datadog, PagerDuty, or Rundeck Experience using configuration management … principles and hands-on experience with tools such as Jenkins, GitLab CI/CD, or CircleCI Strong understanding of containerisation (e.g., Docker, Kubernetes) and microservices architecture Skilled in using observability and monitoring tools such as Prometheus, Grafana, ELK stack, or AWS CloudWatch Excellent analytical and troubleshooting abilities, especially within complex distributed systems Proven experience handling incident management and conducting blameless More ❯

SysOps Engineer

Eastbourne, England, United Kingdom
Hybrid / WFH Options
AxisOps
and architecture through to production and operations. Our strength lies in software delivery, supported by deep expertise in platform engineering, built on an understanding of private cloud-native infrastructure, observability, and DevSecOps. Our culture We value sharp thinking, clear communication, and teams that look out for each other. At AxisOps, our core values are: Ingenuity – solving hard problems with elegant … runtimes is welcome but not required) Maintain and evolve microservice architecture built in Python and PHP, with deployment via GitLab CI/CD and runtime orchestration via Andromeda Deliver observability using Prometheus, Grafana, and the ELK stack, supporting metrics, logs, and alerting workflows Support and maintain internal ML infrastructure and pipelines, helping ensure that our AI and data workloads run … maintain standardised developer desktop environments, supporting our engineering team’s daily tooling and dev workflow Contribute to our IoT platform, including reliable edge infrastructure, secure messaging, and data flow observability Support and maintain our private datacentre, including rack-level hardware, networking, and server fleet resilience Continuously improve security posture, covering patching, firewall maintenance, secrets handling, and backup strategy Write markdown More ❯

Site Reliability Engineer

Hedge End, England, United Kingdom
Hybrid / WFH Options
Spectrum IT Recruitment
level goals You'll Stand Out If You Have: Practical experience managing large-scale Kubernetes clusters; certifications in Kubernetes are a strong bonus Hands-on familiarity with the Grafana Observability Suite, including tools like Loki, Mimir, and Tempo Background in administering or developing with popular monitoring and automation tools such as Splunk, Datadog, PagerDuty, or Rundeck Experience using configuration management … principles and hands-on experience with tools such as Jenkins, GitLab CI/CD, or CircleCI Strong understanding of containerisation (e.g., Docker, Kubernetes) and microservices architecture Skilled in using observability and monitoring tools such as Prometheus, Grafana, ELK stack, or AWS CloudWatch Excellent analytical and troubleshooting abilities, especially within complex distributed systems Proven experience handling incident management and conducting blameless More ❯

Site Reliability Engineer

London, England, United Kingdom
Hybrid / WFH Options
ZipRecruiter
level goals You'll Stand Out If You Have: Practical experience managing large-scale Kubernetes clusters; certifications in Kubernetes are a strong bonus Hands-on familiarity with the Grafana Observability Suite, including tools like Loki, Mimir, and Tempo Background in administering or developing with popular monitoring and automation tools such as Splunk, Datadog, PagerDuty, or Rundeck Experience using configuration management … principles and hands-on experience with tools such as Jenkins, GitLab CI/CD, or CircleCI Strong understanding of containerisation (e.g., Docker, Kubernetes) and microservices architecture Skilled in using observability and monitoring tools such as Prometheus, Grafana, ELK stack, or AWS CloudWatch Excellent analytical and troubleshooting abilities, especially within complex distributed systems Proven experience handling incident management and conducting blameless More ❯

Senior DevOps Engineer (Trading Platform) | London, UK

London, England, United Kingdom
Crypto.com
components such as market data feeds, order gateways, execution algorithms, risk engines, UI dashboards, middle office reconciliation, and account infrastructure. We emphasize event-driven, deterministic system design, real-time observability, and strong security. Our tech stack includes Java (low-latency), Python, Web UI (React/Ag-Grid), Aeron, ClickHouse, Kubernetes, and modern CI/CD tooling, with a strong focus … development tools are also leveraged to boost productivity and quality across the team. Responsibilities Design, provision, and maintain scalable infrastructure for our trading systems, including CI/CD pipelines, observability stack, and runtime environments. Administer and tune databases such as AWS Aurora, PostgreSQL, and ClickHouse. Automate provisioning and configuration of EC2 and related resources using Ansible and other infrastructure-as More ❯

DevOps Engineer

London, England, United Kingdom
Hybrid / WFH Options
Canada Life
infrastructure to the cloud and understanding the challenges involved Familiarity with cloud security best practices, identity and access management (IAM), and encryption techniques Microsoft Azure certifications are a plus Observability Designing, implementing and day-to-day use of logging and monitoring tools to capture data for alerting and issue identification and resolution using DataDog, App Insights or similar tools. Designing … applications and infrastructure for observability, security, and reliability. Networking & Security Monitor and enhance network performance, ensuring high levels of security and scalability across all cloud environments. Enforce security best practices in AKS, including network policies, RBAC (Role-Based Access Control), and integration with Azure Active Directory Core Services Software development experience, ideally in .NET stack. SQL skills to manage and More ❯

DV Cleared Platform Engineer

Corsham, Wiltshire, South West, United Kingdom
Global Technology Solutions Ltd
the provisioning and management of systems using Infrastructure as Code (IaC) Support containerisation and orchestration technologies such as Docker and Kubernetes Monitor platform performance, availability, and security using modern observability tools Collaborate with DevOps, security, and application teams to ensure seamless and secure delivery pipelines Implement and maintain CI/CD pipelines and deployment automation Manage secure configurations, patching, and More ❯
Employment Type: Contract

DV Cleared Platform Engineer

Swindon, Wiltshire, South West England, United Kingdom
Global Technology Solutions Ltd
the provisioning and management of systems using Infrastructure as Code (IaC) Support containerisation and orchestration technologies such as Docker and Kubernetes Monitor platform performance, availability, and security using modern observability tools Collaborate with DevOps, security, and application teams to ensure seamless and secure delivery pipelines Implement and maintain CI/CD pipelines and deployment automation Manage secure configurations, patching, and More ❯
Observability
10th Percentile: £57,500
25th Percentile: £65,000
Median: £80,000
75th Percentile: £97,500
90th Percentile: £120,000