Observability Jobs in the UK

176 to 200 of 2,758 Observability Jobs in the UK

Senior MLOps/GenAI Infrastructure Engineer

Salford, England, United Kingdom
Hybrid / WFH Options
BBC Group and Public Services
as-Code with AWS CDK and CloudFormation to provision and manage cloud environments. Build and maintain CI/CD pipelines using GitHub Actions, AWS CodePipeline, CodeBuild, and Jenkins. Integrate monitoring and observability tools such as AWS CloudWatch, Prometheus, and Grafana for infrastructure and model health tracking. Ensure software quality through Test-Driven Development (TDD), unit testing frameworks (e.g., pytest, unittest), and automated integration …

Senior MLOps/GenAI Infrastructure Engineer

Newcastle upon Tyne, England, United Kingdom
Hybrid / WFH Options
BBC Group and Public Services
as-Code with AWS CDK and CloudFormation to provision and manage cloud environments. Build and maintain CI/CD pipelines using GitHub Actions, AWS CodePipeline, CodeBuild, and Jenkins. Integrate monitoring and observability tools such as AWS CloudWatch, Prometheus, and Grafana for infrastructure and model health tracking. Ensure software quality through Test-Driven Development (TDD), unit testing frameworks (e.g., pytest, unittest), and automated integration …

Lead DevOps

London, England, United Kingdom
Tata Consultancy Services
teams to build secure, scalable, and cost-efficient cloud solutions. You will have access to cutting-edge cloud technologies, including AWS serverless computing, Kubernetes orchestration, AI-driven observability, and security automation, keeping you at the forefront of innovation. Your responsibilities: Implement and manage highly available, scalable, and secure applications hosted on AWS Cloud, leveraging multi-region deployment strategies …

Software Engineer - Observability (Remote Scotland)

Dundee, Angus, United Kingdom
Hybrid / WFH Options
Ivanti
user experience. This department plays a pivotal role in shaping the company's growth trajectory through continuous innovation and customer-centric solutions. What You Will Be Doing: Assist in Observability Implementation: Support the development and maintenance of monitoring, logging, and tracing solutions. Monitor & Manage Observability Tools: Help deploy and manage observability platforms such as Azure Application Insights (AppInsights), New Relic … Resolution) and reduce false positives. Ensure Cloud & Infrastructure Visibility: Contribute to scalable monitoring solutions for AWS and Azure environments. Collaborate with DevOps & SRE Teams: Work with teams to integrate observability best practices into CI/CD pipelines. Documentation & Knowledge Sharing: Contribute to runbooks, dashboards, and best-practice guides to support observability initiatives. To Be Successful in the Role, You Will … Have Required Qualifications: 3-5 years of experience in observability, monitoring, or DevOps-related roles. Basic experience with monitoring tools such as Azure AppInsights, New Relic, Prometheus, and Grafana. Understanding of OpenTelemetry and of New Relic and AppInsights APM for telemetry data collection. Familiarity with AWS and Azure cloud environments. Exposure to Kubernetes and container monitoring. Basic scripting knowledge (Python, Go, Bash, or …
Employment Type: Permanent
Salary: GBP Annual

Senior Platform Engineer, Observability

London, England, United Kingdom
Forter
At Forter, you’ll have the chance to make a direct impact on the developer experience … across the company while working with cutting-edge observability technologies and practices. We value innovation, collaboration, and continuous improvement, and we can’t wait to see what you’ll bring to our team! About the role: At Forter, we are looking for a Senior Platform Engineer, Observability with a strong development background and hands-on experience in observability tooling (ELK … we observe and troubleshoot our systems. We efficiently handle TBs of o11y (observability) data per day with very few incidents. In this role, you will help shape the future of observability at Forter, building scalable monitoring systems, creating intuitive developer tools, and collaborating with cross-functional teams to ensure our systems are both highly reliable and easy to troubleshoot. We’re …

Technical Account Manager

Slough, England, United Kingdom
JR United Kingdom
London, UK | Full-time | Senior-Level. I'm hiring for a Technical Account Manager on behalf of a high-growth SaaS company building a next-generation observability platform. Their technology helps engineering teams monitor, analyse, and act on their logs, metrics, traces, and security data, improving performance and cutting observability spend. This is a senior, customer-facing … technical role ideal for someone with a background in cloud infrastructure, observability tools, and DevOps. You’ll play a key role in onboarding, supporting, and expanding relationships with enterprise customers, from hands-on implementation to strategic advisory. What You’ll Be Doing: Own the technical onboarding journey for new customers, from data integration to configuration and enablement. Work closely with … DevOps, SREs, and engineering teams to understand requirements and deliver high-impact observability solutions. Troubleshoot complex infrastructure issues (Kubernetes, Docker, pipelines, etc.) and advise on best practices. Act as a trusted technical advisor, providing guidance on implementation, optimisation, and long-term success. Partner with sales and customer success teams on renewals, expansions, and QBRs. What You Bring: Strong hands-on …

Technical Account Manager

London, United Kingdom
Coralogix, inc
us on our journey to revolutionize observability. In 2023, Dun & Bradstreet ranked Coralogix as one of the best tech startups to work for. Coralogix is a modern, full-stack observability platform transforming how businesses process and understand their data. Our unique architecture powers in-stream analytics without reliance on expensive indexing or hot storage. We specialize in comprehensive monitoring of … logs, metrics, traces, and security events, with features such as APM, RUM, SIEM, Kubernetes monitoring and more, all enhancing operational efficiency and reducing observability spend by up to 70%. Technical Account Managers at Coralogix are key to our effort to meet our customers' expectations and help them use their observability and security data in the most efficient way … looking for hard-working, sharp, and humble professionals with proven technical customer-facing experience. Our Technical Account Managers are trusted advisors who guide our customers through their monitoring, security & observability journey. This role embodies the critical intersection of very high technical expertise and a focus on customer satisfaction, renewal and expansion. Technical Account Managers are senior-level roles and are …
Employment Type: Permanent
Salary: GBP Annual

SR Site Reliability Engineer

Mablethorpe, England, United Kingdom
Wakapi
systems using load balancing, auto-scaling, canary releases, and blue-green deployments. Develop and maintain monitoring and logging dashboards with tools like New Relic, Prometheus, Grafana, and Datadog, ensuring observability through metrics, tracing, log aggregation, and alerting. Help teams determine settings and thresholds for alerts and automations based on application performance requirements. Monitor, optimize, and ensure system reliability and performance … like Terraform. Strong understanding of scalability, high-availability patterns, and DevOps metrics such as DORA. Knowledge of SLM metrics (SLAs, SLOs, SLIs) and their application. Experience with monitoring and observability tools like New Relic, Prometheus, Grafana, and Datadog. Experience working with Kafka and improving performance in event-driven, real-time data architectures. Familiarity with cloud providers like AWS, Azure, or … GCP. Experience with CI/CD tools such as GitHub Actions, Jenkins, or GitLab CI. Strong analytical and communication skills. Nice-to-haves: Familiarity with Observability-as-Code tooling and practices. Knowledge of Chaos Engineering practices. Seniority Level: Mid-Senior. Employment: Full-time. Industry: Software Development.

Technical Account Manager - DevOps Specialist

City of London, London, United Kingdom
ITR Partners
Technical Account Manager - DevOps Specialist. London - Hybrid (2 days per week in office) · Full-time · Senior. About the company: My client is rebuilding the path to observability using a real-time streaming analytics pipeline that provides monitoring, visualization, and alerting capabilities without the burden of indexing. By enabling users to define different data pipelines per use case, they provide deep … Observability and Security insights, at infinite scale, for less than half the cost. About the Position: Technical Account Managers at my client are key to meeting customers’ expectations and helping them use their observability and security data in the most efficient way possible. We are looking for hard-working, sharp, and humble professionals with … proven technical customer-facing experience. Their Technical Account Managers are trusted advisors who guide their customers through their monitoring, security & observability journey. This role embodies the critical intersection of very high technical expertise and a focus on customer satisfaction, renewal and expansion. Technical Account Managers are senior-level roles and are expected to professionally and accurately solve problems, show product …

Technical Account Manager - DevOps Specialist

London Area, United Kingdom
ITR Partners
Technical Account Manager - DevOps Specialist. London - Hybrid (2 days per week in office) · Full-time · Senior. About the company: My client is rebuilding the path to observability using a real-time streaming analytics pipeline that provides monitoring, visualization, and alerting capabilities without the burden of indexing. By enabling users to define different data pipelines per use case, they provide deep … Observability and Security insights, at infinite scale, for less than half the cost. About the Position: Technical Account Managers at my client are key to meeting customers’ expectations and helping them use their observability and security data in the most efficient way possible. We are looking for hard-working, sharp, and humble professionals with … proven technical customer-facing experience. Their Technical Account Managers are trusted advisors who guide their customers through their monitoring, security & observability journey. This role embodies the critical intersection of very high technical expertise and a focus on customer satisfaction, renewal and expansion. Technical Account Managers are senior-level roles and are expected to professionally and accurately solve problems, show product …

Technical Account Manager - DevOps Specialist

Slough, England, United Kingdom
JR United Kingdom
Job Description: Technical Account Manager - DevOps Specialist. London - Hybrid (2 days per week in office) · Full-time · Senior. About the company: My client is rebuilding the path to observability using a real-time streaming analytics pipeline that provides monitoring, visualization, and alerting capabilities without the burden of indexing. By enabling users to define different data pipelines per use case … they provide deep Observability and Security insights, at infinite scale, for less than half the cost. About the Position: Technical Account Managers at my client are key to meeting customers’ expectations and helping them use their observability and security data in the most efficient way possible. We are looking for hard-working, sharp, and … humble professionals with proven technical customer-facing experience. Their Technical Account Managers are trusted advisors who guide their customers through their monitoring, security & observability journey. This role embodies the critical intersection of very high technical expertise and a focus on customer satisfaction, renewal and expansion. Technical Account Managers are senior-level roles and are expected to professionally and accurately solve …

DevOps Engineer - AWS

City of London, London, United Kingdom
Hybrid / WFH Options
Cognitive Group | Part of the Focus Cloud Group
and service incidents with root cause analysis and preventive measures. Handle change requests, track recurring issues, and work on long-term fixes to improve system stability. Implement and maintain observability solutions using Prometheus, Grafana, and Splunk. Write PromQL queries for custom monitoring dashboards, alerting, and diagnostics. Manage and optimize CI/CD pipelines for automated testing, deployment, and rollback strategies. … AWS services at the DevOps Engineer level. Incident, change & problem management experience. This role is heavily operations-oriented, including on-call requirements. Strong background in setting up and operating enterprise observability tooling, specifically Prometheus, Grafana and Splunk, including usage of PromQL. Proficient in one or more of Python, Go, Bash, or SQL. Familiar with GitHub/GitOps/container orchestration/ …

DevOps Engineer - AWS

London Area, United Kingdom
Hybrid / WFH Options
Cognitive Group | Part of the Focus Cloud Group
and service incidents with root cause analysis and preventive measures. Handle change requests, track recurring issues, and work on long-term fixes to improve system stability. Implement and maintain observability solutions using Prometheus, Grafana, and Splunk. Write PromQL queries for custom monitoring dashboards, alerting, and diagnostics. Manage and optimize CI/CD pipelines for automated testing, deployment, and rollback strategies. … AWS services at the DevOps Engineer level. Incident, change & problem management experience. This role is heavily operations-oriented, including on-call requirements. Strong background in setting up and operating enterprise observability tooling, specifically Prometheus, Grafana and Splunk, including usage of PromQL. Proficient in one or more of Python, Go, Bash, or SQL. Familiar with GitHub/GitOps/container orchestration/ …

DevOps Engineer - AWS

South East London, England, United Kingdom
Hybrid / WFH Options
Cognitive Group | Part of the Focus Cloud Group
and service incidents with root cause analysis and preventive measures. Handle change requests, track recurring issues, and work on long-term fixes to improve system stability. Implement and maintain observability solutions using Prometheus, Grafana, and Splunk. Write PromQL queries for custom monitoring dashboards, alerting, and diagnostics. Manage and optimize CI/CD pipelines for automated testing, deployment, and rollback strategies. … AWS services at the DevOps Engineer level. Incident, change & problem management experience. This role is heavily operations-oriented, including on-call requirements. Strong background in setting up and operating enterprise observability tooling, specifically Prometheus, Grafana and Splunk, including usage of PromQL. Proficient in one or more of Python, Go, Bash, or SQL. Familiar with GitHub/GitOps/container orchestration/ …

Platform Engineer

United Kingdom
Hybrid / WFH Options
CATCHES
and CI/CD capabilities. RESPONSIBILITIES: Orchestrate and maintain our bare-metal and GCP infrastructure. Implement infrastructure-as-code (Terraform) and automated release workflows that enable true continuous delivery. Drive observability: log aggregation, metrics, distributed tracing and on-call runbooks. Champion security, cost-efficiency and performance tuning across our services. Collaborate with product and platform teams to ship end-to-end … features and migrations. REQUIREMENTS: Extensive experience orchestrating infrastructure at scale across cloud and bare metal. SRE & Kubernetes expertise (GKE/AKS/EKS) and container-native observability stacks (Datadog/Prometheus/Grafana). Proven ownership of CI/CD pipelines (GitHub Actions, Cloud Build, Azure DevOps, etc.) and release automation. Proven experience with multiplatform scripting languages (Python, Bash, PowerShell). … TECH STACK: Cloud: GCP (primary), Azure (minimal). Languages: Terraform, Python, Bash, PowerShell. Databases: PostgreSQL, Redis, BigQuery. Messaging: Pub/Sub, RabbitMQ. Infra & Ops: Docker, Kubernetes, Terraform, GitHub Actions, Proxmox. Observability: OpenTelemetry, Datadog.

Platform Engineer

London, England, United Kingdom
Hybrid / WFH Options
CATCHES
and CI/CD capabilities. RESPONSIBILITIES: Orchestrate and maintain our bare-metal and GCP infrastructure. Implement infrastructure-as-code (Terraform) and automated release workflows that enable true continuous delivery. Drive observability: log aggregation, metrics, distributed tracing and on-call runbooks. Champion security, cost-efficiency and performance tuning across our services. Collaborate with product and platform teams to ship end-to-end … features and migrations. REQUIREMENTS: Extensive experience orchestrating infrastructure at scale across cloud and bare metal. SRE & Kubernetes expertise (GKE/AKS/EKS) and container-native observability stacks (Datadog/Prometheus/Grafana). Proven ownership of CI/CD pipelines (GitHub Actions, Cloud Build, Azure DevOps, etc.) and release automation. Proven experience with multiplatform scripting languages (Python, Bash, PowerShell). … TECH STACK: Cloud: GCP (primary), Azure (minimal). Languages: Terraform, Python, Bash, PowerShell. Databases: PostgreSQL, Redis, BigQuery. Messaging: Pub/Sub, RabbitMQ. Infra & Ops: Docker, Kubernetes, Terraform, GitHub Actions, Proxmox. Observability: OpenTelemetry, Datadog.

AI ML Lead Site Reliability Engineer

Glasgow, Scotland, United Kingdom
JPMorgan Chase & Co
during major incidents, quickly identifying and resolving issues to prevent financial losses. Partner with product engineering teams to ensure AI/ML systems are reliable and high-performing. Develop observability, security, automation, and FinOps tools and orchestration solutions. Provide strategic technology leadership by defining standards and architectures for reliability and automation frameworks. Build strong cross-functional relationships to deliver … least one programming language such as Python, Java Spring Boot, or .NET. Deep knowledge of software applications and technical processes, with emerging expertise in specific technical disciplines. Experience with observability tools like Grafana, Dynatrace, Prometheus, Datadog, and Splunk, including monitoring, SLO alerting, and telemetry collection. Proficiency with CI/CD tools such as Jenkins, GitLab, and Terraform. Experience with containerization and orchestration … drive. Preferred qualifications, capabilities, and skills: Experience in AI, ML, or data engineering. Expertise in Kubernetes and container orchestration. Experience developing automation frameworks or AIOps solutions. Experience building observability and telemetry tools. About Us: J.P. Morgan is a global leader in financial services, providing strategic advice and products to prominent clients worldwide. We value diversity and inclusion, and are …

Site Reliability Engineer

City of London, England, United Kingdom
Whitehall Resources Ltd
in managing cloud infrastructure, ensuring the reliability of production systems, and improving end-to-end deployment pipelines. This role combines deep operational responsibilities with a strong focus on automation, observability, and continuous improvement. You will be responsible for maintaining high system availability, enabling rapid delivery through CI/CD, and supporting development teams with robust infrastructure and tooling. A key … incidents with root cause analysis and preventive measures. 3. Handle change requests, track recurring issues, and work on long-term fixes to improve system stability. 4. Implement and maintain observability solutions using Prometheus, Grafana, and Splunk. 5. Write PromQL queries for custom monitoring dashboards, alerting, and diagnostics. 6. Manage and optimize CI/CD pipelines for automated testing, deployment, and … at the DevOps Engineer level. 2. Incident, change & problem management experience. This role is heavily operations-oriented, including on-call requirements. 3. Strong background in setting up and operating enterprise observability tooling, specifically Prometheus, Grafana and Splunk, including usage of PromQL. 4. Proficient in one or more of Python, Go, Bash, or SQL. 5. Familiar with GitHub/GitOps/container …

Site Reliability Engineer

Reigate, England, United Kingdom
Hybrid / WFH Options
Willis Towers Watson
Engineer to join our SRE team based in Reigate. The ideal candidate will have excellent communication skills, experience working with multiple stakeholders, and a track record with Azure and observability platforms. You will be joining Insurance Consulting and Technology (ICT) at an exciting time of transformation as we work on improving the delivery of value for customers and the business. … office up to two days per week. The Role: Collaborate with cross-functional teams to ensure the reliability, availability, and performance of our client-facing services. Maintain and configure observability platforms such as Datadog. Proactively monitor production and other environments to ensure stability, availability, security and integrity. Design and implement automation and processes to improve the efficiency and effectiveness … DevOps. Experience of running 24x7 services in a public cloud, ideally Azure. Deep understanding of cloud infrastructure and services, including best practices for monitoring, scaling, and security. Experience with observability platforms such as Datadog or similar tools. Strong interpersonal skills, with the ability to work effectively with many stakeholders. Solid verbal and written communication skills, and the ability to present …

Senior Storage Engineer

London Area, United Kingdom
NJF Global Holdings Ltd
scale, data-intensive workloads. Implement and maintain DevOps tooling (Terraform, Ansible, GitLab CI/CD, Jenkins). Lead PoCs for new storage technologies and present results to technical leadership. Support observability via Grafana, Prometheus, Splunk, and related platforms. Contribute to containerization efforts with Docker and Kubernetes (preferred). What We’re Looking For: 8+ years of experience in storage systems administration and … kernel bypass). Strong understanding of Linux performance tuning, particularly in HPC or ML/AI contexts. Programming/scripting experience in Python, Golang, or similar languages. Familiarity with modern observability and monitoring tools (Grafana, Prometheus, Splunk). Experience supporting AI/ML modelling environments is highly desirable. Knowledge of container and orchestration technologies (Docker, Kubernetes) is a plus. Proactive, collaborative, and …

Senior Storage Engineer

City of London, London, United Kingdom
NJF Global Holdings Ltd
scale, data-intensive workloads. Implement and maintain DevOps tooling (Terraform, Ansible, GitLab CI/CD, Jenkins). Lead PoCs for new storage technologies and present results to technical leadership. Support observability via Grafana, Prometheus, Splunk, and related platforms. Contribute to containerization efforts with Docker and Kubernetes (preferred). What We’re Looking For: 8+ years of experience in storage systems administration and … kernel bypass). Strong understanding of Linux performance tuning, particularly in HPC or ML/AI contexts. Programming/scripting experience in Python, Golang, or similar languages. Familiarity with modern observability and monitoring tools (Grafana, Prometheus, Splunk). Experience supporting AI/ML modelling environments is highly desirable. Knowledge of container and orchestration technologies (Docker, Kubernetes) is a plus. Proactive, collaborative, and …

DevOps Engineer

Glasgow, Scotland, United Kingdom
ELLIOTT MOSS CONSULTING PTE. LTD
Develop and optimize CI/CD pipelines using GitHub Actions, ensuring fast and reliable software delivery. · Manage containerized applications using Docker, Kubernetes, Amazon EKS, and Helm. · Administer and enhance observability using log aggregation and monitoring tools such as CloudWatch, Splunk, and Datadog. · Maintain and manage artifact repositories (e.g., JFrog Artifactory) and ensure effective dependency management. · Automate and streamline system operations … plus. Requirements: · 3+ years of practical experience with AWS cloud services and infrastructure management. AWS certifications are advantageous. · Strong experience with Infrastructure as Code tools (Terraform, CloudFormation). · Familiarity with observability and monitoring tools (CloudWatch, Splunk, Datadog). · Experience managing CI/CD workflows, especially with GitHub Actions. · Strong knowledge of artifact repository management systems like JFrog. · Proficient in Linux administration …

Infrastructure Engineer

Edinburgh, United Kingdom
hackajob
customers consume our products. Additionally, you'll: People-manage a team, developing skill sets and capabilities to support strategic outcomes. Develop technical skills through continuous learning and development. Support strategic observability, maintaining a strong awareness of service, creating operational views of data, and supporting the development of targets for the team to deliver against. Provide operational support for product and service … would have experience of Python, Terraform, Ansible, and PowerShell. Ideally, you'll also have experience in data centre networking, including software-defined networking. Furthermore, you'll need: Experience of using observability tools and techniques, with the ability to use data, information, and user sentiment to continuously improve solutions. In-depth public cloud vendor knowledge covering GCP, AWS, and Azure. Extensive experience …
Employment Type: Permanent
Salary: GBP Annual

Test Environment Manager (Linux/Openshift) - £75k-£85k

London, England, United Kingdom
Hybrid / WFH Options
Vertus Partners
cross-functional teams. Embed environment automation via tools like Ansible and integrate seamlessly with CI/CD pipelines (e.g., Jenkins, GitLab). Monitor environment health and performance using modern observability tools, driving continuous improvement initiatives. What they are looking for: 5+ years in test environment management, including hands-on work with OpenShift/Linux platforms. Strong exposure to … Experience working within complex transformation programmes, ideally involving legacy-to-modern transitions. Proficient in Linux shell scripting and systems management. Sound knowledge of CI/CD frameworks and observability stacks (e.g., Prometheus, Grafana). Confident communicator, able to collaborate effectively with architects, developers, QA, and operations. The role pays a base salary of between …

Senior Cloud / Platform Engineer · Product & Engineering · Belfast

London, England, United Kingdom
Hybrid / WFH Options
Kadence Limited
operations. Manage and enhance our container orchestration stack using Kubernetes (EKS) and Docker. Develop and maintain robust, scalable CI/CD pipelines with Jenkins, GitHub Actions, and ArgoCD. Strengthen observability across the platform through effective monitoring, logging, and alerting (AWS services, Grafana, etc.). Contribute to platform security through infrastructure hardening, role-based access controls, and infrastructure as code (Terraform … CI/CD pipelines using Jenkins, GitHub Actions, and/or ArgoCD. Familiarity with infrastructure as code practices using Terraform, CloudFormation, or similar tools. A solid grasp of system observability, monitoring, and alerting practices (CloudWatch, Grafana, or equivalent). Exposure to platform security principles including identity/access management, secrets handling, and environment isolation. Strong scripting and automation skills (e.g. … Desktop: Cross-platform desktop app built with Electron (TypeScript). Cloud & DevOps: AWS (20+ services), Kubernetes (EKS), Docker, Infrastructure as Code (CloudFormation, Terraform), CI/CD (Jenkins, GitHub Actions), Observability (AWS, Grafana). Development tools: GitHub, Jira, Notion, ChatGPT, Gemini, LangChain, AI-native IDEs (Cursor, JetBrains), LLM-powered internal tools. Test automation: Cypress (E2E), Postman (API), Jest (frontend unit …

Observability salary percentiles
10th percentile: £57,500
25th percentile: £65,000
Median: £80,000
75th percentile: £97,500
90th percentile: £117,500