Observability Jobs in the UK

1,001 to 1,025 of 2,210 Observability Jobs in the UK

Principal AWS Platform Engineer

London, England, United Kingdom
CACI Ltd
at scale, leveraging AWS Organizations, Landing Zones, and multi-account best practices. Develop and maintain Infrastructure as Code solutions using Terraform, CloudFormation, and AWS CDK. Champion security, compliance, and observability by integrating services like AWS Security Hub, GuardDuty, and Inspector. Design CI/CD pipelines to enable seamless deployments and self-service models for customers. Innovate with AWS Networking, KMS … Proficiency in Python, Go, or similar languages for automation and scripting. Expert-level knowledge of AWS Networking, TLS, and security best practices. Experience with container orchestration (Kubernetes, EKS) and observability tools (Grafana, ELK). A passion for innovation, problem-solving, and delivering high-impact solutions. Experience leading/managing junior engineers. Significant experience with Control Tower and deploying landing zones.

Platform Technical Lead

London, United Kingdom
Sage Global Services Limited
automation, and operation of the enterprise data platform, ensuring its capabilities align with business needs. The platform is built on Azure, Snowflake, Data Lakes, and Kafka, requiring expertise across security, data governance, integrations, observability, DevOps, and automation. This is a London-based role, as regular on-site client engagement is required. The position will be hired by Marionete, a leader in delivering cutting-edge data … ingestion, storage, processing, and consumption). Drive the implementation of modern platform architectures such as Lakehouse, Kappa, and Lambda, ensuring alignment with industry best practices. Oversee end-to-end platform capabilities, including security, monitoring, observability, automation, and governance. Implement DevOps and automation practices, ensuring a highly available, resilient, and self-service platform for data engineers and consumers. Ensure seamless data ingestion and processing pipelines, leveraging Kafka, Fivetran, Snowflake
Employment Type: Permanent
Salary: GBP Annual

Platform Engineering Manager

London, England, United Kingdom
Hybrid / WFH Options
MOO
building and operating the infrastructure, tooling, and systems that empower our engineering teams to deliver high-quality software quickly and safely. We enable everything from deployments and environments to observability and cloud architecture – acting as the internal backbone of MOO's tech stack. We’re hiring a Platform Engineering Manager to lead our team as we transition from Kubernetes to … practices. Improve developer productivity through thoughtful tooling, automation, and streamlined deployment processes. Partner with engineering teams to identify pain points and drive continuous improvements. Maintain high standards of reliability, observability, and operational excellence. Collaborate with architecture, product, and data teams to ensure platform alignment with business needs. Manage vendor relationships with AWS and other infrastructure/tooling providers. About You

Site Reliability Engineer III - Corporate Oversight and Governance Technology

London, England, United Kingdom
ZipRecruiter
your team, or the wider COGT engineering community. Demonstrate site reliability culture, principles, and practices daily, and champion the adoption of site reliability engineering. Collaborate to create and implement observability and reliability designs for complex systems that are robust, stable, and do not incur additional toil or technical debt. Design, create, and advocate for SRE products that can scale the … complex coding problems. Maintain and promote best practices in software engineering, leading by example. Required Qualifications, Capabilities, and Skills Applied experience with SRE concepts, strategies, and culture. Knowledge of observability tools such as OTEL, Grafana, Dynatrace, Prometheus, Datadog, Splunk, including monitoring, SLAs, alerting, and telemetry collection. Proven experience with Java, Spring Boot. Competency in at least one programming language: Go More ❯

DevOps Engineer - Annalect Labs - OMG UK

London, England, United Kingdom
Hybrid / WFH Options
Devaney Consulting, LLC
secure, reliable and efficient. Your expertise will empower our teams to deliver high-quality software with confidence. Whether it's designing resilient cloud architectures, automating deployments or enhancing system observability, you'll bring a problem-solving mindset and a drive to make everything run seamlessly. Collaboration is at the heart of what we do. You'll work closely with engineers … tools and approaches to improve our operations. If you have extensive experience with AWS and GCP, deep expertise in Infrastructure as Code (IaC), a strong background in monitoring and observability and solid scripting skills in Bash, Python or Go, we'd love to hear from you! About the Agency: Omnicom Media Group UK (OMG UK) is the media division of More ❯

AI Solution Architect (Agentic & Autonomous Systems)

London, England, United Kingdom
Hybrid / WFH Options
Staffworx
operate autonomously within production environments, integrating LLMs, multi-agent workflows, cloud-native infrastructure and real-time API interfaces. Experience translating ML models into resilient cloud applications, optimizing for performance, observability and secure operations at scale. Core Responsibilities Architect distributed agentic systems using LLMs and tool-using AI components across enterprise cloud environments Design and implement modular, event-driven architectures (e.g. … SageMaker , Bedrock or OpenAI APIs Build support systems for autonomous agents including memory storage, vector search (e.g., Pinecone, Weaviate) and tool registries Enforce system-level requirements for security, compliance, observability and CI/CD Drive PoCs and reference architectures for multi-agent coordination , intelligent routing and goal-directed AI behavior Contribute to internal standards for scalable AI deployments, model governance More ❯

DevOps Engineer

Surrey, England, United Kingdom
Hybrid / WFH Options
Switch Tech Talent
AKS | Terraform | PowerShell | Azure DevOps Join a dynamic team at the core of our platform, where you’ll drive the design and automation of deployment pipelines and elevate our observability capabilities. This role demands strong expertise in Azure and Kubernetes (AKS) to support and optimize production environments. What You’ll Be Doing: Manage and enhance Azure cloud infrastructure to ensure More ❯

Software Engineer - Site Reliability Engineering

London, England, United Kingdom
Neo4j
as a product feature: Help teams define and act on SLIs and SLOs, turning reliability into a shared, data-driven priority across engineering. Create signals, not noise: Shape an observability stack that tells us what matters, when it matters—so we can detect issues early and resolve them quickly. We're interested in hearing from Engineers with deep experience in … in real-world environments: defining SLIs and SLOs, reducing toil through automation, and driving reliability through engineering. Collaborating with other teams to promote SRE thinking—educating on principles like observability, ownership, and service level objectives. Troubleshooting large-scale, cloud-based systems with confidence and curiosity. Monitoring distributed systems and understanding their performance characteristics. Designing systems with reliability, safety, and debug … ability as first-class concerns. Working with observability tools like OTel Collector, Prometheus, Grafana, and Google Cloud’s operations suite. Deploying and managing applications on Kubernetes; cluster-level administration is a plus. Managing infrastructure with Kustomize and Terraform—keeping it clear, modular, and easy to evolve. Building and maintaining CI/CD workflows—ours run on GitHub Actions. Participating in More ❯

Senior Product Manager - Payments Product and Innovation · London ·

London, England, United Kingdom
Collinson Group
solutions across digital and physical touchpoints. You will own and optimize Collinson’s internal payment systems while managing key external partnerships with PSPs, Acquirers, payment orchestration, fraud prevention, and observability providers . In addition, you will oversee payment risk and fraud management , ensuring regulatory compliance and enhancing payment security. Leading a high-performing product team , you will drive innovation, alignment … with orchestration platforms to streamline global payment routing, retries, and conversion optimization . Integrate with fraud prevention providers , implementing real-time risk assessment and fraud mitigation tools. Work with observability partners to ensure real-time monitoring, reporting, and payment analytics for proactive issue resolution. Payment Risk & Fraud Management Oversee payment security, fraud prevention, and risk mitigation strategies across all payment More ❯

Staff Engineer, Platform Infrastructure Engineering

London, England, United Kingdom
Equinix
on-boarding applications and tools to the Kubernetes platform infrastructure Suggest and develop improvements as well as carry out maintenance to the Kubernetes platform, CI/CD tools, and Observability systems Participate in an on-call/support rota to respond to critical production impacting incidents Suggest and implement improvements to existing manual processes promoting and enabling self-service for … use Go, Perl, Java and Shell scripting Solid understanding of data structures and algorithms Experience with Secrets Management processes and tooling, such as Hashicorp Vault Experience with Monitoring and Observability of Applications, Infrastructure, and Networks (gNMI, BMP, SNMP, Syslog, Telegraf, etc) Experience with CI/CD and VCS tooling (Jenkins, Gitea, ArgoCD) Experience with Networking and Routing is desirable Equinix More ❯

Software Engineer - Site Reliability Engineering

London, England, United Kingdom
Neo4j Inc
as a product feature: Help teams define and act on SLIs and SLOs, turning reliability into a shared, data-driven priority across engineering. Create signals, not noise: Shape an observability stack that tells us what matters, when it matters—so we can detect issues early and resolve them quickly. We're interested in hearing from Engineers with deep experience in … in real-world environments: defining SLIs and SLOs, reducing toil through automation, and driving reliability through engineering. Collaborating with other teams to promote SRE thinking—educating on principles like observability, ownership, and service level objectives. Troubleshooting large-scale, cloud-based systems with confidence and curiosity. Monitoring distributed systems and understanding their performance characteristics. Designing systems with reliability, safety, and debug … ability as first-class concerns. Working with observability tools like OTel Collector, Prometheus, Grafana, and Google Cloud’s operations suite. Deploying and managing applications on Kubernetes; cluster-level administration is a plus. Managing infrastructure with Kustomize and Terraform—keeping it clear, modular, and easy to evolve. Building and maintaining CI/CD workflows—ours run on GitHub Actions. Participating in More ❯

Site Reliability Engineer III - Corporate Oversight and Governance Technology

London, England, United Kingdom
J.P. MORGAN
or the wider COGT engineering community. Demonstrates site reliability culture, principles and practices every day and champions the adoption of site reliability. Collaborates with others to create and implement observability and reliability designs for complex systems that are robust, stable, and do not incur additional toil or technical debt. Collaborate in the design, creation and advocacy of SRE products that … Maintain and promote best practices in software engineering, leading by example. Required qualifications, capabilities, and skills. Demonstrable applied experience of SRE concepts, strategies, and culture. Knowledge and experience in observability such as white and black box monitoring, service level objectives, alerting, and telemetry collection using tools such as OTEL, Grafana, Dynatrace, Prometheus, Datadog, Splunk, etc. Proven experience with Java, Spring Boot

Hands on Engineering Manager

London, England, United Kingdom
Hybrid / WFH Options
ZipRecruiter
with TypeScript and React.js. Leadership experience. Experience with cloud infrastructure (Azure/AWS/GCP). Proven ability to implement development best practices, including CI/CD, testing, observability, and cloud architecture. A plus if you have experience with Fintech or B2B SaaS. Benefits: hybrid working, bonus, meaningful equity, medical insurance, death in service, and others!

Senior Software Engineer, MLOps and Infrastructure

London, England, United Kingdom
Cohere
shape the future! Why this team? This team is responsible for building world-class infrastructure that is critical to all of Cohere's success. Focus on stability, scalability, and observability are all paramount as this work acts as the foundation for all members of technical staff. Our team optimizes for a wide range of technical skillsets (some of which are … Senior Software Engineer you will: Build self-service systems that automate managing, deploying and operating services. This includes our custom Kubernetes operators that support language model deployments. Automate environment observability and resilience. Enable all developers to troubleshoot and resolve problems. Take steps required to ensure we hit defined SLOs, including participation in an on-call rotation. Build strong relationships with More ❯

Solutions Engineer - UK Public Sector

London, England, United Kingdom
Splunk Inc
a safer and more resilient digital world with an end-to-end full stack platform made for a hybrid, multi-cloud world. Leading enterprises use our unified security and observability platform to keep their digital systems secure and reliable. Our customers love our technology, but it's our caring employees that make Splunk stand out as an amazing career destination. … IT architecture concepts such as High Availability, Disaster Recovery. Highly Desirable Knowledge and Experience; I have some or all of these too: Domain knowledge in any of: security operations, observability, DevOps, IT operations, big data or log management. Experience writing and using regular expressions. Experience coding in Python. Experience working with REST APIs. Experience with container and container orchestration technology. More ❯

DevOps Engineer

England, United Kingdom
Hybrid / WFH Options
Talent
environment Automating everything – CI/CD pipelines, monitoring, infrastructure provisioning Collaborating with developers and other engineers to streamline deployments and workflows Managing AWS services Championing Infrastructure as Code (IaC), observability, and DevOps principles Strong experience with AWS services and hands-on Terraform scripting Deep understanding of DevOps culture , CI/CD pipelines, and container technologies (Docker, Kubernetes a plus) Solid More ❯

Principal Architect

Edinburgh, Scotland, United Kingdom
Wood Mackenzie
and repeatable patterns. Assist with solving complex technical problems when they arise through the methodical application of solution knowledge. Ensure software meets requirements of quality, security, extensibility, maintainability, and observability. Develop architecture roadmaps aligned with long-term product roadmaps. About You: A Bachelor’s/Master’s degree in computer science/engineering or related experience. Excellent knowledge and practice … development and architecture. Expectations: Experience influencing technical decisions across the different stakeholder levels of the business including non-technical audiences. Ability to foster a culture around data-driven reliability, observability, monitoring, and automation. Due to the global nature of the team, a degree of flexible working will be required to accommodate different time zones. Equal Opportunities: We are an equal

DevOps Engineer (Security Operations)

London, England, United Kingdom
Skin Analytics
Build Docker-first workflows with image scanning, tagging, and artifact management. Write and own SOPs for secure deployment and incident response aligned to ISO 27001 and IEC 62304. Extend observability through CloudWatch/ELK stack dashboards, anomaly detection, and alerting for security and performance monitoring Support Transformation team by resolving any security queries that clients might have in their onboarding … into all pipelines with automated alerts and reporting 6 months Mature pipelines to support automated tests, security gates, and gated deploys across all services 12 months Implement full-stack observability with anomaly alerts and dashboards for security and reliability using the ELK stack Requirements Have deep expertise in: Cloud Infrastructure (AWS): EC2, S3, RDS, IAM, VPC, CloudWatch, CloudTrail, Lambda, SQS More ❯

Head of Platform Engineering

Faringdon, England, United Kingdom
Hybrid / WFH Options
Kroo Bank
mission is to create a secure, compliant, and efficient environment for product engineers across multiple domains—including cloud infrastructure, shared services, mobile and web platforms, developer experience, security engineering, observability, and reliability. By leveraging both external best practices and in-house innovations, you will drive efficiencies and scalability as our operations grow. Requirements Platform Strategy & Leadership: Collaborate closely with the … industry best practices and automated security tooling (e.g., IAM, RBAC, SIEM, IaC scanning, IDS/IPS) Operational Excellence: Own the maintenance and evolution of existing shared services, ensuring robust observability, reliability, and performance Champion the automation and abstraction of infrastructure dependencies to streamline product delivery Play a technical leadership role in the most complex and demanding projects, balancing strategic oversight More ❯

Staff Engineer

London, England, United Kingdom
Majorplayers.co.uk
you'll do: Lead the technical migration from .NET Core to Golang. Architect and scale high-performance systems in a cloud-native environment. Drive DevOps maturity - CI/CD, observability, and infrastructure as code (Terraform). Mentor engineers and collaborate with product and business stakeholders. Champion engineering excellence and guide tech decisions with long-term impact. What you'll bring: Deep

DevOps Engineer

England, United Kingdom
Damia Group
able to build new DevOps pipelines. AWS: S3, RDS, Route 53, IAM, EKS, Secrets Manager, ECR. Kubernetes: Helm, Kops, Ingress/Egress. Terraform: deployment of AWS resources, pipelines. OCI. Observability: ELK, Dynatrace, Prometheus. Others: Vault, RedHat. Skills working in a secure environment and ability to adhere to security principles. Experience in support organisation.

Engineering Operations Manager

London, United Kingdom
Hybrid / WFH Options
Trili
Collaborate with People/HR and engineering leadership on career pathing, training, and coaching for engineering staff. Technology Enablement: Evaluate and deploy tools - especially AI - that support engineering productivity, observability, and collaboration. Work closely with DevOps, QA, and SRE teams to align infrastructure and operational excellence with engineering needs. Own key vendor relationships, evaluation of partnerships and represent technology on … scaling engineering orgs across multiple geographies or domains (e.g., front-end, back-end, infrastructure). Familiarity with tools like Linear, Asana, GitHub, Datadog, DORA metrics, or similar performance/observability platforms. Background in organisational change management or engineering program management. What you can expect from us Competitive salary with substantial incentive schemes Generous long-term incentive plan (LTIP) tez token More ❯
Employment Type: Permanent
Salary: GBP Annual

Senior DevOps Engineer

London, England, United Kingdom
Hybrid / WFH Options
JR United Kingdom
DevOps team, driving technical excellence and infrastructure optimisation. You will be responsible for the architecture, costing, management, and scaling of their AWS-based infrastructure, CI/CD pipelines, and observability stack, while supporting their AI and data engineering teams in building NLP and ETL pipelines. Responsibilities: 5+ years in a DevOps or similar role with extensive experience managing AWS-based More ❯

Infrastructure Engineer (f/m/d)

London, England, United Kingdom
Contentful
and optimize Kubernetes clusters and help ensure that workloads scale and recover gracefully. You’ll collaborate across domains and teams to enable infrastructure improvements. You’ll work to improve observability at scale. You’ll help us stay ahead of operational risks - anticipating bottlenecks or failure points before they cause problems. You’ll be involved in incident response and postmortems. What … balancing. Familiarity with CDN/Edge (e.g., Fastly) and how they interact with backend systems. Comfortable debugging distributed systems issues across Edge, Network, Compute, and Storage layers. Experience with observability stacks (metrics, logs, tracing) and tools like Splunk and New Relic. Familiarity with SRE practices: SLO, SLA, etc. Excellent English communication skills, verbal and written (German not required). A More ❯

Senior Lead Software Engineer

City of London, London, United Kingdom
Denu Recruit
and performance optimisation. Passion for clean code, scalable architecture, and elegant problem-solving. Strong communication skills with the ability to align technical direction with business needs. Experience with DevOps, observability, and security compliance is a big plus. Why Apply? Autonomy, impact, and the chance to shape engineering culture. A bold, curious, no-nonsense team solving real customer problems. 📍 UK-based

Observability
10th Percentile: £57,500
25th Percentile: £65,000
Median: £80,000
75th Percentile: £97,500
90th Percentile: £120,000