Observability Jobs in the UK

976 to 1,000 of 2,577 Observability Jobs in the UK

DevOps Engineer - Annalect Labs - OMG UK

London, England, United Kingdom
Hybrid / WFH Options
Realshoreit
secure, reliable and efficient. Your expertise will empower our teams to deliver high-quality software with confidence. Whether it's designing resilient cloud architectures, automating deployments or enhancing system observability, you'll bring a problem-solving mindset and a drive to make everything run seamlessly. Collaboration is at the heart of what we do. You'll work closely with engineers … tools and approaches to improve our operations. If you have extensive experience with AWS and GCP, deep expertise in Infrastructure as Code (IaC), a strong background in monitoring and observability and solid scripting skills in Bash, Python or Go, we'd love to hear from you! About The Agency: Omnicom Media Group UK (OMG UK) is the media division of …

Lead Infrastructure Architect - Fantastic Opportunity

City of London, London, United Kingdom
Hybrid / WFH Options
UST
domains. With 20+ years of proven expertise, the ideal candidate will shape the strategy, design, and transformation of complex infrastructure landscapes, including Wintel, Linux, Network, Voice, Collaboration, Mobility, Observability, End-User Computing, End-User Services, and Service Desk. You will lead and drive architecture review boards and provide strategic direction. This role acts as a key advisor to senior … domains: Wintel & Linux platforms; Network (LAN/WAN/SD-WAN, Wireless, Firewalls); Unified Communication/Voice/Collaboration (Cisco, MS Teams); Mobility & Endpoint Management (Intune, MDM/UEM); Observability and Monitoring (ELK, Prometheus, AppDynamics, etc.); End-User Computing (VDI, physical endpoints, OS lifecycle); End-User Services and Service Desk (ITSM, automation, FCR, CSAT). Serve as a trusted advisor to …

Lead Infrastructure Architect - Fantastic Opportunity

London Area, United Kingdom
Hybrid / WFH Options
UST
domains. With 20+ years of proven expertise, the ideal candidate will shape the strategy, design, and transformation of complex infrastructure landscapes, including Wintel, Linux, Network, Voice, Collaboration, Mobility, Observability, End-User Computing, End-User Services, and Service Desk. You will lead and drive architecture review boards and provide strategic direction. This role acts as a key advisor to senior … domains: Wintel & Linux platforms; Network (LAN/WAN/SD-WAN, Wireless, Firewalls); Unified Communication/Voice/Collaboration (Cisco, MS Teams); Mobility & Endpoint Management (Intune, MDM/UEM); Observability and Monitoring (ELK, Prometheus, AppDynamics, etc.); End-User Computing (VDI, physical endpoints, OS lifecycle); End-User Services and Service Desk (ITSM, automation, FCR, CSAT). Serve as a trusted advisor to …

Cloud Data Platform Engineer

England, United Kingdom
BMC Software, Inc
scale event-driven workflows using EventBridge and Lambda. Work with DynamoDB for fast, scalable key-value storage. Develop and maintain Java Spring Boot microservices deployed on EC2 instances. Ensure observability, monitoring, and fault-tolerance across the system. Collaborate with DevOps, Data Engineering, and Product teams to design scalable, cost-effective cloud solutions. Maintain security best practices in a cloud-native … performance tuning, and cost-optimization in cloud environments with Kafka for data streaming. Familiarity with CI/CD and infrastructure-as-code tools (e.g., Terraform, CloudFormation). Experience with observability tools (e.g., CloudWatch, OpenTelemetry). Experience working in a global enterprise software company. Our commitment to you! BMC's culture is built around its people. We have 6000+ brilliant minds …
Employment Type: Permanent
Salary: GBP Annual

Site Reliability Engineer

London Area, United Kingdom
Hybrid / WFH Options
Unitary
coming year and beyond! The role: We are now looking for a Site Reliability Engineer to ensure our systems run smoothly and reliably at scale. Your expertise in monitoring, observability, and system automation will help maintain the high availability and performance our customers depend on. You will work at the intersection of development and operations, using your technical skills to … Design and implement comprehensive alerting systems that detect issues early and provide actionable insights to streamline the resolution of these issues. Collaborate with our development teams to ensure our observability stack provides clear visibility into system health and performance. Optimise on-call processes, including creating and maintaining detailed runbooks that enable efficient incident response and knowledge sharing across teams. Build …

Site Reliability Engineer

City of London, London, United Kingdom
Hybrid / WFH Options
Unitary
coming year and beyond! The role: We are now looking for a Site Reliability Engineer to ensure our systems run smoothly and reliably at scale. Your expertise in monitoring, observability, and system automation will help maintain the high availability and performance our customers depend on. You will work at the intersection of development and operations, using your technical skills to … Design and implement comprehensive alerting systems that detect issues early and provide actionable insights to streamline the resolution of these issues. Collaborate with our development teams to ensure our observability stack provides clear visibility into system health and performance. Optimise on-call processes, including creating and maintaining detailed runbooks that enable efficient incident response and knowledge sharing across teams. Build …

Staff Software Engineer - Storeops c&h

London, England, United Kingdom
MARKS&SPENCER
fostering an environment of continuous learning and growth, while participating in hiring processes and training engineers up to Staff standard. Operational Stability: Demonstrate a production-first attitude, continuously considering observability and maintaining Service Level Objectives, while delivering change at pace. Research & Innovation: Embrace emerging technologies and trends, and share insights with the organisation, while developing and maintaining the team technology …/PostgreSQL); MongoDB; Event processing with Kafka; CI/CD with GitHub Actions and Azure pipelines; Code quality with Sonar; Microservice architecture; Azure DevOps, Kubernetes, Docker; Azure storage, Redis; Observability Tools: Dynatrace, New Relic; Git, GitHub; TDD, BDD; Kotlin, .NET; Android development; Reporting built with MS SSRS and PowerBI; Security and performance testing and optimisation. Everyone's Welcome: M&S …

Engineering Excellence Lead

London, United Kingdom
Hybrid / WFH Options
Trilitech
Collaborate with People/HR and engineering leadership on career pathing, training, and coaching for engineering staff. Technology Enablement: Evaluate and deploy tools - especially AI - that support engineering productivity, observability, and collaboration. Work closely with DevOps, QA, and SRE teams to align infrastructure and operational excellence with engineering needs. Own key vendor relationships, evaluation of partnerships and represent technology on … scaling engineering orgs across multiple geographies or domains (e.g., front-end, back-end, infrastructure). Familiarity with tools like Linear, Asana, GitHub, Datadog, DORA metrics, or similar performance/observability platforms. Background in organisational change management or engineering program management. What you can expect from us: Competitive salary with substantial incentive schemes. Generous long-term incentive plan (LTIP) tez token …
Employment Type: Permanent
Salary: GBP Annual

Site Reliability Engineer

London, England, United Kingdom
Hybrid / WFH Options
Unitary
coming year and beyond! The role: We are now looking for a Site Reliability Engineer to ensure our systems run smoothly and reliably at scale. Your expertise in monitoring, observability, and system automation will help maintain the high availability and performance our customers depend on. You will work at the intersection of development and operations, using your technical skills to … Design and implement comprehensive alerting systems that detect issues early and provide actionable insights to streamline the resolution of these issues. Collaborate with our development teams to ensure our observability stack provides clear visibility into system health and performance. Optimise on-call processes, including creating and maintaining detailed runbooks that enable efficient incident response and knowledge sharing across teams. Build …

Principal Site Reliability Engineer - iwoca

London, England, United Kingdom
Jobs via eFinancialCareers
A track record of shaping incident processes, on-call practices, or sharing reliability ownership across multiple teams. Deep understanding of site reliability principles and applying them to databases, including observability and limiting the impact of long-running or resource-heavy queries. Experience with infrastructure automation, like setting up monitoring and alerting for pipelines. Bonus: Strong academic background in maths, physics … in tech or open-source communities, with a passion for sharing knowledge and inspiring others. An open mind and the flexibility to approach challenges from different angles. Experience with observability platforms such as Datadog. Experience managing infrastructure using Terraform. Familiarity with Python, SQL, Go. The salary: We expect to pay from £100,000 - £140,000 for this role. But …

Restaurant Technology Problem Manager

London, United Kingdom
Hybrid / WFH Options
McDonald's Corporation
as follows: Own ITIL Problem & Change Management: Take ownership of ITIL Problem Management activities, proactively identifying, addressing and fixing root causes of incidents and recurring issues within the system. Observability lead, promoting stability across the estate by collaborating with cross-functional teams to implement preventive measures. Actively take part in ITIL Change Management processes, ensuring that changes to the system … efficiently. Experience in implementing changes while following ITIL change management processes. Understanding of basic security principles and best practices for securing infrastructure. Optional but advantageous technical skills: Proficient in using observability tools (New Relic and ThousandEyes), BI platform and data visualisation tools (such as Tableau and Power BI) and technology tools (Jira, Confluence). System Administration: Proficiency in Linux/Unix …
Employment Type: Permanent
Salary: GBP Annual

Principal Site Reliability Engineer

London, England, United Kingdom
iwoca
A track record of shaping incident processes, on-call practices, or sharing reliability ownership across multiple teams. Deep understanding of site reliability principles and applying them to databases, including observability and limiting the impact of long-running or resource-heavy queries. Experience with infrastructure automation, like setting up monitoring and alerting for pipelines. Bonus: Strong academic background in maths, physics … in tech or open-source communities, with a passion for sharing knowledge and inspiring others. An open mind and the flexibility to approach challenges from different angles. Experience with observability platforms such as Datadog. Experience managing infrastructure using Terraform. Familiarity with Python, SQL, Go. The salary: We expect to pay from £100,000 - £140,000 for this role. But …

Senior to Principal DevOps Engineer

Slough, England, United Kingdom
Hybrid / WFH Options
JR United Kingdom
development, QA, and operations teams to implement DevOps methodologies and toolchains. Use Infrastructure as Code (IaC) with Terraform for automation. Maintain security controls across cloud environments, ensuring compliance. Utilise observability tools to monitor and optimise production services. Design and improve CI/CD pipelines with platforms like GitLab or Jenkins. Mentor and guide DevOps and development teams, promoting continuous learning.

Java Software Engineer

London Area, United Kingdom
Oliver Bernard
developers. Experience with cloud platforms (AWS, GCP, or Azure). A strong security mindset or a keen interest in cybersecurity. Bonus: experience with Kubernetes, CI/CD pipelines, and observability tools. The role will require 5 days a week onsite in London; please apply for immediate consideration.

Java Software Engineer

City of London, London, United Kingdom
Oliver Bernard
developers. Experience with cloud platforms (AWS, GCP, or Azure). A strong security mindset or a keen interest in cybersecurity. Bonus: experience with Kubernetes, CI/CD pipelines, and observability tools. The role will require 5 days a week onsite in London; please apply for immediate consideration.

Senior to Principal DevOps Engineer

City of London, England, United Kingdom
Hybrid / WFH Options
JR United Kingdom
Collaborate with teams to define and implement DevOps methodologies and toolchains. Implement Infrastructure as Code (IaC) using Terraform for automation. Maintain security controls across cloud environments, ensuring compliance. Use observability tools to monitor and optimise performance, resolving issues proactively. Design and optimise CI/CD pipelines with platforms like GitLab or Jenkins. Mentor and guide DevOps and development teams, fostering …

Staff Engineer

City of London, London, United Kingdom
Hybrid / WFH Options
Arrows
architecture and development of backend services using C#, ASP.NET, .NET Core. Automate infrastructure, CI/CD pipelines, and cloud operations (AWS/Azure). Promote engineering best practices, security, and observability. Mentor engineers and foster a culture of continuous improvement. Contribute to technology direction, including adoption of tools like Go and Python. What We’re Looking For: Deep expertise in C# …

Staff Engineer

London Area, United Kingdom
Hybrid / WFH Options
Arrows
architecture and development of backend services using C#, ASP.NET, .NET Core. Automate infrastructure, CI/CD pipelines, and cloud operations (AWS/Azure). Promote engineering best practices, security, and observability. Mentor engineers and foster a culture of continuous improvement. Contribute to technology direction, including adoption of tools like Go and Python. What We’re Looking For: Deep expertise in C# …

Site Reliability Engineer (SRE) - Crypto High-Frequency Trading

Slough, England, United Kingdom
JR United Kingdom
paced environment. Responsibilities: Develop scalable tools for automation, deployment, and infrastructure management. Enhance system performance, reliability, and efficiency through automation. Manage AWS infrastructure, ensuring smooth configuration and deployment. Implement observability tools for monitoring and debugging. Ensure fault tolerance, redundancy, and high availability of trading systems. Support infrastructure for C++ and Rust-based trading systems, ensuring seamless integration. Qualifications: Strong programming …

Site Reliability Engineer (SRE) - Crypto High-Frequency Trading

London, England, United Kingdom
Selby Jennings
production tools to automate deployment, monitoring, and infrastructure management. Improving system performance, reliability, and efficiency through automation and tooling. Managing AWS-based infrastructure, ensuring seamless configuration and deployment. Implementing observability tools to enhance monitoring, debugging, and performance insights. Ensuring fault tolerance, redundancy, and high availability across critical trading systems. Supporting infrastructure for C++ and Rust-based trading systems, ensuring smooth …

Lead Platform Engineer

Croydon, England, United Kingdom
WeDo
autonomy, clean code, and continuous delivery. The technical landscape: Azure (AKS, Functions, App Services, Event Grid, etc.); Infrastructure as Code (Terraform); CI/CD using Azure DevOps; Monitoring and Observability (Application Insights, Azure Monitor, Prometheus/Grafana); GitHub for version control, and a modern SDLC with automated testing and security baked in. What we’re looking for: Someone who can …

Staff Software Engineer - AI In-Market Engineering

France Lynch, England, United Kingdom
DDN
Engineers to ensure customer success. Translate technical issues into executive-ready summaries and business impact statements. Participate in post-mortems and executive briefings for strategic accounts. Drive adoption of observability, automation, and self-healing support mechanisms using AI/ML tools. Required Qualifications: 8+ years in enterprise storage, distributed systems, or cloud infrastructure support/engineering. Deep understanding of file … diagnostics and reduce MTTR. Preferred Qualifications: Experience with DDN, VAST, Weka, or similar scale-out file systems. Strong scripting/coding ability in Python, Bash, or Go. Familiarity with observability platforms: Prometheus, Grafana, ELK, OpenTelemetry. Knowledge of replication, consistency models, and data integrity mechanisms. Exposure to Sovereign AI, LLM model training environments, or autonomous system data architectures. This position requires …

DevOps Specialist

Knutsford, Cheshire, United Kingdom
Experis - ManpowerGroup
is required to assist in upgrading the Elastic DP estate to Kubernetes, moving away from obsolete technology (Cloudera), upgrading to RHEL 8, and contributing to improving the stability and observability of the platform. The role also involves providing advanced analytics tooling and services for modeling analytics. Responsibilities include: Supporting production application support in AWS, with experience in incident and change …
Employment Type: Permanent
Salary: GBP Annual

Senior Java Software Engineer

City of London, London, United Kingdom
Hybrid / WFH Options
Inara
and accelerate platform delivery. Deploy and monitor services in AWS using Kubernetes. Work in a high-frequency release environment, deploying multiple times per day. Use Grafana (or similar) for observability and maintain production-grade reliability. Work onsite 3 days/week in London for the first 4–6 weeks (hybrid flexibility beyond this). We’re Looking For: 5+ years of …

Senior Java Software Engineer

London Area, United Kingdom
Hybrid / WFH Options
Inara
and accelerate platform delivery. Deploy and monitor services in AWS using Kubernetes. Work in a high-frequency release environment, deploying multiple times per day. Use Grafana (or similar) for observability and maintain production-grade reliability. Work onsite 3 days/week in London for the first 4–6 weeks (hybrid flexibility beyond this). We’re Looking For: 5+ years of …

Observability
10th Percentile: £57,500
25th Percentile: £65,000
Median: £80,000
75th Percentile: £97,500
90th Percentile: £120,000