Observability Jobs in the UK

376 to 400 of 895 Observability Jobs in the UK

Backend Team Lead (Python, Go, Data Systems)

United Kingdom
Hybrid/Remote Options
StackGuardian
Design and optimize large-scale data systems powering workflows, policy engines, analytics, and automation features. Own Reliability, Security, and Performance: Ensure backend services meet enterprise-grade standards for quality, observability, access control, and scalability. Grow and Mentor the Engineering Team: Lead a group of backend and data engineers. Conduct code reviews, establish engineering best practices, and foster a culture of …

Senior ML Engineer

London, United Kingdom
Hybrid/Remote Options
Method-Resourcing
teams to operationalize models and ship ML-powered features into production. Continuously assess and iterate on production models, balancing long-term ML strategy with tactical improvements. Champion code quality, observability, and resilience within their ML systems through reviews and hands-on contributions. Help shape their internal ML standards and practices, ensuring they stay ahead of industry advancements. Offer technical mentorship …
Employment Type: Permanent, Work From Home

Senior ML Engineer

London, South East, England, United Kingdom
Hybrid/Remote Options
Method Resourcing
teams to operationalize models and ship ML-powered features into production. Continuously assess and iterate on production models, balancing long-term ML strategy with tactical improvements. Champion code quality, observability, and resilience within their ML systems through reviews and hands-on contributions. Help shape their internal ML standards and practices, ensuring they stay ahead of industry advancements. Offer technical mentorship …
Employment Type: Full-Time
Salary: £150,000 - £160,000 per annum

ML Infrastructure Engineer

London Area, United Kingdom
Hybrid/Remote Options
Cubiq Recruitment
including work on caching, I/O, and data locality across compute and storage. Benchmark, profile, and fix performance issues across compute, network, and orchestration layers. Set up clear observability, resilience, and security controls for sensitive research environments. Work with Research, Data, and Applied teams to plan GPU and storage capacity and support smoother ML experimentation. Technical Skills: Strong experience …

ML Infrastructure Engineer

City of London, London, United Kingdom
Hybrid/Remote Options
Cubiq Recruitment
including work on caching, I/O, and data locality across compute and storage. Benchmark, profile, and fix performance issues across compute, network, and orchestration layers. Set up clear observability, resilience, and security controls for sensitive research environments. Work with Research, Data, and Applied teams to plan GPU and storage capacity and support smoother ML experimentation. Technical Skills: Strong experience …

Engineering Manager

Manchester, England, United Kingdom
Hybrid/Remote Options
Suits Me
grow technically and professionally. Owning delivery of critical platform services that power Suits Me's financial products. Overseeing the full development lifecycle, from architectural design and planning to deployment, observability, and continuous improvement. Establishing engineering standards, including code quality, CI/CD, testing, and documentation practices. Driving adoption of best practices around API design, data modelling, and event-driven architectures …

Director of Software Engineering

City of London, London, United Kingdom
Hybrid/Remote Options
Tech-Ninjas Consultants
Cloud-native microservices stack running on public cloud (AWS preferred). Modern Java ecosystem with strong focus on clean code and maintainability. Mission-critical, always-on platforms with strong observability and incident practices. Collaborative product, engineering and design culture with genuine influence for senior engineers. Hiring process (via Tech Ninjas Consultants): Intro call with Tech Ninjas. Conversation with hiring manager …

Director of Software Engineering

London Area, United Kingdom
Hybrid/Remote Options
Tech-Ninjas Consultants
Cloud-native microservices stack running on public cloud (AWS preferred). Modern Java ecosystem with strong focus on clean code and maintainability. Mission-critical, always-on platforms with strong observability and incident practices. Collaborative product, engineering and design culture with genuine influence for senior engineers. Hiring process (via Tech Ninjas Consultants): Intro call with Tech Ninjas. Conversation with hiring manager …

Senior Software Engineer

England, United Kingdom
Hybrid/Remote Options
Elliptic Enterprises Ltd
or experience in natural language to SQL functionality or experimentation with AI-assisted querying. Experience with Terraform or other IaC tools. Familiarity with monitoring tools like Datadog and general observability best practices. Knowledge of data visualisation principles or libraries. Interest in cryptocurrency and blockchain technology. Job Benefits. How we work: Hybrid working and the option to work from almost anywhere …
Employment Type: Permanent
Salary: GBP Annual

Software Engineering Manager - Online (Order Management)

England, United Kingdom
Marks & Spencer Plc
cloud, and software engineering standard methodologies. Promoter of DevOps: you build it, you run it. Tech Stack: Java, Spring, Spring Boot, Micronaut; React, Next.js, TypeScript, Angular; Azure Cloud, Kubernetes, Dynatrace (observability); SQL Server, MongoDB; Ignite, Redis. What's In It For You: Working at M&S means being part of something bigger - helping to deliver quality, value and service to millions …
Employment Type: Permanent
Salary: GBP Annual

Technical Lead

North West, England, United Kingdom
Hybrid/Remote Options
Perch Group
Functions, Service Bus) · ETL pipeline development using Azure Data Factory · Exposure to Databricks, Synapse or Spark · Experience working within event-driven architectures · Understanding of DevOps, IaC (Terraform/Bicep), Observability. We’re looking for a strong engineer who enjoys working across both data and application development, someone who’s motivated by solving challenging problems - not just writing code. You’ll …

Senior Cloud Engineer

Birmingham, England, United Kingdom
Hybrid/Remote Options
EML
deployment processes with a focus on minimizing security risks. Site Reliability Engineering (SRE): Ensure system reliability, scalability, and performance through proactive monitoring and secure incident response. Develop and implement observability tools to monitor system health, detect anomalies, and identify security threats. Perform root cause analysis and implement solutions to prevent recurring issues, including security vulnerabilities. Define and measure Service Level …

Software Engineering Manager - Identity

London, United Kingdom
Marks & Spencer Plc
deployment pipelines to enhance efficiency and reliability. Quality, Stability & Standards: Establish quality standards for the performance, reliability, and maintainability of the systems. With a strong production-first mindset, drive observability, maintain Service Level Objectives (SLOs), and ensure efficient incident resolution. Oversee the maintenance of existing systems, ensuring continuous improvements and prompt resolution of issues. Agile Delivery & Collaboration: Working closely with …
Employment Type: Permanent
Salary: GBP Annual

Lead DevOps Architect

London, United Kingdom
Stott & May Professional Search Limited
week). Day Rate: Market rate (Inside IR35). Contract Duration: 6 months. Role Summary: We are looking for an experienced DevOps Lead/Architect to design, implement, and maintain scalable observability and cloud infrastructure. The successful candidate will embed reliability, performance, and automation into our systems, champion GitOps practices, and provide technical leadership to deliver robust, high-availability solutions. Key Responsibilities … Architect and maintain observability platforms using Datadog and Geneos for comprehensive monitoring and alerting. Design and manage scalable cloud infrastructure using Terraform and IaC principles. Implement and promote GitOps workflows with GitLab for CI/CD and deployment automation. Collaborate with engineering teams to ensure reliability, scalability, and performance in software delivery. Optimise alerting and monitoring strategies to improve actionable … insights. Mentor junior engineers and contribute to SRE best practices. Lead incident response, perform root cause analysis, and drive continuous improvement. Oversee cloud cost management for observability infrastructure in line with Cloud FinOps principles. Essential Skills/Knowledge/Experience: Strong hands-on experience with AWS services (EC2, S3, RDS, Lambda, VPC, IAM, CloudWatch, EKS, ECS). Expertise in Infrastructure …
Employment Type: Contract
Rate: £750 - £800 per day

Site Reliability Engineer

Herefordshire, West Midlands, United Kingdom
Hybrid/Remote Options
itecopeople
complex environments. Key Responsibilities: Partner with Software Engineers to enhance system reliability, scalability, and performance. Collaborate with System Administrators to automate repetitive tasks and streamline alerts. Advance monitoring and observability practices to identify and resolve issues before they affect users. Support development and testing environments to help meet delivery and quality objectives. Research, evaluate, and recommend tools and technologies to … as code. Expertise with containerisation and orchestration (Docker, Kubernetes, OpenShift, or Swarm). Skilled in CI/CD pipeline tools (e.g. Jenkins, GitLab CI). Proficient with monitoring and observability tools (Grafana, Prometheus, InfluxDB). Experience integrating event-driven systems using MQ solutions (RabbitMQ or similar). Strong knowledge of SQL and relational databases. Advanced Linux administration and shell … Desirable Skills: Programming experience in Java, Go, or Python. Understanding of cross-domain technologies and security models. Background in service management environments and ITIL practices. Proven application of observability patterns and system health metrics. Experience with Microsoft Azure cloud services. For more information, send your CV to Ryan at … Services Advertised are those of an Employment Business …
Employment Type: Contract
Rate: £500 - £600 per day

Senior Software Engineer

Merseyside, England, United Kingdom
GBV Ltd
scalable REST APIs. Build queue-based, event-driven processes for high-volume data workflows. Collaborate across DevOps, Infrastructure, and Product teams. Monitor and optimise system performance using modern observability tools. Explore practical applications of AI-driven development and GPT-based engineering workflows. 💡 Tech Stack & Skills. Core: ASP.NET Core (C#) or Golang - would be really nice to have if … E-Commerce: Magento or similar open-source platforms. Architecture: Microservices, SOA, Hexagonal, Modular Monolith. Infrastructure: Docker, AWS (preferred), Azure or GCP. Eventing & Queues: Message-based architecture for async workflows. Observability: Grafana, Prometheus, CloudWatch, New Relic, Datadog. Bonus points for: VB.NET or Kubernetes exposure. Solid grounding in software architecture patterns (DDD, Clean Code, 12-Factor App). 🌍 Culture & Behaviours. Pace: Move fast …

Data DevOps Engineer

St Albans, England, United Kingdom
Addition+
Oversee data pipelines and big data workflows (EMR, Spark) for high-performance analytics. Optimize code for ETL and Power BI (DAX, data models, refresh scheduling) to enhance performance. Implement observability and logging (CloudWatch, Grafana, ELK) for proactive system monitoring. Collaborate cross-functionally with BI, Platform, and Data teams on releases and issue resolution. Enforce security & compliance (RBAC, encryption, GDPR/… on with Docker and Kubernetes; experienced in scalable, portable BI and data environments. Environment Management: Managed Dev/QA/UAT freshness, data synchronisation, and Jira-integrated release workflows. Observability & Monitoring: Implemented CloudWatch, Datadog, Prometheus, and Grafana for logging, metrics, and alerting. Troubleshooting & Problem Solving: Strong analytical and cross-functional collaboration skills; effective under pressure. Project Delivery: Managed multiple concurrent …

Lead Engineer

City of London, London, United Kingdom
Hybrid/Remote Options
Sanderson
AWS services (SNS, SQS, Lambda, DynamoDB). Drive automation across CI/CD pipelines using tools like GitHub Actions, Terraform, and Argo CD for seamless and secure deployments. Enhance observability using Prometheus, Grafana, Datadog, and CloudWatch, enabling proactive incident prevention. Own incident management and post-mortem practices, guiding the team through challenges calmly and driving meaningful improvement. Collaborate with global … Terraform, Ansible) and CI/CD automation (GitHub Actions, Jenkins, Harness). Familiarity with messaging, caching, and database systems: Kafka, Redis, MongoDB, Cassandra, PostgreSQL. Hands-on experience in monitoring, observability, and incident response frameworks using modern tooling. Strong leadership, mentoring, and stakeholder management skills, with the ability to scale teams, set OKRs, and foster engineering excellence. An ability to remain composed, analytical …

Lead Engineer

London Area, United Kingdom
Hybrid/Remote Options
Sanderson
AWS services (SNS, SQS, Lambda, DynamoDB). Drive automation across CI/CD pipelines using tools like GitHub Actions, Terraform, and Argo CD for seamless and secure deployments. Enhance observability using Prometheus, Grafana, Datadog, and CloudWatch, enabling proactive incident prevention. Own incident management and post-mortem practices, guiding the team through challenges calmly and driving meaningful improvement. Collaborate with global … Terraform, Ansible) and CI/CD automation (GitHub Actions, Jenkins, Harness). Familiarity with messaging, caching, and database systems: Kafka, Redis, MongoDB, Cassandra, PostgreSQL. Hands-on experience in monitoring, observability, and incident response frameworks using modern tooling. Strong leadership, mentoring, and stakeholder management skills, with the ability to scale teams, set OKRs, and foster engineering excellence. An ability to remain composed, analytical …

Team Lead - Site Reliability Engineering

London, United Kingdom
Arbuthnot Latham
skills and expertise to automating manual tasks (TOIL) in such areas as incident management, problem management, change management, and release management tasks, and provides operational insights through monitoring and observability, and other aspects involved in preparing and optimising automated delivery solutions. To place the interests of customers at the centre of all activities, act in a way that is consistent … a root cause analysis to troubleshoot priority incidents. Implement automation to reduce the probability and/or impact of problems recurring; possible options include automated incident response, enhanced monitoring, observability initiatives, and automation of change and release management. Identify, evaluate, and recommend monitoring and observability tools and diagnostic techniques to improve system observability and insights, including identification of requirements, nonfunctional … environments. Experience of communicating complex issues to senior stakeholders and technical teams. Implementation of highly available and reliable systems, using multi-AZ and multi-region approaches. Expertise with monitoring and observability tools (e.g. SolarWinds, Datadog, Azure/AWS native tools). Expertise with SLI/SLO management tools such as ServiceNow. Expertise with incident ticketing and change management systems such as ServiceNow …
Employment Type: Permanent
Salary: GBP Annual

Site Reliability Engineer

United Kingdom
Hybrid/Remote Options
Halian Technology Limited
We're Hiring: Site Reliability Engineer (SRE). Fully remote (UK-based candidates). Permanent role supporting our US office. Join a high-impact SRE team focused on automation, observability, and scaling infrastructure to support millions of users. Tech Stack Highlights: Java, Kotlin, C++, Postgres, AWS (EC2, ECS, Fargate, Route53), New Relic, Splunk, Datadog, Terraform, Helm, Kubernetes, Microservices …
Employment Type: Permanent
Salary: GBP 90,000 Annual

Platform Specialist

Knutsford, England, United Kingdom
Hybrid/Remote Options
Undisclosed
Role Title: Observability and Telemetry Engineer. Duration: contract to run until 31/12/2025. Location: Knutsford, hybrid, 2/3 days per week onsite. Rate: up to £368 p/d (Umbrella, inside IR35). Key Skills/Requirements: EaaS Evolution; working experience with PHP or Python; knowledge of Oracle and other relational databases; well versed in working on …

Product Engineer | Fully Remote (UK) | TypeScript, Next.js, React

Edinburgh, Midlothian, Scotland, United Kingdom
Hybrid/Remote Options
Wilson Brown Limited
Tailwind, Shadcn. Design APIs and backend services (Node/Python, Prisma, Postgres, Redis, NoSQL). Deliver responsive, accessible, user-friendly UI. Write automated tests (Playwright, Jest). Manage deployments, performance, and observability. Leverage AI tools to deliver innovative features. Comfortable across UIs, APIs, and infrastructure (AWS/GCP). Startup-minded, collaborative, and product-focused. Details: Up to £70,000. Remote-first (UK …
Employment Type: Permanent, Work From Home
Salary: £70,000

Staff Software Engineer

London Area, United Kingdom
Hybrid/Remote Options
KE Technology
focusing on scale, performance, and reliability. Why You’ll Love It: Build and optimise real-time distributed systems at a global scale. Lead deep dives into latency, throughput, and observability. Stay close to the code while shaping architecture and direction. Be part of an engineering-led culture with standout benefits: full private health insurance, extended maternity and paternity leave, in …

Staff Software Engineer

City of London, London, United Kingdom
Hybrid/Remote Options
KE Technology
focusing on scale, performance, and reliability. Why You’ll Love It: Build and optimise real-time distributed systems at a global scale. Lead deep dives into latency, throughput, and observability. Stay close to the code while shaping architecture and direction. Be part of an engineering-led culture with standout benefits: full private health insurance, extended maternity and paternity leave, in …

Observability salary percentiles
10th percentile: £56,718
25th percentile: £67,500
Median: £80,000
75th percentile: £105,000
90th percentile: £139,750