Observability Jobs in the UK

751 to 775 of 868 Observability Jobs in the UK

AI Architect

City of London, London, United Kingdom
Tata Consultancy Services
If you need support in completing the application or if you require a different format of this document, please get in touch at UKI.recruitment@tcs.com or call TCS London Office number 02031552100/+44 204 520 2575 with the More ❯

AI Architect

London Area, United Kingdom
Tata Consultancy Services
If you need support in completing the application or if you require a different format of this document, please get in touch at UKI.recruitment@tcs.com or call TCS London Office number 02031552100/+44 204 520 2575 with the More ❯

AppSec Lead

South East, United Kingdom
Halian Technology Limited
A leading fintech company is seeking a Lead AppSec Engineer to join their established team. You'll be instrumental in embedding security into every stage of the software development lifecycle, guiding engineers, shaping best practices, and driving secure, scalable solutions across our platform. Key More ❯
Employment Type: Permanent
Salary: GBP Annual

Senior Software Engineer

United Kingdom
Hybrid/Remote Options
Bezos
At Bezos, our vision is to Deliver Happiness: for our team, for the end consumers, for the e-commerce sellers, and for our logistics partners. Exciting times in e-commerce: E-commerce sales are driven by consumers who increasingly buy More ❯
Employment Type: Permanent
Salary: GBP Annual

Senior Agentic AI Engineer

Birmingham, England, United Kingdom
Method Resourcing
act? This is a chance to design and deliver agentic AI systems on Azure that automate real business workflows through tool use, retrieval, and reasoning, with the reliability and observability of true production engineering. In this position you’ll take ownership of designing and scaling end-to-end agentic solutions on Azure, combining LLMs, APIs, and orchestration frameworks to deliver … Productionise on Azure using AI Foundry/OpenAI, Azure ML, Functions, Event Grid/Service Bus, and Kubernetes. Build LLMOps pipelines for evaluation, monitoring, safety, and cost control. Define observability standards across prompts, tools, and data flows. Establish governance patterns, safety, privacy, and auditability. Stay hands-on with critical code paths while guiding architecture and best practice. 🧠Required Skills/ More ❯

Azure AI Engineer

United Kingdom
Hybrid/Remote Options
Cognitive Group | Part of the Focus Cloud Group
Functions, Logic Apps, and APIs to orchestrate data and AI workflows. Design and deliver retrieval-augmented generation (RAG) and Copilot-style assistants embedded into business and web applications. Embed observability and monitoring into AI and data pipelines, tracking performance, quality, and cost. Collaborate with data scientists, architects, and product teams to turn prototypes into enterprise-ready AI services. Stay … or equivalent for building and extending web or Power Apps solutions. Knowledge of Azure DevOps, CI/CD, and Infrastructure-as-Code (Bicep, Terraform). Deep appreciation of governance, observability, and secure design principles. More ❯

Linux Production Engineer

London Area, United Kingdom
Autonomai Recruitment
experience building technology 0→1, owning systems end-to-end, and working close to the metal. They will operate across everything from bare-metal Linux to modern build and observability stacks. Linux Platform Engineer – Trading Infrastructure Overview The firm is seeking a Linux Platform Engineer to join a small, high-impact engineering group supporting ML/AI-driven trading. … latency. Contribute to kernel-level debugging and system improvements. Automate Linux fleet builds, creating consistent, reproducible systems. Manage Kubernetes cluster infrastructure, networking, and container orchestration. Enhance observability. Analyze and optimize networking across the full TCP/IP stack. Investigate core dumps, memory bottlenecks, and CPU performance issues across distributed systems. Develop Python tooling for internal automation More ❯

Linux Production Engineer

City of London, London, United Kingdom
Autonomai Recruitment
experience building technology 0→1, owning systems end-to-end, and working close to the metal. They will operate across everything from bare-metal Linux to modern build and observability stacks. Linux Platform Engineer – Trading Infrastructure Overview The firm is seeking a Linux Platform Engineer to join a small, high-impact engineering group supporting ML/AI-driven trading. … latency. Contribute to kernel-level debugging and system improvements. Automate Linux fleet builds, creating consistent, reproducible systems. Manage Kubernetes cluster infrastructure, networking, and container orchestration. Enhance observability. Analyze and optimize networking across the full TCP/IP stack. Investigate core dumps, memory bottlenecks, and CPU performance issues across distributed systems. Develop Python tooling for internal automation More ❯

Head of Platform Engineering

Surrey, England, United Kingdom
Hybrid/Remote Options
La Fosse
scale. This is a pivotal, visible role reporting directly to the CTO. The Opportunity You’ll shape the operational strategy and modernise how the platform is managed, driving reliability, observability, automation, and cost efficiency. You’ll manage the Head of DevOps and work closely with Product, Engineering and Finance to ensure the platform is secure, resilient, scalable and commercially efficient. … IT, Security and Platform Operations Reliability, performance and availability of a cloud-native SaaS platform (AWS/serverless) Cost-to-Serve ownership and cloud cost visibility/optimisation Maturing observability, incident management & operational governance Uplifting DevOps engineering practices and platform automation, use of AI Vendor & outsourced IT partner management Supporting a high-change organisation scaling for enterprise success What You More ❯

Director - Performance and Reliability

Cambridgeshire, England, United Kingdom
Sanderson
lead performance testing and chaos engineering initiatives, and embed reliability best practices across engineering, DevOps, and infrastructure teams. This is a senior, strategic leadership role focused on system excellence, observability, and continuous improvement. Ideal Candidate: Proven experience leading Performance Engineering, Reliability, or SRE functions Deep expertise in performance testing methodologies (load, stress, spike, soak) Strong hands-on background with LoadRunner … strategy across critical platforms and services Oversee load, stress, and chaos testing initiatives to ensure systems perform and recover under real-world conditions Define and drive best practices for observability, monitoring, and APM adoption using tools like Dynatrace Drive incident reduction, faster recovery (MTTR), and continuous reliability improvements Champion a culture of performance ownership, ensuring teams build with scalability, stability More ❯
Employment Type: Full-Time
Salary: £84,000 - £95,000 per annum, Negotiable, Inc benefits

Director - Performance and Reliability

Cambridgeshire, East Anglia, United Kingdom
Sanderson Recruitment
lead performance testing and chaos engineering initiatives, and embed reliability best practices across engineering, DevOps, and infrastructure teams. This is a senior, strategic leadership role focused on system excellence, observability, and continuous improvement. Ideal Candidate: Proven experience leading Performance Engineering, Reliability, or SRE functions Deep expertise in performance testing methodologies (load, stress, spike, soak) Strong hands-on background with LoadRunner … strategy across critical platforms and services Oversee load, stress, and chaos testing initiatives to ensure systems perform and recover under real-world conditions Define and drive best practices for observability, monitoring, and APM adoption using tools like Dynatrace Drive incident reduction, faster recovery (MTTR), and continuous reliability improvements Champion a culture of performance ownership, ensuring teams build with scalability, stability More ❯
Employment Type: Permanent
Salary: £95,000

Software Developer

Hammersmith, England, United Kingdom
OpenSource
data sources using advanced web scraping and reverse-engineering techniques. Developing and maintaining low-latency, real-time data feeds to support internal systems and strategies. Improving internal visibility and observability tooling to help diagnose integration issues and identify improvements. Contributing across the full lifecycle of your work — design, development, testing, review, deployment, and ongoing support. Working within an agile, flexible … a rotational basis. Tech Stack Languages: Python (3.10+), plus TypeScript/JavaScript for frontend work, and occasional Go for infrastructure tasks. Messaging: RabbitMQ, Kafka Storage: PostgreSQL, Redis Environment: Linux Observability: OpenTelemetry, Prometheus, Grafana, Zabbix Requirements Must-haves Strong software development experience, especially with Python. A degree in Computer Science or another numerical discipline. Clear communication skills, able to explain technical More ❯

Software Developer

Alderley Edge, Cheshire, United Kingdom
Transunion
Day You'll Be: Design and build reliable backend systems and infrastructure tooling Use TDD to write high-quality, maintainable code and build out automated test suites Own reliability, observability, and performance of key services Collaborate with clients to understand requirements, debug issues, and propose solutions Drive improvements to system architecture, automation, and deployment processes Mentor junior developers and contribute … in writing and on calls Desirable Skills & Experience: Experience owning backend systems in production environments Experience with Cloud Platforms AWS or GCP Infrastructure-as-code, CI/CD, and observability tooling Experience scaling systems under sustained load Contributions to internal tooling or open source Experience with large datasets and machine learning models Impact You'll Make: What's In It More ❯
Employment Type: Permanent

Senior DevSecOps engineer

England, United Kingdom
Hybrid/Remote Options
Seccl Technology Limited
handling, JWK publishing, and SSO connection setup. Utilising Infrastructure as Code (Terraform) and CI/CD (GitHub Actions) to manage Auth0 configuration and ensure safe, repeatable deployments. Implementing comprehensive observability for authentication paths with structured logs, monitoring dashboards, alerts, and SLOs. Collaborating closely with product, engineering, and support teams on migration timelines, communications, and incident response. This role's for … and identity configurations, including secure secrets management. Solid understanding of core AWS services relevant to modern authentication patterns, such as API Gateway, Lambda authorisers, and CloudWatch. A commitment to observability, with hands-on experience implementing structured logging, dashboards, and SLOs for critical services. Excellent collaboration skills, demonstrated through participation in design reviews, pairing, and writing clear technical documentation (e.g., runbooks More ❯
Employment Type: Permanent
Salary: GBP Annual

Senior DevSecOps engineer

Edinburgh, Midlothian, United Kingdom
Hybrid/Remote Options
Seccl Technology Limited
handling, JWK publishing, and SSO connection setup. Utilising Infrastructure as Code (Terraform) and CI/CD (GitHub Actions) to manage Auth0 configuration and ensure safe, repeatable deployments. Implementing comprehensive observability for authentication paths with structured logs, monitoring dashboards, alerts, and SLOs. Collaborating closely with product, engineering, and support teams on migration timelines, communications, and incident response. This role's for … and identity configurations, including secure secrets management. Solid understanding of core AWS services relevant to modern authentication patterns, such as API Gateway, Lambda authorisers, and CloudWatch. A commitment to observability, with hands-on experience implementing structured logging, dashboards, and SLOs for critical services. Excellent collaboration skills, demonstrated through participation in design reviews, pairing, and writing clear technical documentation (e.g., runbooks More ❯
Employment Type: Permanent
Salary: GBP Annual

Senior DevSecOps engineer

Bath, Somerset, United Kingdom
Hybrid/Remote Options
Seccl Technology Limited
handling, JWK publishing, and SSO connection setup. Utilising Infrastructure as Code (Terraform) and CI/CD (GitHub Actions) to manage Auth0 configuration and ensure safe, repeatable deployments. Implementing comprehensive observability for authentication paths with structured logs, monitoring dashboards, alerts, and SLOs. Collaborating closely with product, engineering, and support teams on migration timelines, communications, and incident response. This role's for … and identity configurations, including secure secrets management. Solid understanding of core AWS services relevant to modern authentication patterns, such as API Gateway, Lambda authorisers, and CloudWatch. A commitment to observability, with hands-on experience implementing structured logging, dashboards, and SLOs for critical services. Excellent collaboration skills, demonstrated through participation in design reviews, pairing, and writing clear technical documentation (e.g., runbooks More ❯
Employment Type: Permanent
Salary: GBP Annual

Azure Consultant - Presales

United Kingdom
Hybrid/Remote Options
Hancock & Parsons Ltd
A well-established software company are seeking an Azure Consultant to join their team! This is a hybrid technical and pre-sales role that involves a mix of customer engagement and being hands-on with Azure. If you're someone More ❯

Head of Cloud – Contract (Outside IR35)

Derby, England, United Kingdom
Hybrid/Remote Options
Experis UK
Head of Cloud – Contract (Outside IR35) Location: Hybrid (East Midlands/London 1-2 days/week onsite) Rate: Up to £700/day Contract Type: Outside IR35 Duration: 3-6 months (initial), with potential extension Start Date: ASAP About More ❯

Staff Data Engineer

London, United Kingdom
Hybrid/Remote Options
Fruition Group
Job Title: Staff Data Engineer Location: London, Hybrid Salary: c.£140,000 + bonus + share options Why Apply? This is a unique opportunity to take a leading role in shaping the data strategy of a fast-growing Insurtech scale More ❯
Employment Type: Permanent

Staff Site Reliability Engineer - Observability

London Area, United Kingdom
Hybrid/Remote Options
Motive Group
Senior/Staff Site Reliability Engineer - Observability | London (Hybrid) If you care deeply about building and operating world-class infrastructure for AI at scale, this one’s worth your time. We’re working with a company that builds the backbone powering some of the most demanding AI workloads on the planet. Think large-scale GPU clusters, global telemetry systems, and … distributed training environments used by leading research and enterprise teams. They’re looking for a Senior or Staff SRE with deep experience in observability at massive scale - someone who’s tuned Prometheus/Mimir, Loki, or Tempo clusters beyond 100M+ series or 10TB/day logs, and who thrives in highly technical, fast-moving environments. You’ll be working on … Designing and scaling observability for globally distributed GPU infrastructure Building automation that cuts operational toil and improves reliability Partnering with platform and infrastructure teams to deliver true visibility across complex AI systems If you’ve built or operated telemetry stacks for large-scale, GPU-heavy, or multi-tenant environments - and want to work on cutting-edge problems in a business More ❯

Staff Site Reliability Engineer - Observability

City of London, London, United Kingdom
Hybrid/Remote Options
Motive Group
Senior/Staff Site Reliability Engineer - Observability | London (Hybrid) If you care deeply about building and operating world-class infrastructure for AI at scale, this one’s worth your time. We’re working with a company that builds the backbone powering some of the most demanding AI workloads on the planet. Think large-scale GPU clusters, global telemetry systems, and … distributed training environments used by leading research and enterprise teams. They’re looking for a Senior or Staff SRE with deep experience in observability at massive scale - someone who’s tuned Prometheus/Mimir, Loki, or Tempo clusters beyond 100M+ series or 10TB/day logs, and who thrives in highly technical, fast-moving environments. You’ll be working on … Designing and scaling observability for globally distributed GPU infrastructure Building automation that cuts operational toil and improves reliability Partnering with platform and infrastructure teams to deliver true visibility across complex AI systems If you’ve built or operated telemetry stacks for large-scale, GPU-heavy, or multi-tenant environments - and want to work on cutting-edge problems in a business More ❯

Platform Engineer

England, United Kingdom
Hybrid/Remote Options
Harnham
AND RESPONSIBILITIES Support and enhance the company's infrastructure and production systems across GCP. Contribute to a major replatforming project to GKE, ensuring scalability, automation, and security. Improve reliability, observability, and CI/CD pipelines (GitLab CI, Argo, Flux). Work closely with developers to embed best practices and elevate the internal developer experience (IDP). Collaborate within a small … with GCP (Google Cloud Platform). Deep understanding of CI/CD pipelines - GitLab CI, Argo, or Flux. Experience with HashiCorp Vault and open-source tooling. Background in automation, observability, and platform reliability. Excellent problem-solving skills and a collaborative, pragmatic mindset. THE DETAILS Day rate: £550-£650/day (Outside IR35) Contract: 3 months, with scope for extension More ❯

Site Reliability Engineer - AWS - Grafana - Cloudwatch - ELK - UK Remote

United Kingdom
Hybrid/Remote Options
Opus Recruitment Solutions
AWS | GCP | SRE | Site Reliability Engineer | Terraform | Cloudformation | ECS | ELK | Elasticsearch | Logstash | Kibana | Cloudwatch | Grafana | Windows | Observability Are you looking for a genuinely Remote opportunity? Somewhere you're part of something bigger, working on a global product within a close-knit SRE team? I've partnered with a WebApp that provides end-to-end event management for some of the … planet's biggest artists and they're now looking for an SRE. Someone that knows their way around classic Observability with Grafana, ELK stack, and cost optimisation for the product as they continue scaling. Working across the globe, their multi-tenanted AWS environments require someone who is able to reverse engineer product faults, or post incident audits to ensure future … like to hear more, send over a CV to robin.shaw@opusrs.com or apply! AWS | GCP | SRE | Site Reliability Engineer | Terraform | Cloudformation | ECS | ELK | Elasticsearch | Logstash | Kibana | Cloudwatch | Grafana | Windows | Observability More ❯

Site Reliability Engineer - AWS - Grafana - Cloudwatch - ELK - UK Remote

East London, London, United Kingdom
Hybrid/Remote Options
Opus Recruitment Solutions
AWS | GCP | SRE | Site Reliability Engineer | Terraform | Cloudformation | ECS | ELK | Elasticsearch | Logstash | Kibana | Cloudwatch | Grafana | Windows | Observability Are you looking for a genuinely Remote opportunity? Somewhere you're part of something bigger, working on a global product within a close-knit SRE team? I've partnered with a WebApp that provides end-to-end event management for some of the … planet's biggest artists and they're now looking for an SRE. Someone that knows their way around classic Observability with Grafana, ELK stack, and cost optimisation for the product as they continue scaling. Working across the globe, their multi-tenanted AWS environments require someone who is able to reverse engineer product faults, or post incident audits to ensure future … like to hear more, send over a CV to robin.shaw@opusrs.com or apply! AWS | GCP | SRE | Site Reliability Engineer | Terraform | Cloudformation | ECS | ELK | Elasticsearch | Logstash | Kibana | Cloudwatch | Grafana | Windows | Observability More ❯

Site Reliability Engineer - AWS - Grafana - Cloudwatch - ELK - UK Remote

City of London, London, United Kingdom
Hybrid/Remote Options
Opus Recruitment Solutions
AWS | GCP | SRE | Site Reliability Engineer | Terraform | Cloudformation | ECS | ELK | Elasticsearch | Logstash | Kibana | Cloudwatch | Grafana | Windows | Observability Are you looking for a genuinely Remote opportunity? Somewhere you're part of something bigger, working on a global product within a close-knit SRE team? I've partnered with a WebApp that provides end-to-end event management for some of the … planet's biggest artists and they're now looking for an SRE. Someone that knows their way around classic Observability with Grafana, ELK stack, and cost optimisation for the product as they continue scaling. Working across the globe, their multi-tenanted AWS environments require someone who is able to reverse engineer product faults, or post incident audits to ensure future … like to hear more, send over a CV to robin.shaw@opusrs.com or apply! AWS | GCP | SRE | Site Reliability Engineer | Terraform | Cloudformation | ECS | ELK | Elasticsearch | Logstash | Kibana | Cloudwatch | Grafana | Windows | Observability More ❯

Observability
10th Percentile: £56,593
25th Percentile: £67,500
Median: £80,000
75th Percentile: £105,000
90th Percentile: £140,250