Observability Jobs in the UK

776 to 800 of 870 Observability Jobs in the UK

Site Reliability Engineer - AWS - Grafana - Cloudwatch - ELK - UK Remote

City of London, London, United Kingdom
Hybrid/Remote Options
Opus Recruitment Solutions
AWS | GCP | SRE | Site Reliability Engineer | Terraform | CloudFormation | ECS | ELK | Elasticsearch | Logstash | Kibana | CloudWatch | Grafana | Windows | Observability. Are you looking for a genuinely remote opportunity? Somewhere you're part of something bigger, working on a global product within a close-knit SRE team? I've partnered with a web app that provides end-to-end event management for some of the … planet's biggest artists, and they're now looking for an SRE: someone who knows their way around classic observability with Grafana and the ELK stack, as well as cost optimisation for the product as they continue scaling. Working across the globe, their multi-tenanted AWS environments require someone who is able to reverse-engineer product faults or carry out post-incident audits to ensure future … like to hear more, send over a CV to robin.shaw@opusrs.com or apply! AWS | GCP | SRE | Site Reliability Engineer | Terraform | CloudFormation | ECS | ELK | Elasticsearch | Logstash | Kibana | CloudWatch | Grafana | Windows | Observability

Site Reliability Engineer - AWS - Grafana - Cloudwatch - ELK - UK Remote

East London, London, United Kingdom
Hybrid/Remote Options
Opus Recruitment Solutions
AWS | GCP | SRE | Site Reliability Engineer | Terraform | CloudFormation | ECS | ELK | Elasticsearch | Logstash | Kibana | CloudWatch | Grafana | Windows | Observability. Are you looking for a genuinely remote opportunity? Somewhere you're part of something bigger, working on a global product within a close-knit SRE team? I've partnered with a web app that provides end-to-end event management for some of the … planet's biggest artists, and they're now looking for an SRE: someone who knows their way around classic observability with Grafana and the ELK stack, as well as cost optimisation for the product as they continue scaling. Working across the globe, their multi-tenanted AWS environments require someone who is able to reverse-engineer product faults or carry out post-incident audits to ensure future … like to hear more, send over a CV to robin.shaw@opusrs.com or apply! AWS | GCP | SRE | Site Reliability Engineer | Terraform | CloudFormation | ECS | ELK | Elasticsearch | Logstash | Kibana | CloudWatch | Grafana | Windows | Observability

Site Reliability Engineer - AWS - Grafana - Cloudwatch - ELK - UK Remote

Leeds, West Yorkshire, United Kingdom
Hybrid/Remote Options
Opus Recruitment Solutions
AWS | GCP | SRE | Site Reliability Engineer | Terraform | CloudFormation | ECS | ELK | Elasticsearch | Logstash | Kibana | CloudWatch | Grafana | Windows | Observability. Are you looking for a genuinely remote opportunity? Somewhere you're part of something bigger, working on a global product within a close-knit SRE team? I've partnered with a web app that provides end-to-end event management for some of the … planet's biggest artists, and they're now looking for an SRE: someone who knows their way around classic observability with Grafana and the ELK stack, as well as cost optimisation for the product as they continue scaling. Working across the globe, their multi-tenanted AWS environments require someone who is able to reverse-engineer product faults or carry out post-incident audits to ensure future … like to hear more, send over a CV to robin.shaw@opusrs.com or apply! AWS | GCP | SRE | Site Reliability Engineer | Terraform | CloudFormation | ECS | ELK | Elasticsearch | Logstash | Kibana | CloudWatch | Grafana | Windows | Observability

Site Reliability Engineer - AWS - Grafana - Cloudwatch - ELK - UK Remote

Leigh, Greater Manchester, United Kingdom
Hybrid/Remote Options
Opus Recruitment Solutions
AWS | GCP | SRE | Site Reliability Engineer | Terraform | CloudFormation | ECS | ELK | Elasticsearch | Logstash | Kibana | CloudWatch | Grafana | Windows | Observability. Are you looking for a genuinely remote opportunity? Somewhere you're part of something bigger, working on a global product within a close-knit SRE team? I've partnered with a web app that provides end-to-end event management for some of the … planet's biggest artists, and they're now looking for an SRE: someone who knows their way around classic observability with Grafana and the ELK stack, as well as cost optimisation for the product as they continue scaling. Working across the globe, their multi-tenanted AWS environments require someone who is able to reverse-engineer product faults or carry out post-incident audits to ensure future … like to hear more, send over a CV to robin.shaw@opusrs.com or apply! AWS | GCP | SRE | Site Reliability Engineer | Terraform | CloudFormation | ECS | ELK | Elasticsearch | Logstash | Kibana | CloudWatch | Grafana | Windows | Observability

Site Reliability Engineer - AWS - Grafana - Cloudwatch - ELK - UK Remote

Bolton, Greater Manchester, United Kingdom
Hybrid/Remote Options
Opus Recruitment Solutions
AWS | GCP | SRE | Site Reliability Engineer | Terraform | CloudFormation | ECS | ELK | Elasticsearch | Logstash | Kibana | CloudWatch | Grafana | Windows | Observability. Are you looking for a genuinely remote opportunity? Somewhere you're part of something bigger, working on a global product within a close-knit SRE team? I've partnered with a web app that provides end-to-end event management for some of the … planet's biggest artists, and they're now looking for an SRE: someone who knows their way around classic observability with Grafana and the ELK stack, as well as cost optimisation for the product as they continue scaling. Working across the globe, their multi-tenanted AWS environments require someone who is able to reverse-engineer product faults or carry out post-incident audits to ensure future … like to hear more, send over a CV to robin.shaw@opusrs.com or apply! AWS | GCP | SRE | Site Reliability Engineer | Terraform | CloudFormation | ECS | ELK | Elasticsearch | Logstash | Kibana | CloudWatch | Grafana | Windows | Observability

Site Reliability Engineer - AWS - Grafana - Cloudwatch - ELK - UK Remote

Bury, Greater Manchester, United Kingdom
Hybrid/Remote Options
Opus Recruitment Solutions
AWS | GCP | SRE | Site Reliability Engineer | Terraform | CloudFormation | ECS | ELK | Elasticsearch | Logstash | Kibana | CloudWatch | Grafana | Windows | Observability. Are you looking for a genuinely remote opportunity? Somewhere you're part of something bigger, working on a global product within a close-knit SRE team? I've partnered with a web app that provides end-to-end event management for some of the … planet's biggest artists, and they're now looking for an SRE: someone who knows their way around classic observability with Grafana and the ELK stack, as well as cost optimisation for the product as they continue scaling. Working across the globe, their multi-tenanted AWS environments require someone who is able to reverse-engineer product faults or carry out post-incident audits to ensure future … like to hear more, send over a CV to robin.shaw@opusrs.com or apply! AWS | GCP | SRE | Site Reliability Engineer | Terraform | CloudFormation | ECS | ELK | Elasticsearch | Logstash | Kibana | CloudWatch | Grafana | Windows | Observability

Site Reliability Engineer - AWS - Grafana - Cloudwatch - ELK - UK Remote

Altrincham, Greater Manchester, United Kingdom
Hybrid/Remote Options
Opus Recruitment Solutions
AWS | GCP | SRE | Site Reliability Engineer | Terraform | CloudFormation | ECS | ELK | Elasticsearch | Logstash | Kibana | CloudWatch | Grafana | Windows | Observability. Are you looking for a genuinely remote opportunity? Somewhere you're part of something bigger, working on a global product within a close-knit SRE team? I've partnered with a web app that provides end-to-end event management for some of the … planet's biggest artists, and they're now looking for an SRE: someone who knows their way around classic observability with Grafana and the ELK stack, as well as cost optimisation for the product as they continue scaling. Working across the globe, their multi-tenanted AWS environments require someone who is able to reverse-engineer product faults or carry out post-incident audits to ensure future … like to hear more, send over a CV to robin.shaw@opusrs.com or apply! AWS | GCP | SRE | Site Reliability Engineer | Terraform | CloudFormation | ECS | ELK | Elasticsearch | Logstash | Kibana | CloudWatch | Grafana | Windows | Observability

Site Reliability Engineer - AWS - Grafana - Cloudwatch - ELK - UK Remote

Central London / West End, London, United Kingdom
Hybrid/Remote Options
Opus Recruitment Solutions
AWS | GCP | SRE | Site Reliability Engineer | Terraform | CloudFormation | ECS | ELK | Elasticsearch | Logstash | Kibana | CloudWatch | Grafana | Windows | Observability. Are you looking for a genuinely remote opportunity? Somewhere you're part of something bigger, working on a global product within a close-knit SRE team? I've partnered with a web app that provides end-to-end event management for some of the … planet's biggest artists, and they're now looking for an SRE: someone who knows their way around classic observability with Grafana and the ELK stack, as well as cost optimisation for the product as they continue scaling. Working across the globe, their multi-tenanted AWS environments require someone who is able to reverse-engineer product faults or carry out post-incident audits to ensure future … like to hear more, send over a CV to robin.shaw@opusrs.com or apply! AWS | GCP | SRE | Site Reliability Engineer | Terraform | CloudFormation | ECS | ELK | Elasticsearch | Logstash | Kibana | CloudWatch | Grafana | Windows | Observability

Site Reliability Engineer - AWS - Grafana - Cloudwatch - ELK - UK Remote

Ashton-Under-Lyne, Greater Manchester, United Kingdom
Hybrid/Remote Options
Opus Recruitment Solutions
AWS | GCP | SRE | Site Reliability Engineer | Terraform | CloudFormation | ECS | ELK | Elasticsearch | Logstash | Kibana | CloudWatch | Grafana | Windows | Observability. Are you looking for a genuinely remote opportunity? Somewhere you're part of something bigger, working on a global product within a close-knit SRE team? I've partnered with a web app that provides end-to-end event management for some of the … planet's biggest artists, and they're now looking for an SRE: someone who knows their way around classic observability with Grafana and the ELK stack, as well as cost optimisation for the product as they continue scaling. Working across the globe, their multi-tenanted AWS environments require someone who is able to reverse-engineer product faults or carry out post-incident audits to ensure future … like to hear more, send over a CV to robin.shaw@opusrs.com or apply! AWS | GCP | SRE | Site Reliability Engineer | Terraform | CloudFormation | ECS | ELK | Elasticsearch | Logstash | Kibana | CloudWatch | Grafana | Windows | Observability

Senior Data Engineer

London Area, United Kingdom
Hybrid/Remote Options
Identify Solutions
the past year and aggressive expansion across the UK, US, and EU, the company is scaling at pace. Data is the backbone: from APIs and pipelines to governance and observability, their data platform directly powers customer-facing products and AI-driven insights. They’re now hiring a Senior Data Engineer to own and shape this platform, building scalable, production-grade … systems that become the foundation for global brands. Why join? ✨ Greenfield impact – inherit a live but early platform, define best practice across structure, testing, observability, and governance. ✨ Direct product impact – your APIs, pipelines, and orchestration power the platform that 1,000+ brands rely on every day. ✨ AI at the core – work on infrastructure that enables machine learning and intelligent decision … doing: API strategy & development – own and scale FastAPI endpoints that deliver real-time access to platform data. Data pipeline development – build ingestion and replication pipelines with best-in-class observability, latency, and resilience. Platform technical vision – influence architecture and orchestration, shaping how the business handles data at scale. Data quality & governance – embed testing, freshness, lineage, and monitoring to ensure reliability

Senior Data Engineer

City of London, London, United Kingdom
Hybrid/Remote Options
Identify Solutions
the past year and aggressive expansion across the UK, US, and EU, the company is scaling at pace. Data is the backbone: from APIs and pipelines to governance and observability, their data platform directly powers customer-facing products and AI-driven insights. They’re now hiring a Senior Data Engineer to own and shape this platform, building scalable, production-grade … systems that become the foundation for global brands. Why join? ✨ Greenfield impact – inherit a live but early platform, define best practice across structure, testing, observability, and governance. ✨ Direct product impact – your APIs, pipelines, and orchestration power the platform that 1,000+ brands rely on every day. ✨ AI at the core – work on infrastructure that enables machine learning and intelligent decision … doing: API strategy & development – own and scale FastAPI endpoints that deliver real-time access to platform data. Data pipeline development – build ingestion and replication pipelines with best-in-class observability, latency, and resilience. Platform technical vision – influence architecture and orchestration, shaping how the business handles data at scale. Data quality & governance – embed testing, freshness, lineage, and monitoring to ensure reliability

Network Operations Engineer

City of London, London, United Kingdom
Hybrid/Remote Options
Stanford Black Limited
low-latency network environment. You’ll be joining a collaborative and forward-thinking environment with flat structures and deep technical expertise – ideal for someone who enjoys network automation, observability tooling, and IaC. Below I have included a breakdown of the role, company, and requirements. Please review, and if the opportunity seems like a good fit, share your CV … opportunities for network automation and implement them appropriately. IaC-heavy environment: work with the likes of Ansible, Python, CI/CD and GitOps practices. Deliver troubleshooting, operational enhancements, and BAU changes. Develop observability tooling (dashboards, alerts) and build self-healing or event-driven automation. Lead post-incident reviews and trend analysis to continuously improve network reliability and performance. Company: Technology-led culture – Drives … investment firm. Vendors – Arista, Cisco, Corvil, Nvidia (not all necessary, but the more the better). Python/Golang for network automation. Proven experience with low-latency networking. Monitoring and observability tooling such as Nagios, SolarWinds, Prometheus, Alertmanager, Grafana. Analysis tooling, e.g. Wireshark, Splunk, PromQL. Familiarity with IaC/DevOps tools such as Ansible, GitOps, CI/CD. Exposure to vendor

Network Operations Engineer

London Area, United Kingdom
Hybrid/Remote Options
Stanford Black Limited
low-latency network environment. You’ll be joining a collaborative and forward-thinking environment with flat structures and deep technical expertise – ideal for someone who enjoys network automation, observability tooling, and IaC. Below I have included a breakdown of the role, company, and requirements. Please review, and if the opportunity seems like a good fit, share your CV … opportunities for network automation and implement them appropriately. IaC-heavy environment: work with the likes of Ansible, Python, CI/CD and GitOps practices. Deliver troubleshooting, operational enhancements, and BAU changes. Develop observability tooling (dashboards, alerts) and build self-healing or event-driven automation. Lead post-incident reviews and trend analysis to continuously improve network reliability and performance. Company: Technology-led culture – Drives … investment firm. Vendors – Arista, Cisco, Corvil, Nvidia (not all necessary, but the more the better). Python/Golang for network automation. Proven experience with low-latency networking. Monitoring and observability tooling such as Nagios, SolarWinds, Prometheus, Alertmanager, Grafana. Analysis tooling, e.g. Wireshark, Splunk, PromQL. Familiarity with IaC/DevOps tools such as Ansible, GitOps, CI/CD. Exposure to vendor

DevOps Engineer

City of London, London, United Kingdom
Tribus
frameworks that support thousands of real-time processes across global markets. This isn’t a maintenance role - it’s an opportunity to modernise the firm’s CI/CD, observability, and runtime environments from the ground up. What you’ll be doing: Engineering and optimising CI/CD pipelines and container orchestration at scale Modernising Linux-based deployment and runtime … low-latency environment What we’re looking for: 5+ years’ experience in DevOps, Systems, or Platform Engineering Deep knowledge of Linux, Python, and shell scripting Proven experience with Kubernetes, observability tooling, and CI/CD frameworks Strong grasp of distributed systems and performance tuning Trading, Crypto or Hedge Fund Experience Experience working in low-latency, high-frequency trading environments Why

DevOps Engineer

London Area, United Kingdom
Tribus
frameworks that support thousands of real-time processes across global markets. This isn’t a maintenance role - it’s an opportunity to modernise the firm’s CI/CD, observability, and runtime environments from the ground up. What you’ll be doing: Engineering and optimising CI/CD pipelines and container orchestration at scale Modernising Linux-based deployment and runtime … low-latency environment What we’re looking for: 5+ years’ experience in DevOps, Systems, or Platform Engineering Deep knowledge of Linux, Python, and shell scripting Proven experience with Kubernetes, observability tooling, and CI/CD frameworks Strong grasp of distributed systems and performance tuning Trading, Crypto or Hedge Fund Experience Experience working in low-latency, high-frequency trading environments Why

Lead AI Engineer

England, United Kingdom
Nicoll Curtin
integrate AI models for predictive insights and intelligent recommendations. Connect AI agents with core enterprise systems and data platforms. Define technical standards, reusable patterns, and governance principles. Ensure reliability, observability, and performance across AI solutions. Mentor engineering teams and foster best practices in design, DevOps, and model lifecycle management. Collaborate with data scientists, product managers, and architects to align solutions … patterns and scalable system design. Excellent communication and leadership skills, with the ability to influence technical direction. Desirable: Experience in computer vision, IoT, or multi-agent systems. Knowledge of observability and monitoring frameworks for AI operations. Apply now for immediate consideration. No sponsorship available; applicants must have the right to work in the UK.

Staff Software Engineer

London Area, United Kingdom
Burns Sheehan
to solve complex challenges. Drive innovation around cloud-native technologies and platform automation. Balance strategic vision with ~30% hands-on coding and design work. Promote best practice in reliability, observability, and scalability. The Ideal Staff Software Engineer Proven experience operating at Staff+ level within a fast-paced engineering organisation. Strong background in cloud platforms (AWS or GCP) and deep knowledge … ability to build operators. Strong coding skills in Golang, Java, or C#, with experience in distributed systems. Demonstrated leadership across multiple squads and technical roadmaps. Expertise in operational excellence: observability, reliability, automation. This is an outstanding opportunity for a Staff Software Engineer to join a rapidly scaling company where you’ll play a pivotal role in shaping the technical foundations of

Staff Software Engineer

City of London, London, United Kingdom
Burns Sheehan
to solve complex challenges. Drive innovation around cloud-native technologies and platform automation. Balance strategic vision with ~30% hands-on coding and design work. Promote best practice in reliability, observability, and scalability. The Ideal Staff Software Engineer Proven experience operating at Staff+ level within a fast-paced engineering organisation. Strong background in cloud platforms (AWS or GCP) and deep knowledge … ability to build operators. Strong coding skills in Golang, Java, or C#, with experience in distributed systems. Demonstrated leadership across multiple squads and technical roadmaps. Expertise in operational excellence: observability, reliability, automation. This is an outstanding opportunity for a Staff Software Engineer to join a rapidly scaling company where you’ll play a pivotal role in shaping the technical foundations of

Software Engineer

London, South East England, United Kingdom
5fd48781-b8d5-43d9-9415-5b826a21d1d3
agent context & decision making. Develop backend infrastructure and intelligent automation for knowledge crawling, extraction and enrichment. Help shape and contribute towards our DevOps practices (CI/CD, cloud infrastructure, observability). Stay on the frontier of AI, keeping up to date with emerging tools and technologies to keep us at the edge of what's possible. Requirements Deep backend expertise … with Python or JavaScript. Strong knowledge of DevOps practices, including CI/CD, cloud infrastructure & observability/monitoring. Previous full ownership of a successful high-volume system. Naturally articulate and able to communicate complex concepts clearly. Evidence of excellence at everything you do. Organic curiosity and obsession for AI and cutting-edge technology. Exceptional problem-solving skills and meticulous attention

Principal Platform Engineer | Fintech | London | Up to £180k + Equity

City of London, London, United Kingdom
Maze
at Tier 1 banks. We're looking for a Principal Platform Engineer to drive the infrastructure behind mission-critical systems: think active-active, five-nines uptime, and real-time observability at global scale. What You'll Do: Own platform architecture for our next-gen ledger infrastructure Scale multi-region Kubernetes environments across cloud & on-prem Harden distributed systems (Kafka, Redis … CockroachDB) for global banking workloads Lead our AI-powered SRE approach: observability, remediation, and auto-response Enforce zero-trust, multi-tenant security and compliance (SOC 2, ISO 27001) Define IaC foundations (Terraform, GitOps, Helm) What We're Looking For: Expert with Kubernetes and Distributed Systems Experience building production infrastructure at scale (multi-region, high-availability) Extensive experience building both on-prem

Principal Platform Engineer | Fintech | London | Up to £180k + Equity

London Area, United Kingdom
Maze
at Tier 1 banks. We're looking for a Principal Platform Engineer to drive the infrastructure behind mission-critical systems: think active-active, five-nines uptime, and real-time observability at global scale. What You'll Do: Own platform architecture for our next-gen ledger infrastructure Scale multi-region Kubernetes environments across cloud & on-prem Harden distributed systems (Kafka, Redis … CockroachDB) for global banking workloads Lead our AI-powered SRE approach: observability, remediation, and auto-response Enforce zero-trust, multi-tenant security and compliance (SOC 2, ISO 27001) Define IaC foundations (Terraform, GitOps, Helm) What We're Looking For: Expert with Kubernetes and Distributed Systems Experience building production infrastructure at scale (multi-region, high-availability) Extensive experience building both on-prem

Data Engineer

City of London, London, United Kingdom
83zero Limited
to translate complex business requirements into data-driven solutions. Write production-grade SQL and ensure data quality through testing, documentation, and version control. Promote best practices around data reliability, observability, and maintainability. (Optional but valued) Contribute to Infrastructure as Code and CI/CD pipelines (e.g., Terraform, GitHub Actions). Skills & Experience 5+ years of experience in data-focused roles … other data visualisation tools. Familiarity with orchestration tools such as Airflow, Prefect, or Dagster. Understanding of CI/CD practices in data and analytics engineering. Knowledge of data governance, observability, and security best practices in cloud environments.
Employment Type: Permanent

(Senior) EMEA AI Product Owner - Hemel Hempstead

Hemel Hempstead, Hertfordshire, United Kingdom
Boston Scientific Gruppe
by design. You'll groom the roadmap and write user stories to add to the backlog, break down epics, refine acceptance criteria, and balance new features with technical debt, observability, and cost efficiency. You'll demo recent increments, gather user feedback, and turn it into testable stories that enhance usability, trust, and performance. You'll brief sponsors on the value … solutions for the EMEA region. Key Responsibilities: Backlog ownership & delivery: Convert outcomes into prioritized epics/stories, lead refinement and planning, uphold DoR/DoD, balance features with reliability, observability, cost, and sustainability. Value & metrics: Establish VIPs/KPIs/OKRs (adoption, time to data, data trust, NPS, ROI); run quarterly value reviews, iterate the roadmap based on evidence. Privacy
Employment Type: Permanent
Salary: GBP Annual

Agentic Developer - Building guardrails for autonomous AI

London Area, United Kingdom
governr
production systems at scale • Expert-level proficiency in Python, Rust, or Go (you write systems that can't fail) • Deep understanding of distributed systems, real-time data processing, and observability architectures • Production ML/AI experience: You've deployed models, debugged their failures, and built monitoring around them • System design mastery: You can architect for reliability, scalability, and auditability simultaneously … Knowledge: • Understanding of agent architectures: autonomous decision-making, goal-directed behaviour, tool use, memory systems • Familiarity with AI safety concepts: alignment, interpretability, robustness, adversarial examples • Experience with monitoring/observability: instrumentation, logging, tracing, alerting in complex systems Working Style: • You ship to production regularly and own what you deploy • You write documentation that others can actually use • You thrive in

Agentic Developer - Building guardrails for autonomous AI

City of London, London, United Kingdom
governr
production systems at scale • Expert-level proficiency in Python, Rust, or Go (you write systems that can't fail) • Deep understanding of distributed systems, real-time data processing, and observability architectures • Production ML/AI experience: You've deployed models, debugged their failures, and built monitoring around them • System design mastery: You can architect for reliability, scalability, and auditability simultaneously … Knowledge: • Understanding of agent architectures: autonomous decision-making, goal-directed behaviour, tool use, memory systems • Familiarity with AI safety concepts: alignment, interpretability, robustness, adversarial examples • Experience with monitoring/observability: instrumentation, logging, tracing, alerting in complex systems Working Style: • You ship to production regularly and own what you deploy • You write documentation that others can actually use • You thrive in

Observability salary percentiles
10th percentile: £56,593
25th percentile: £67,500
Median: £80,000
75th percentile: £105,000
90th percentile: £140,250