476 to 500 of 563 Observability Jobs in the UK

AWS SRE Engineer

Hiring Organisation
TekWissen®
Location
Glasgow City, Scotland, United Kingdom
unmatched experience and specialized skills across more than 40 industries Role Overview We are seeking an experienced AWS SRE Engineer with strong expertise in Observability, particularly in developing and maintaining Grafana dashboards for monitoring, alerting, and operational insights. The ideal candidate will have hands-on experience in AWS environments … Databricks is preferred. Key Responsibilities: Design, develop, and maintain Grafana dashboards to provide actionable insights into platform health, performance, and reliability. Build and enhance observability solutions for AWS-hosted applications and infrastructure. Define and track SLIs, SLOs, and SLAs to measure service reliability and performance. Monitor system health using golden ...

Observability Engineer - Trading - £100,000-£200,000 + Bonus

Hiring Organisation
Hunter Bond
Location
London Area, United Kingdom
title: Observability Engineer – Trading Client: Elite FinTech Salary: £100,000-£200,000 + Bonus Location: London Skills: Prometheus, Grafana, VictoriaMetrics, Vector, ELK, AlertManager The role: My client are seeking an Engineer with strong Linux experience, and expertise within the Observability space. The organisation uses a number of different tools effectively … estate. The successful candidate will have skills around a minimum of 2 of the following technologies: VictoriaMetrics Prometheus Grafana Vector ELK AlertManager Alongside the Observability skills, you will also have skills around the following: Linux Git Python An understanding of Kubernetes is a distinct advantage. Please apply ASAP for more ...

Senior Platform Engineer

Hiring Organisation
Realm
Location
United Kingdom
building and owning the production infrastructure for a multi-user distributed system from the ground up. That means designing for debuggability and observability from day one, not bolting it on later. Core remit includes scalable multi-environment Terraform, secrets management, gradual deployment practices (blue/green), and the ability … testing. An AI/multi-agent infrastructure component is on the near-term roadmap. The stack IaC Terraform + Terragrunt Helm/Kubernetes AWS Observability Prometheus/Grafana Auth0 Rust/Golang NoSQL What they're looking for Production experience with Terraform, Helm/Kubernetes, AWS networking, and debugging multi ...

Data Reliability Engineer

Hiring Organisation
Ashdown Group
Location
City of London, London, United Kingdom
Employment Type
Permanent, Work From Home
work from home 2 days per week. This is a high-impact role focused on improving data quality, reducing incidents, and building scalable observability across a modern enterprise data platform. You'll help ensure data across the organisation is accurate, reliable, and trusted for critical business decision-making. You'll take ownership … style roles, with strong SQL and Python skills and experience working in modern cloud-based data environments. Hands-on experience with data observability tools such as Grafana, Monte Carlo, or Acceldata, and data governance/quality platforms like Informatica, Collibra or Microsoft Purview is highly desirable. Experience within the Azure ...

Security Platform Engineer

Hiring Organisation
Addition
Location
Hampshire, England, United Kingdom
engineering teams to promote secure-by-design practices Maintaining clear documentation across systems, configurations, and processes Supporting the continuous improvement of platform security and observability Main Skills Needed: Background in Security Engineering, Platform Engineering, or similar Strong hands-on experience with Kubernetes and container environments Proven experience with tools such … Splunk and Nessus Knowledge of SIEM, observability, and vulnerability management practices Scripting or automation capability (Python, Bash, or similar) Understanding of container security and DevSecOps principles Familiarity with threat frameworks and security best practices Experience with tools such as Microsoft Defender or similar security platforms Exposure to infrastructure-as-code ...

Technical Lead Edge Platform

Hiring Organisation
VoCoVo
Location
South Gloucestershire, United Kingdom
Employment Type
Full Time
Salary
80000 to 85000 GBP Annually
MicroK8s). Experience with image build tooling and immutable OS concepts, familiarity with tools such as Kairos, OSTree is highly desirable. Practical exposure to observability at scale, including metrics, logging, alerting (Prometheus, Grafana, Loki) and hands-on experience with OpenTelemetry. Experience operating or building infrastructure to manage, monitor and update … implement secure, reliable over-the-air (OTA) update mechanisms for OS and workload delivery at scale. Take ownership of the edge platform's observability, reliability and security, including driving adoption of OpenTelemetry across the edge estate. Contribute to the technical roadmap, researching new approaches and producing demonstrations and proofs ...

DevOps Release Manager

Hiring Organisation
Centrica - CHP
Location
Windsor, Berkshire, South East, United Kingdom
Employment Type
Permanent, Work From Home
Description Join us, be part of more. We're so much more than an energy company. We're a family of brands revolutionising how we power the planet. We're energisers. One team of 21 ...

Forward Deployed Engineer

Hiring Organisation
Novatus
Location
London Area, United Kingdom
Novatus Global is a Series B scale-up RegTech SaaS provider and boutique advisory firm, helping financial institutions manage their most complex regulatory requirements. We combine deep consulting expertise with cutting-edge SaaS solutions, enabling ...

Principal AI Architect

Hiring Organisation
Jobleads-UK
Location
Greater London, England, United Kingdom
experimenting with cutting‐edge technologies. Preferred Requirements Advanced Integration - Experience integrating Salesforce with external agents via APIs and open standards (MCP, A2A). Governance & Observability - Familiarity with prompt governance, observability, monitoring frameworks, responsible AI and compliance best practices Cross‐Platform Background - Background in cross‐platform integrations (e.g., Hyperscaler SDKs ...

Senior Software Engineer – AI / Agentic Systems

Hiring Organisation
MA (Montreal Associates)
Location
City of London, London, United Kingdom
grade AI platform. You’ll operate at the core of the product engineering function, designing systems that power autonomous agents, orchestrate workflows, and enable observability at scale. This is not just another backend role. You’ll influence architecture, mentor engineers, and help define the technical direction of a rapidly growing … Lead design and code reviews, ensuring high standards of quality and security Collaborate closely with AI research, product, and infrastructure teams Improve system reliability, observability, and scalability Mentor engineers and act as a technical multiplier across teams Champion best practices, tooling, and engineering excellence Proactively identify and resolve technical debt ...

Senior Software Engineer (Node.js / TypeScript / AWS)

Hiring Organisation
Adria Solutions
Location
Manchester, North West, United Kingdom
Employment Type
Permanent
Salary
£80,000
build scalable backend services and cloud infrastructure Architect event-driven and distributed systems on AWS Develop APIs, microservices and internal tooling Improve reliability, observability and developer workflows Conduct load testing and performance optimisation Contribute to frontend applications where required About You You are a senior engineer with deep backend … driven architectures and high-concurrency systems Infrastructure as Code experience (Pulumi, Terraform or similar) Strong understanding of databases, caching and performance optimisation Experience with observability, monitoring and alerting Comfortable working across the stack when required Strong Linux, Docker and Git knowledge Not the Right Fit If Your experience is primarily ...

IT Service Performance & Reliability Manager

Hiring Organisation
Spectrum IT Recruitment Limited
Location
New Milton, Hampshire, South East, United Kingdom
Employment Type
Permanent
Salary
£60,000
across critical IT services. This role focuses on keeping customer-facing services fast, reliable, and fully observable, while driving continuous improvement. You will lead observability across services, ensuring effective monitoring and actionable insights. You'll manage capacity and performance through forecasting and trend analysis, identifying risks early and driving improvements. … performance in IT environments Hands-on experience with AWS and Azure Strong knowledge of ITIL v3/v4 (certification required) Experience with monitoring/observability tools (e.g. Zabbix, Grafana, Kibana, OpenSearch) Knowledge of Windows and Linux server environments Scripting skills (e.g. Python, PowerShell, Node.js) Experience integrating data via APIs, webhooks ...

Director - Principal Engineer (Java/Angular/AI)

Hiring Organisation
Robert Walters
Location
London, South East, England, United Kingdom
Employment Type
Full-Time
Salary
£140,000 - £170,000 per annum
volumes of financial and transactional data Contribute directly to architecture, system design, and hands-on software development Drive engineering best practices across automation, testing, observability, and performance Build resilient, production-grade systems with a strong focus on reliability and scalability Work across the full software development lifecycle from design through … scalability, and high-availability systems Experience building automated, production-grade platforms with minimal manual intervention Familiarity with cloud-native technologies, CI/CD, and observability tooling Strong engineering mindset with a hands-on approach to development Interest in modern engineering tooling, including AI-assisted development workflows Robert Walters Operations Limited ...

Platform Engineer: £120k + Bonus/benefits (AI Trading)

Hiring Organisation
Hunter Bond
Location
London Area, United Kingdom
global trading platform. The successful candidate will be involved in every layer of the technology stack—from hardware and operating systems to automation and observability—while gaining exposure to how a world-class investment firm manages its technology infrastructure. Key Responsibilities Manage a distributed compute environment and several petabyte-scale … agile methodologies) Familiarity with infrastructure automation and configuration management tools (Chef, Puppet, or Ansible) Exposure to distributed storage systems and related protocols Experience with observability and monitoring tools (Elasticsearch, Logstash, Kibana, Datadog, Prometheus, Grafana) Strong written and verbal communication skills Demonstrated ability to learn quickly and adapt to evolving technologies ...

Lead Software Engineer

Hiring Organisation
5V Video
Location
City of London, London, United Kingdom
+ AWS (Lambda, API Gateway, S3, DynamoDB) Handling event-driven architectures (Kafka, SNS/SQS, etc.) Driving system design decisions across distributed systems Improving observability, reliability, and performance in production Debugging complex issues and leading resolution across teams Staying hands-on while setting technical direction and standards Tech Stack Python … Lambda, API Gateway, S3, DynamoDB, IAM) Event-driven systems (Kafka, SNS/SQS) CI/CD (Concourse, Git workflows) Databases (Postgres, DynamoDB, Couchbase) Observability (Prometheus, Grafana, CloudWatch) What You’ll Bring Strong backend engineering experience (Python preferred) Proven experience building distributed systems at scale Deep understanding of microservices + event ...

Head of Infrastructure

Hiring Organisation
Jobleads-UK
Location
Greater London, England, United Kingdom
cloud architecture, operational resilience, developer experience and infrastructure team leadership. You will be responsible for shaping the long term infrastructure roadmap, improving reliability and observability, strengthening incident response and ensuring the platform can support a growing customer base and increasingly critical product suite. This is a role for someone … platform strategy Design and evolve the AWS cloud architecture to support scale, resilience and performance Set standards across infrastructure, CI/CD, environments and observability Lead production reliability, uptime, incident response and post incident reviews Improve monitoring, alerting and on call practices to ensure they are effective and sustainable Partner ...

Data Platform Engineer

Hiring Organisation
Noir
Location
Milton Keynes, Buckinghamshire, England, United Kingdom
Employment Type
Full-Time
Salary
£55,000 - £70,000 per annum
office) (Tech stack: Data Platform Engineer, Cloud (Azure/AWS/GCP), Microsoft Fabric, SQL Server, Platform as Code, Terraform, GitHub, Data Platform, Monitoring, Observability) Our client is a leading UK enterprise investing heavily in its technology landscape as part of a large-scale transformation programme. They are seeking … also apply Platform as Code principles with Terraform to improve automation and consistency. The Data Platform Engineer will contribute to capacity planning, monitoring and observability, ensuring the Data Platform performs effectively. You will work with Microsoft Fabric to enhance platform capabilities, while using Terraform to manage scalable infrastructure. Operating within ...

Platform Storage Engineer

Hiring Organisation
Ncounter
Location
East London, London, England, United Kingdom
Employment Type
Full-Time
Salary
£160,000 - £190,000 per annum
vendor storage tooling into a unified platform • Improve storage throughput, data locality and platform efficiency for research workloads • Collaborate closely with compute, networking and observability teams across the wider platform estate • Support troubleshooting, tuning and reliability engineering for production storage systems What we’re looking for: • Strong backend or systems … Rust, C++ or Java • Experience building or supporting distributed systems at scale • Strong Linux knowledge and an interest in infrastructure engineering • Exposure to observability tooling such as Prometheus, Grafana, Datadog or ELK • Understanding of cloud and infrastructure automation, ideally AWS, GCP or Terraform • Any experience with Ceph, MinIO, JuiceFS, FUSE ...

Cloud Operations Engineer

Hiring Organisation
Anson McCade
Location
Cheltenham, Gloucestershire, South West, United Kingdom
Employment Type
Permanent
strong hands-on experience required) Kubernetes (deployment, troubleshooting, and platform support) Infrastructure as Code (Terraform or similar tools) Cloud-native networking and system troubleshooting Observability and monitoring tools APIs and integration services Secure, restricted, air-gapped cloud environments Required Experience Strong experience working with Linux-based systems in production environments … operate within highly secure cloud architectures Desirable Experience Kubernetes administration or advanced troubleshooting experience Infrastructure as Code experience (Terraform or similar) Exposure to observability and monitoring platforms Experience working in 24/7 operational environments Prior experience coordinating shifts or leading small technical teams deep expertise in secure cloud operations ...

Software Developer

Hiring Organisation
TransUnion
Location
Alderley Edge, Cheshire, United Kingdom
Employment Type
Permanent
build reliable backend systems and infrastructure tooling Use TDD to write high-quality, maintainable code and build out automated test suites Own reliability, observability, and performance of key services Collaborate with clients to understand requirements, debug issues, and propose solutions Drive improvements to system architecture, automation, and deployment processes Mentor … Desirable Skills & Experience: Experience owning backend systems in production environments Experience with Cloud Platforms AWS or GCP Infrastructure-as-code, CI/CD, and observability tooling Experience scaling systems under sustained load Contributions to internal tooling or open source Experience with large datasets and machine learning models Impact ...

Cloud Security and Platform Engineer

Hiring Organisation
RealityMine
Location
Trafford Park, Greater Manchester, UK
mainly focused on AWS, with growing involvement in other cloud and SaaS platforms. You’ll improve existing environments—managing identity and access, governance, security, observability, and lifecycle—by reducing risks, eliminating unsafe configurations, validating ownership, and ensuring the cloud estate is clearly governed and auditable. You will take an active … role in improving RealityMine’s security posture by improving and operating security scanning, improving monitoring and observability, and ensuring risks, vulnerabilities, and end of life components are identified and addressed in a timely and pragmatic way. You will also develop automation used to support security and operational hygiene, reducing manual ...

Cloud Security and Platform Engineer

Hiring Organisation
RealityMine
Location
Trafford Park, England, United Kingdom
mainly focused on AWS, with growing involvement in other cloud and SaaS platforms. You’ll improve existing environments—managing identity and access, governance, security, observability, and lifecycle—by reducing risks, eliminating unsafe configurations, validating ownership, and ensuring the cloud estate is clearly governed and auditable. You will take an active … role in improving RealityMine’s security posture by improving and operating security scanning, improving monitoring and observability, and ensuring risks, vulnerabilities, and end of life components are identified and addressed in a timely and pragmatic way. You will also develop automation used to support security and operational hygiene, reducing manual ...

Principal Engineer - Platform Enablement Squad

Hiring Organisation
Centrica - CHP
Location
Windsor, Berkshire, South East, United Kingdom
Employment Type
Permanent
enhance safety, compliance, customer experience, and productivity Establish engineering excellence across teams: Champion high engineering standards including clean architecture, CI/CD automation, observability, testing strategies, release processes, telemetry, performance tuning, and secure-by-design principles Lead platform performance, reliability & offline capability: Ensure the environment performs reliably in challenging field … Quality and Platform-wide capabilities: Shape quality, resilience, and security strategies across teams-ensuring teams adopt shift-left testing, strong security hygiene, consistent observability, and reliable operational processes Improve how work is done: Continuously identify opportunities to automate, simplify, reduce cycle time, improve developer experience, adopt new tools ...

Remote Network Monitoring Specialist - Streaming Telemetry

Hiring Organisation
Akkodis
Location
Manchester, United Kingdom
Employment Type
Permanent
Salary
£70000 - £75000/annum
ensure the environment is fully visible, measurable and supportable from day one. The role would suit someone with strong experience across network observability, alerting, telemetry, dashboards, service health, performance baselining and operational handover. The client is open to different monitoring backgrounds, particularly where candidates have worked with tools such … solutions across newly delivered network infrastructure. Build monitoring capability that provides clear visibility of network health, performance and service availability. Work with monitoring and observability platforms such as VictoriaMetrics, Prometheus, Grafana, Nagios, Zabbix, InfluxDB, SolarWinds, PRTG, Datadog, Elastic or similar. Support metrics ingestion, retention, alerting, dashboarding and performance visibility. Build ...

Remote Network Monitoring Specialist - Streaming Telemetry

Hiring Organisation
Akkodis
Location
London, United Kingdom
Employment Type
Permanent
Salary
£70000 - £75000/annum
ensure the environment is fully visible, measurable and supportable from day one. The role would suit someone with strong experience across network observability, alerting, telemetry, dashboards, service health, performance baselining and operational handover. The client is open to different monitoring backgrounds, particularly where candidates have worked with tools such … solutions across newly delivered network infrastructure. Build monitoring capability that provides clear visibility of network health, performance and service availability. Work with monitoring and observability platforms such as VictoriaMetrics, Prometheus, Grafana, Nagios, Zabbix, InfluxDB, SolarWinds, PRTG, Datadog, Elastic or similar. Support metrics ingestion, retention, alerting, dashboarding and performance visibility. Build ...