401 to 425 of 468 Observability Jobs in England

Senior Software Engineer – AI / Agentic Systems

Hiring Organisation
MA (Montreal Associates)
Location
City of London, London, United Kingdom
grade AI platform. You’ll operate at the core of the product engineering function—designing systems that power autonomous agents, orchestrate workflows, and enable observability at scale. This is not just another backend role. You’ll influence architecture, mentor engineers, and help define the technical direction of a rapidly growing … Lead design and code reviews, ensuring high standards of quality and security Collaborate closely with AI research, product, and infrastructure teams Improve system reliability, observability, and scalability Mentor engineers and act as a technical multiplier across teams Champion best practices, tooling, and engineering excellence Proactively identify and resolve technical debt ...

Senior Software Engineer (Node.js / TypeScript / AWS)

Hiring Organisation
Adria Solutions
Location
Manchester, North West, United Kingdom
Employment Type
Permanent
Salary
£80,000
build scalable backend services and cloud infrastructure Architect event-driven and distributed systems on AWS Develop APIs, microservices and internal tooling Improve reliability, observability and developer workflows Conduct load testing and performance optimisation Contribute to frontend applications where required About You You are a senior engineer with deep backend … driven architectures and high-concurrency systems Infrastructure as Code experience (Pulumi, Terraform or similar) Strong understanding of databases, caching and performance optimisation Experience with observability, monitoring and alerting Comfortable working across the stack when required Strong Linux, Docker and Git knowledge Not the Right Fit If Your experience is primarily ...

IT Service Performance & Reliability Manager

Hiring Organisation
Spectrum IT Recruitment Limited
Location
New Milton, Hampshire, South East, United Kingdom
Employment Type
Permanent
Salary
£60,000
across critical IT services. This role focuses on keeping customer-facing services fast, reliable, and fully observable, while driving continuous improvement. You will lead observability across services, ensuring effective monitoring and actionable insights. You'll manage capacity and performance through forecasting and trend analysis, identifying risks early and driving improvements. … performance in IT environments Hands-on experience with AWS and Azure Strong knowledge of ITIL v3/v4 (certification required) Experience with monitoring/observability tools (e.g. Zabbix, Grafana, Kibana, OpenSearch) Knowledge of Windows and Linux server environments Scripting skills (e.g. Python, PowerShell, Node.js) Experience integrating data via APIs, webhooks ...

Director - Principal Engineer (Java/Angular/AI)

Hiring Organisation
Robert Walters
Location
London, South East, England, United Kingdom
Employment Type
Full-Time
Salary
£140,000 - £170,000 per annum
volumes of financial and transactional data Contribute directly to architecture, system design, and hands-on software development Drive engineering best practices across automation, testing, observability, and performance Build resilient, production-grade systems with a strong focus on reliability and scalability Work across the full software development lifecycle from design through … scalability, and high-availability systems Experience building automated, production-grade platforms with minimal manual intervention Familiarity with cloud-native technologies, CI/CD, and observability tooling Strong engineering mindset with a hands-on approach to development Interest in modern engineering tooling, including AI-assisted development workflows Robert Walters Operations Limited ...

Platform Engineer: £120k + Bonus/benefits (AI Trading)

Hiring Organisation
Hunter Bond
Location
London Area, United Kingdom
global trading platform. The successful candidate will be involved in every layer of the technology stack—from hardware and operating systems to automation and observability—while gaining exposure to how a world-class investment firm manages its technology infrastructure. Key Responsibilities Manage a distributed compute environment and several petabyte-scale … agile methodologies) Familiarity with infrastructure automation and configuration management tools (Chef, Puppet, or Ansible) Exposure to distributed storage systems and related protocols Experience with observability and monitoring tools (Elasticsearch, Logstash, Kibana, Datadog, Prometheus, Grafana) Strong written and verbal communication skills Demonstrated ability to learn quickly and adapt to evolving technologies ...

Lead Software Engineer

Hiring Organisation
5V Video
Location
City of London, London, United Kingdom
+ AWS (Lambda, API Gateway, S3, DynamoDB) Handling event-driven architectures (Kafka, SNS/SQS, etc.) Driving system design decisions across distributed systems Improving observability, reliability, and performance in production Debugging complex issues and leading resolution across teams Staying hands-on while setting technical direction and standards Tech Stack Python … Lambda, API Gateway, S3, DynamoDB, IAM) Event-driven systems (Kafka, SNS/SQS) CI/CD (Concourse, Git workflows) Databases (Postgres, DynamoDB, Couchbase) Observability (Prometheus, Grafana, CloudWatch) What You’ll Bring Strong backend engineering experience (Python preferred) Proven experience building distributed systems at scale Deep understanding of microservices + event ...

Head of Infrastructure

Hiring Organisation
Jobleads-UK
Location
Greater London, England, United Kingdom
cloud architecture, operational resilience, developer experience and infrastructure team leadership. You will be responsible for shaping the long term infrastructure roadmap, improving reliability and observability, strengthening incident response and ensuring the platform can support a growing customer base and increasingly critical product suite. This is a role for someone … platform strategy Design and evolve the AWS cloud architecture to support scale, resilience and performance Set standards across infrastructure, CI/CD, environments and observability Lead production reliability, uptime, incident response and post incident reviews Improve monitoring, alerting and on call practices to ensure they are effective and sustainable Partner ...

Platform Storage Engineer

Hiring Organisation
Ncounter
Location
East London, London, England, United Kingdom
Employment Type
Full-Time
Salary
£160,000 - £190,000 per annum
vendor storage tooling into a unified platform • Improve storage throughput, data locality and platform efficiency for research workloads • Collaborate closely with compute, networking and observability teams across the wider platform estate • Support troubleshooting, tuning and reliability engineering for production storage systems What we’re looking for: • Strong backend or systems … Rust, C++ or Java • Experience building or supporting distributed systems at scale • Strong Linux knowledge and an interest in infrastructure engineering • Exposure to observability tooling such as Prometheus, Grafana, Datadog or ELK • Understanding of cloud and infrastructure automation, ideally AWS, GCP or Terraform • Any experience with Ceph, MinIO, JuiceFS, FUSE ...

Data Platform Engineer

Hiring Organisation
Noir
Location
Milton Keynes, Buckinghamshire, UK
Employment Type
Full-Time
office) (Tech stack: Data Platform Engineer, Cloud (Azure/AWS/GCP), Microsoft Fabric, SQL Server, Platform as Code, Terraform, GitHub, Data Platform, Monitoring, Observability) Our client is a leading UK enterprise investing heavily in its technology landscape as part of a large-scale transformation programme. They are seeking … also apply Platform as Code principles with Terraform to improve automation and consistency. The Data Platform Engineer will contribute to capacity planning, monitoring and observability, ensuring the Data Platform performs effectively. You will work with Microsoft Fabric to enhance platform capabilities, while using Terraform to manage scalable infrastructure. Operating within ...

Cloud Operations Engineer

Hiring Organisation
Anson McCade
Location
Cheltenham, Gloucestershire, South West, United Kingdom
Employment Type
Permanent
strong hands-on experience required) Kubernetes (deployment, troubleshooting, and platform support) Infrastructure as Code (Terraform or similar tools) Cloud-native networking and system troubleshooting Observability and monitoring tools APIs and integration services Secure, restricted, air-gapped cloud environments Required Experience Strong experience working with Linux-based systems in production environments … operate within highly secure cloud architectures Desirable Experience Kubernetes administration or advanced troubleshooting experience Infrastructure as Code experience (Terraform or similar) Exposure to observability and monitoring platforms Experience working in 24/7 operational environments Prior experience coordinating shifts or leading small technical teams deep expertise in secure cloud operations ...

Software Developer

Hiring Organisation
TransUnion
Location
Alderley Edge, Cheshire, United Kingdom
Employment Type
Permanent
build reliable backend systems and infrastructure tooling Use TDD to write high-quality, maintainable code and build out automated test suites Own reliability, observability, and performance of key services Collaborate with clients to understand requirements, debug issues, and propose solutions Drive improvements to system architecture, automation, and deployment processes Mentor … Desirable Skills & Experience: Experience owning backend systems in production environments Experience with Cloud Platforms AWS or GCP Infrastructure-as-code, CI/CD, and observability tooling Experience scaling systems under sustained load Contributions to internal tooling or open source Experience with large datasets and machine learning models Impact ...

Cloud Security and Platform Engineer

Hiring Organisation
RealityMine
Location
Trafford Park, Greater Manchester, UK
mainly focused on AWS, with growing involvement in other cloud and SaaS platforms. You’ll improve existing environments—managing identity and access, governance, security, observability, and lifecycle—by reducing risks, eliminating unsafe configurations, validating ownership, and ensuring the cloud estate is clearly governed and auditable. You will take an active … role in improving RealityMine’s security posture by improving and operating security scanning, improving monitoring and observability, and ensuring risks, vulnerabilities, and end of life components are identified and addressed in a timely and pragmatic way. You will also develop automation used to support security and operational hygiene, reducing manual ...

Cloud Security and Platform Engineer

Hiring Organisation
RealityMine
Location
Trafford Park, England, United Kingdom
mainly focused on AWS, with growing involvement in other cloud and SaaS platforms. You’ll improve existing environments—managing identity and access, governance, security, observability, and lifecycle—by reducing risks, eliminating unsafe configurations, validating ownership, and ensuring the cloud estate is clearly governed and auditable. You will take an active … role in improving RealityMine’s security posture by improving and operating security scanning, improving monitoring and observability, and ensuring risks, vulnerabilities, and end of life components are identified and addressed in a timely and pragmatic way. You will also develop automation used to support security and operational hygiene, reducing manual ...

Principal Engineer - Platform Enablement Squad

Hiring Organisation
Centrica - CHP
Location
Windsor, Berkshire, South East, United Kingdom
Employment Type
Permanent
enhance safety, compliance, customer experience, and productivity Establish engineering excellence across teams: Champion high engineering standards including clean architecture, CI/CD automation, observability, testing strategies, release processes, telemetry, performance tuning, and secure-by-design principles Lead platform performance, reliability & offline capability: Ensure the environment performs reliably in challenging field … Quality and Platform-wide capabilities: Shape quality, resilience, and security strategies across teams, ensuring teams adopt shift-left testing, strong security hygiene, consistent observability, and reliable operational processes Improve how work is done: Continuously identify opportunities to automate, simplify, reduce cycle time, improve developer experience, adopt new tools ...

Remote Network Monitoring Specialist - Streaming Telemetry

Hiring Organisation
Akkodis
Location
Manchester, United Kingdom
Employment Type
Permanent
Salary
£70,000 - £75,000 per annum
ensure the environment is fully visible, measurable and supportable from day one. The role would suit someone with strong experience across network observability, alerting, telemetry, dashboards, service health, performance baselining and operational handover. The client is open to different monitoring backgrounds, particularly where candidates have worked with tools such … solutions across newly delivered network infrastructure. Build monitoring capability that provides clear visibility of network health, performance and service availability. Work with monitoring and observability platforms such as VictoriaMetrics, Prometheus, Grafana, Nagios, Zabbix, InfluxDB, SolarWinds, PRTG, Datadog, Elastic or similar. Support metrics ingestion, retention, alerting, dashboarding and performance visibility. Build ...

Remote Network Monitoring Specialist - Streaming Telemetry

Hiring Organisation
Akkodis
Location
London, United Kingdom
Employment Type
Permanent
Salary
£70,000 - £75,000 per annum
ensure the environment is fully visible, measurable and supportable from day one. The role would suit someone with strong experience across network observability, alerting, telemetry, dashboards, service health, performance baselining and operational handover. The client is open to different monitoring backgrounds, particularly where candidates have worked with tools such … solutions across newly delivered network infrastructure. Build monitoring capability that provides clear visibility of network health, performance and service availability. Work with monitoring and observability platforms such as VictoriaMetrics, Prometheus, Grafana, Nagios, Zabbix, InfluxDB, SolarWinds, PRTG, Datadog, Elastic or similar. Support metrics ingestion, retention, alerting, dashboarding and performance visibility. Build ...

Platform Engineer

Hiring Organisation
Gravitas Recruitment Group (Global) Ltd
Location
City of London, London, United Kingdom
responsibilities include: Scaling serverless cloud infrastructure for growth and multi-region reliability Building and improving CI/CD pipelines and deployment systems Enhancing observability, monitoring, and incident response Developing internal tooling to improve engineering productivity Contributing to production code (TypeScript) across infrastructure and product Tech Environment AWS (serverless-first architecture … Pulumi (or similar infrastructure-as-code tools) GitHub Actions for CI/CD Datadog for observability TypeScript across the stack What They’re Looking For Strong platform engineering experience in cloud-native SaaS environments Hands-on experience with AWS serverless architecture (e.g. Lambda, DynamoDB, event-driven systems) Experience building ...

Remote Network Monitoring Engineer - VictoriaMetrics

Hiring Organisation
Akkodis
Location
London, United Kingdom
Employment Type
Permanent
Salary
£70,000 - £75,000 per annum
VictoriaMetrics in a production environment, including configuration, optimisation, ingestion, retention and performance tuning. You will also work across streaming telemetry, Nagios, Grafana and wider observability tooling. This would suit someone with strong network monitoring experience who is comfortable taking ownership of a critical technical workstream in a project-led environment. … Looking for: Strong hands-on experience with VictoriaMetrics in a production environment. Previous experience in a senior network monitoring, network engineering or observability-focused role. Experience working in a telecoms, ISP, managed network or large-scale infrastructure environment. Good understanding of time-series monitoring, metrics ingestion, retention and performance tuning. ...

Remote Network Monitoring Engineer - VictoriaMetrics

Hiring Organisation
Akkodis
Location
Manchester, Lancashire, England, United Kingdom
Employment Type
Full-Time
Salary
£70,000 - £75,000 per annum
VictoriaMetrics in a production environment, including configuration, optimisation, ingestion, retention and performance tuning. You will also work across streaming telemetry, Nagios, Grafana and wider observability tooling. This would suit someone with strong network monitoring experience who is comfortable taking ownership of a critical technical workstream in a project-led environment. … Looking for: Strong hands-on experience with VictoriaMetrics in a production environment. Previous experience in a senior network monitoring, network engineering or observability-focused role. Experience working in a telecoms, ISP, managed network or large-scale infrastructure environment. Good understanding of time-series monitoring, metrics ingestion, retention and performance tuning. ...

Remote Network Monitoring Engineer - VictoriaMetrics

Hiring Organisation
Akkodis
Location
Birmingham, West Midlands, England, United Kingdom
Employment Type
Full-Time
Salary
£70,000 - £75,000 per annum
VictoriaMetrics in a production environment, including configuration, optimisation, ingestion, retention and performance tuning. You will also work across streaming telemetry, Nagios, Grafana and wider observability tooling. This would suit someone with strong network monitoring experience who is comfortable taking ownership of a critical technical workstream in a project-led environment. … Looking for: Strong hands-on experience with VictoriaMetrics in a production environment. Previous experience in a senior network monitoring, network engineering or observability-focused role. Experience working in a telecoms, ISP, managed network or large-scale infrastructure environment. Good understanding of time-series monitoring, metrics ingestion, retention and performance tuning. ...

Contract: Google DCX Engineering Lead - Hybrid (various locations) - 12 months

Hiring Organisation
Hamilton Barnes
Location
Bristol, United Kingdom
Employment Type
Contract
Contract Rate
GBP 500 - 530 Annual
Code (Terraform) and CI/CD automation pipelines. Support secure-by-design cloud infrastructure including networking, monitoring, logging, and key management. Drive automation, observability, reliability, and platform optimisation initiatives. Collaborate with architecture, platform, and feature teams to evolve cloud capabilities and reusable patterns. Support operational governance including SLO/… GitHub Enterprise, and CI/CD tooling. Experience supporting RHEL and Windows virtual machine environments in cloud infrastructure. Strong understanding of cloud security, automation, observability, and operational reliability. Scripting capability using Python, Bash, or PowerShell. Experience operating at a senior engineering or technical leadership level, balancing strategy with hands ...

SRE Consultant

Hiring Organisation
Akkodis
Location
City of London, London, United Kingdom
Employment Type
Permanent
Salary
£90,000 - £100,000 per annum
include: Define and embed SRE engagement models aligned to modern engineering and traditional ITSM/ITIL practices Establish SLIs, SLOs, and Error Budgets Shape observability strategies using metrics, logs, and traces Design incident response models and post-incident learning loops Reduce toil through automation and engineering excellence Deliver SRE capability … Looking For Extensive experience in SRE, cloud operations, or DevOps Proven consulting or advisory background Experience with AWS, Azure, or GCP Strong observability and incident management expertise Ability to obtain UK SC clearance Modis International Ltd acts as an employment agency for permanent recruitment and an employment business ...

SRE Manager / Ops Manager

Hiring Organisation
Infoplus Technologies UK Ltd
Location
Wokingham, Berkshire, South East, United Kingdom
Employment Type
Contract, Work From Home
Contract Rate
From £400 to £450 per day
Incidents, service risks, and operational failures. Service Reliability & Operations (SRE Focus) Define, own, and govern SLO, service health metrics. Ensure proactive monitoring, alerting, and observability across the estate. Lead blameless post-incident reviews, root cause analysis, and preventative actions. BAU Team Leadership Lead and manage multiple BAU teams, potentially covering … Service Management teams in complex environments. Proven accountability for 24x7 BAU services at scale. Deep understanding of: Incident & problem management Monitoring & observability Change & release control Experience working across cloud, applications, data, and integrations. Strong stakeholder and escalation management skills. Desirable Background in Site Reliability Engineering or DevOps-led operations. Knowledge ...

Salesforce Engineer - Field Mobile Platform

Hiring Organisation
Focus on SAP
Location
Reading, England, United Kingdom
across teams. Lead platform simplification, reducing duplication and technical debt. Engineering Excellence Champion best practices in: Clean architecture CI/CD automation Testing strategies Observability and telemetry Performance optimisation Secure-by-design principles Ensure high reliability, resilience, and offline capability for field environments. Cross-Functional Collaboration Partner with Product, Architecture … Proven ability to define and influence engineering standards across multiple teams. Strong experience with: CI/CD pipelines and DevOps practices Automated testing and observability Performance tuning and platform optimisation Experience designing resilient, offline-capable mobile solutions Ability to solve complex, cross-domain technical challenges. Strong stakeholder engagement and communication ...

AI Engineer

Hiring Organisation
MarkIT Placements
Location
West London, London, United Kingdom
Employment Type
Contract, Work From Home
execution Deploy AI systems into cloud, on-premises, and air-gapped environments Build production-ready pipelines from data ingestion through to inference Experience with observability for AI systems, including agent behaviour, model performance, and failure modes Collaborate with engineers, product leads, and customers to translate requirements into working systems Contribute … with edge or offline AI deployments Familiarity with Kubernetes (EKS/OpenShift) for monitoring and managing deployed applications MLOps experience - model evaluation, monitoring, reproducibility Observability tooling for agentic systems (model drift, agent behaviour, performance monitoring) Experience with agent orchestration patterns and inter-agent communication protocols (e.g. A2A) Familiarity with MCPs ...