376 to 400 of 470 Observability Jobs in England

Client Service Delivery

Hiring Organisation
Accenture
Location
Birmingham, England, United Kingdom
Service Delivery Management Own full lifecycle service delivery across infrastructure and cloud environments, ensuring alignment to SLAs, KPIs, scope, and cost. Leverage AIOps and observability tools (e.g. Dynatrace, Datadog, New Relic, Elastic) to proactively monitor service health and performance. Utilise predictive alerting and anomaly detection to prevent incidents and optimise … infrastructure and cloud environments Strong understanding of IT Managed Services frameworks Hands-on experience with AIOps tools such as Dynatrace and ServiceNow Familiarity with observability tools (e.g. Datadog, New Relic, Elastic) Knowledge of event analytics tools such as Splunk IT Service Intelligence and Moogsoft Experience in stakeholder and client management ...

Head of Infrastructure

Hiring Organisation
Jobleads-UK
Location
Greater London, England, United Kingdom
platform and infrastructure strategy Design and evolve cloud architecture to support scale, resilience, and performance Set standards for infrastructure, CI/CD, environments, and observability Make architectural decisions and trade‐offs Developer Experience (DevEx) Provide infrastructure for the development team to code, test and deploy efficiently Advise during design sessions … growing company Ability to operate production systems under pressure Deep hands‐on experience with the AWS cloud platform Strong background in reliability, observability, and incident management Experience leading or mentoring engineers What we offer in return 💰 Competitive salary depending on experience 🏝️ 27 days of annual leave (including 3 days Christmas ...

Software Engineering Manager - Knowledge/AI and Platform Enablement Squads

Hiring Organisation
Centrica - CHP
Location
Windsor, Berkshire, South East, United Kingdom
Employment Type
Permanent, Work From Home
best practice, reduce duplication, and promote maintainable, secure and performant systems. Enhance delivery capability through platform reliability and DevOps maturity - Continuously improve deployment pipelines, observability, alerting, incident handling, recovery procedures and operational readiness Manage stakeholders and ensure transparent communications - Build strong relationships across product, operations, delivery and business teams … management, data modelling and data quality controls. Ability to produce high level and detailed design specifications. Experience running DevOps practices including CI/CD, observability, monitoring and incident management. Multi-squad engineering leadership Proven experience leading software engineering delivery in a complex, multi team environment Experience providing technical leadership ...

Software Engineering Manager - In-Life Domain

Hiring Organisation
Centrica - CHP
Location
Windsor, Berkshire, South East, United Kingdom
Employment Type
Permanent, Work From Home
practice, reduce duplication, and promote maintainable, secure and performant systems. 4. Enhance delivery capability through platform reliability and DevOps maturity Continuously improve deployment pipelines, observability, alerting, incident handling, recovery procedures and operational readiness across Field Ops engineering teams. 5. Manage stakeholders and ensure transparent communications Build strong relationships across product … management, data modelling and data quality controls. Ability to produce high level and detailed design specifications. Experience running DevOps practices including CI/CD, observability, monitoring and incident management. Demonstrated capability in leading multi squad engineering execution in a product led organisation. Mindset & Ways of Working Comfortable working in iterative ...

Senior Site Reliability Engineer

Hiring Organisation
Realm
Location
City of London, London, United Kingdom
High-growth infrastructure company focused on delivering large-scale compute, data centre capacity, and power solutions for advanced machine learning workloads. Platforms support leading research and industry teams requiring high-performance computing at significant scale. ...

SRE Managing Consultant - Cloud Operating Model

Hiring Organisation
Jobleads-UK
Location
Manchester, England, United Kingdom
Budgets: Establish service measures and targets (SLIs/SLOs) and introduce Error Budgets to enable data‐driven trade‐offs between reliability and delivery velocity. Observability & Operational Insight: Shape observability approaches (metrics/logs/traces) and operational monitoring models that make reliability risks visible and actionable, improving operational decision‐making. … large‐scale delivery contexts; associate‐level certifications are desirable but not mandatory. Design, establish, and evolve SRE‐led centres of excellence (e.g. Reliability, Observability, or Operational Excellence), setting enterprise‐level standards for SLIs/SLOs, incident management, observability, and continuous improvement across cloud and hybrid platforms. Exposure to modern observability ...

DevOps Engineer

Hiring Organisation
Infinity Quest
Location
Halifax, England, United Kingdom
Actions, Harness, Jenkins). • Networking & Security: Experience with GCP Cloud Armor, GCP Networking, and embedding secure-by-design controls from design to runtime. • Automation & Observability: Implementing actionable observability, performance tuning, and automation to reduce toil. Defining and operating against SLOs/SLIs. • Scripting & Tooling: Scripting in Bash, PowerShell, or Python. … Performance & Reliability: Define, monitor, and operate against service level objectives (SLOs/SLIs), ensuring high availability, performance, and fault tolerance. • Continuous Improvement: Drive automation, observability, and performance tuning to reduce manual effort and improve platform reliability. • Collaboration: Work closely with architecture and feature teams to evolve the cloud roadmap ...

SRE Observability Engineer

Hiring Organisation
Access Computer Consulting
Location
City of London, London, United Kingdom
Employment Type
Contract
Contract Rate
£350 - £450/day
recruiting for an SRE Observability Engineer to work in London 2-3 days a week, remaining time remote. The role falls inside IR35 so you will be required to work through an umbrella company for the duration of the contract. This is a 6 month contract which will transfer … permanent role after the initial contract term. You will be responsible for collaborating across various organisations within the client to understand and develop observability solutions for enterprise-wide deployment at scale. You will also manage the legacy monitoring stack across the Production Management organisation within the client. You must have ...

Principal Engineer

Hiring Organisation
Centrica - CHP
Location
Windsor, Berkshire, South East, United Kingdom
Employment Type
Permanent, Work From Home
enhance safety, compliance, customer experience and productivity. 5. Establish engineering excellence across teams Champion high engineering standards: clean architecture, CI/CD automation, observability, testing strategies, release processes, telemetry, performance tuning and secure-by-design principles. 6. Lead platform performance, reliability & offline capability Ensure the mobile environment performs reliably … Quality and Platform-wide capabilities Shape quality, resilience, and security strategies across teams, ensuring teams adopt shift-left testing, strong security hygiene, consistent observability, and reliable operational processes. 8. Improve how work is done (template requirement) Continuously identify opportunities to automate, simplify, reduce cycle time, improve developer experience, adopt ...

DevOps Engineer ID46327

Hiring Organisation
Humand Talent
Location
Oxfordshire, England, United Kingdom
your typical DevOps role. You’ll be working across a mix of cloud-connected and fully isolated environments, tackling unique challenges around deployment, observability, and infrastructure at scale. You’ll play a key role in designing how complex systems are commissioned, deployed, and maintained in both standard and highly controlled … What you’ll be doing Building and automating infrastructure using modern IaC tools Developing and improving CI/CD pipelines (self-hosted environments) Designing observability across distributed systems Supporting deployments across both connected and air-gapped environments Contributing to the evolution of a hybrid cloud/on-prem platform What ...

Lead Integration Engineer & Developer

Hiring Organisation
Ashdown Group
Location
Liverpool, Merseyside, North West, United Kingdom
Employment Type
Permanent, Work From Home
Gateway, EventBridge, SQS, SNS) Node.js/JavaScript/TypeScript and Python Data & Infrastructure DynamoDB, RDS Infrastructure as Code (Terraform, CDK, CloudFormation) CloudWatch and observability tooling Integrations HubSpot (CRM) Internal microservices and external APIs Required Experience 7+ years in backend or platform engineering Strong hands-on AWS experience (serverless preferred) Proven … APIs End-to-end ownership of systems (design build operate) Technical Expertise Event-driven architecture (EventBridge, SQS, SNS, Kafka) Reliability patterns (retries, idempotency, DLQs) Observability and debugging in distributed systems Data modelling and schema evolution Leadership & Collaboration Ability to lead technical design and influence architecture Experience mentoring engineers Strong communication ...

Senior Software Engineer - Up to £100k - Hybrid working

Hiring Organisation
Creo Recruitment
Location
City of London, London, United Kingdom
delivery of complex, scalable systems across multiple services Making pragmatic architectural decisions in a fast-moving environment Driving engineering excellence across testing, security, and observability Owning services in production – improving reliability, performance, and resilience Mentoring engineers and elevating team capability Collaborating cross-functionally with Product, Data, and Design What they … Strong experience designing scalable, maintainable systems with clear ownership boundaries Proven ability to lead delivery across ambiguous, complex problem spaces Deep understanding of reliability, observability, and production systems A security-first mindset with experience mitigating risks and vulnerabilities Excellent communication skills with the ability to influence and mentor Python Experience ...

Data Platform Engineer

Hiring Organisation
PRISM DIGITAL LIMITED
Location
Milton Keynes, Buckinghamshire, South East, United Kingdom
Employment Type
Permanent, Work From Home
Salary
£75,000
availability Own incident resolution, root cause analysis, and continuous improvement Collaborate with engineers and third-party providers to mature the platform Contribute to monitoring, observability, and cost optimisation strategies Support projects and business initiatives through robust platform delivery What They're Looking For: Microsoft Fabric experience Terraform experience Cloud platform engineering … delivery environments What You'll Work With: Microsoft Fabric Terraform (Infrastructure as Code) Azure cloud technologies SQL Server GitHub/CI/CD tooling Monitoring & observability tools Platform design patterns (scalability, resilience, cost control) Nice to Haves: GitHub Actions/CI/CD pipelines Zero Trust architecture Cloud cost monitoring & reporting ...

Data Platform Engineer

Hiring Organisation
PRISM DIGITAL LIMITED
Location
Northampton, Northamptonshire, UK
availability Own incident resolution, root cause analysis, and continuous improvement Collaborate with engineers and third-party providers to mature the platform Contribute to monitoring, observability, and cost optimisation strategies Support projects and business initiatives through robust platform delivery What They're Looking For: Microsoft Fabric experience Terraform experience Cloud platform engineering … delivery environments What You'll Work With: Microsoft Fabric Terraform (Infrastructure as Code) Azure cloud technologies SQL Server GitHub/CI/CD tooling Monitoring & observability tools Platform design patterns (scalability, resilience, cost control) Nice to Haves: GitHub Actions/CI/CD pipelines Zero Trust architecture Cloud cost monitoring & reporting ...

Infrastructure / DevOps Engineer

Hiring Organisation
rmg digital
Location
England, United Kingdom
Managing and optimising AWS services, including ECS, Lambda, VPC, and Aurora Postgres Building and maintaining CI/CD pipelines using GitHub Actions Implementing monitoring, observability, and alerting using Datadog Supporting development teams with deployment, automation, and operational best practices Improving infrastructure security, scalability, reliability, and cost-efficiency Monitoring system performance … Infrastructure as Code tools such as Terraform and/or CDK Understanding of CI/CD pipelines and GitHub Actions Familiarity with monitoring and observability tooling, such as Datadog Knowledge of containerisation concepts and infrastructure best practices Some experience with TypeScript or JavaScript for scripting and CDK purposes Strong troubleshooting ...

Senior Platform Engineer (Fully Remote) - GKE, GCP, Terraform

Hiring Organisation
Sanderson Recruitment
Location
London, United Kingdom
Employment Type
Permanent, Work From Home
manage workloads using Helm with strong isolation and configuration practices Own and improve CI/CD pipelines using Azure DevOps and GitOps Embed observability across the platform (monitoring, logging, alerting, tracing) Define and enforce platform standards, patterns and best practices Produce and maintain high-quality documentation, diagrams and runbooks Lead … expertise, particularly Azure DevOps Git-based workflows, GitOps and tools such as Argo CD Experience with service mesh technologies (e.g. Istio) Exposure to observability/APM tooling Confident technical leader with experience setting standards and mentoring others Comfortable working in shared platform environments Reasonable Adjustments: Respect and equality are core ...

Site Reliability Engineer

Hiring Organisation
EQUALS
Location
Greater London, England, United Kingdom
recommendation engine that matches people by musical taste. THE ROLE We're looking for a Site Reliability Engineer to own the infrastructure, observability, and operational health of the Equals platform. You'll be the person who monitors system needs and health to provide a seamless user experience while providing traceability … 1B+ rows) - Manage Cloudflare (WAF, bot management, DNS, firewall rules) - Make cost-conscious infrastructure decisions - right-sizing instances, storage tiering, optimizing spend Monitoring & Observability - Own the Datadog APM setup: tracing, alerting, dashboards, log management - Maintain and tune alert channels integrated with Slack - Reduce alert fatigue by tuning thresholds, suppressing false ...

Principal Software Engineer

Hiring Organisation
BBC
Location
Greater London, United Kingdom
Employment Type
Full Time
Salary
65000 to 80000 GBP Annually
large-scale data ingestion platforms. Experience working across a broad range of technologies, platforms, and engineering domains within multi-team environments. Familiarity with observability, operational monitoring, CI/CD, and platform reliability practices. Experience with data technologies such as Airflow, Redshift, DynamoDB, MongoDB, or similar tooling. Interest in contributing … EventBridge SQL and NoSQL databases including Postgres, MongoDB, DynamoDB, and Timestream CI/CD and automation tooling including GitHub Actions, Jenkins, and CodePipeline Observability and visualisation tooling including Grafana and Tableau Our wider engineering ecosystem also includes web and mobile technologies, including TypeScript/JavaScript, Swift, and Kotlin, alongside ...

Platform engineer

Hiring Organisation
Beat My Salary
Location
Reading, Berkshire, United Kingdom
Employment Type
Permanent
Location: Reading NO Visa sponsorship Eligibility: ILR/Citizen/Dependent/Settled Domain: Telecom Job summary: Worked for large-scale, mission-critical environments in Telecom domain. Implement service mesh architectures using Istio for traffic ...

Senior SRE Lead

Hiring Organisation
Albany Beck
Location
London Area, United Kingdom
about capability build, technical excellence, and delivering meaningful change within complex enterprise environments. Role Overview Albany Beck is seeking a Senior SRE Lead/Observability SME to lead the establishment of a new enterprise Site Reliability Engineering (SRE) capability, with a primary focus on designing and implementing a modern observability … suite and operational resilience framework. This is a foundational build role, responsible for defining how reliability engineering and observability are structured, measured, and embedded across a complex global technology estate. The successful candidate will play a key role in shifting the organisation from reactive operational support to a metrics-driven ...

Data Reliability Engineer

Hiring Organisation
Ashdown Group
Location
City of London, London, United Kingdom
Employment Type
Permanent, Work From Home
work from home 2 days per week. This is a high-impact role focused on improving data quality, reducing incidents, and building scalable observability across a modern enterprise data platform. You'll help ensure data across the organisation is accurate, reliable, and trusted for critical business decision-making. You'll take ownership … style roles, with strong SQL and Python skills and experience working in modern cloud-based data environments. Hands-on experience with data observability tools such as Grafana, Monte Carlo, or Acceldata, and data governance/quality platforms like Informatica, Collibra or Microsoft Purview is highly desirable. Experience within the Azure ...

Security Platform Engineer

Hiring Organisation
Addition
Location
Hampshire, England, United Kingdom
engineering teams to promote secure-by-design practices Maintaining clear documentation across systems, configurations, and processes Supporting the continuous improvement of platform security and observability Main Skills Needed: Background in Security Engineering, Platform Engineering, or similar Strong hands-on experience with Kubernetes and container environments Proven experience with tools such … Splunk and Nessus Knowledge of SIEM, observability, and vulnerability management practices Scripting or automation capability (Python, Bash, or similar) Understanding of container security and DevSecOps principles Familiarity with threat frameworks and security best practices Experience with tools such as Microsoft Defender or similar security platforms Exposure to infrastructure-as-code ...

Cloud Architect

Hiring Organisation
Tata Consultancy Services
Location
Luton, England, United Kingdom
least privilege, KMS encryption, secrets management, data classification, PII redaction, prompt/response filtering, and model governance. Drive non-functional requirements: reliability, scalability, latency, observability, DR, and cost controls (FinOps) for GenAI workloads. Guide build teams through solution design, reviews, and implementation; produce architecture artefacts (HLD/LLD), patterns … more languages (Python/Node.js preferred) and infrastructure-as-code (CDK/CloudFormation/Terraform) for repeatable deployments. Experience setting up observability for GenAI: tracing, logging, metrics, and model/application performance dashboards. Excellent communication skills for architecture storytelling, stakeholder management, and client-facing workshops. Rewards & Benefits TCS is consistently ...

Technical Lead Edge Platform

Hiring Organisation
VoCoVo
Location
South Gloucestershire, United Kingdom
Employment Type
Full Time
Salary
80000 to 85000 GBP Annually
MicroK8s). Experience with image build tooling and immutable OS concepts, familiarity with tools such as Kairos, OSTree is highly desirable. Practical exposure to observability at scale, including metrics, logging, alerting (Prometheus, Grafana, Loki) and hands-on experience with OpenTelemetry. Experience operating or building infrastructure to manage, monitor and update … implement secure, reliable over-the-air (OTA) update mechanisms for OS and workload delivery at scale. Take ownership of the edge platform's observability, reliability and security, including driving adoption of OpenTelemetry across the edge estate. Contribute to the technical roadmap, researching new approaches and producing demonstrations and proofs ...

DevOps Release Manager

Hiring Organisation
Centrica - CHP
Location
Windsor, Berkshire, South East, United Kingdom
Employment Type
Permanent, Work From Home
Description Join us, be part of more. We're so much more than an energy company. We're a family of brands revolutionising how we power the planet. We're energisers. One team of 21 ...