551 to 575 of 708 Observability Jobs in England

Senior DevOps Engineer

Hiring Organisation: Morgan McKinley
Location: Oxford, Oxfordshire, England, United Kingdom
Employment Type: Full-Time
Salary: Salary negotiable

Code (IaC): Work closely with the team to design, implement, and maintain scalable cloud architecture using modern IaC frameworks and centralized Git repositories. Observability & SRE Practices: Perform root-cause analysis of production incidents and mature our observability, logging, and metrics-gathering tools to improve system reliability. DevSecOps: Ensure security … infrastructure, applications, and data in a hybrid cloud environment Designing and maintaining robust CI/CD Automation pipelines Implementation of open-source standards for observability (e.g., OpenTelemetry) Strong troubleshooting, analytical, and system-debugging skills Desired Skills We are also keen to discuss experience in: FinOps practices, including cost control, optimization ...

Platform Engineer

Hiring Organisation: TXP
Location: Telford, Shropshire, West Midlands, United Kingdom
Employment Type: Contract
Contract Rate: £500 - £525 per day

using GitLab. * Support the containerisation and deployment of applications. * Work closely with engineering teams delivering Java-based microservices. * Implement and maintain monitoring, logging and observability solutions. * Troubleshoot platform and deployment issues across multiple environments. Essential Skills * Kubernetes * Helm (strong commercial experience required) * AWS * GitLab CI/CD Pipelines * Containerisation (Docker …/Kubernetes) * Microservices architecture and deployment * Observability, monitoring and logging * Experience supporting engineering teams delivering Java applications Desirable Skills * MongoDB * Terraform * Public Sector or Government experience ...

Lead AI Platform Engineer (Contract)

Hiring Organisation: GlobalLogic
Location: London Area, United Kingdom

month assignment (inside IR35), to start in 2-4 weeks. This is a hands-on, high-impact role at the intersection of AI governance, distributed systems, observability, and platform engineering to lead technical delivery for an AI centralised platform - Control Tower. We’re looking for a Technical Lead to drive the end-to-end … Java, and modern data processing frameworks. Expertise in cloud-based AI/ML ecosystems, particularly AWS SageMaker (required). Proven experience developing monitoring frameworks, observability pipelines, and dashboards. Deep understanding of event-driven architectures and messaging systems (Kafka, Vert.x, or similar). Knowledge of security engineering, IAM principles, encryption ...

Azure DevOps Engineer

Hiring Organisation: Langham Recruitment
Location: Birmingham, West Midlands (County), United Kingdom
Employment Type: Contract
Contract Rate: £500 - £550/day Remote, Outside IR35

DevOps. Implement Infrastructure as Code using Terraform and automation tooling. Support the migration and modernisation of existing applications into Azure. Improve monitoring, logging and observability across environments. Collaborate with development teams to streamline deployment processes. Troubleshoot infrastructure, deployment and performance issues. Ensure environments adhere to security, resilience and disaster recovery … Python. Exposure to C#, .NET or ASP.NET environments. Experience migrating applications and services from on-premise environments into Azure. Familiarity with monitoring, logging and observability tools. Strong understanding of cloud security and governance principles. Contract Details Azure DevOps Engineer £500-£550 per day Outside IR35 Initial 3-month contract Remote ...

Cloud Engineer

Hiring Organisation: ea Change
Location: Reading, England, United Kingdom

operate clusters across multiple environments Build out backup, snapshot and recovery processes Maintain and improve CI/CD pipelines Set up monitoring, alerting and observability Take ownership of reliability and uptime as the platform grows Essential Experience with Kubernetes - managing clusters day to day Experience with Terraform - writing and maintaining … Experience building and maintaining CI/CD pipelines (GitHub Actions, GitLab CI, Jenkins or similar) Solid Linux and networking fundamentals An operational mindset - reliability, observability and automation, not just deployment Desirable Good understanding of cloud security and access management Experience with monitoring and alerting tools (Prometheus, Grafana, Datadog, CloudWatch ...

Site Reliability Engineer

Hiring Organisation: Huxley Associates
Location: City of London, London, United Kingdom
Employment Type: Permanent
Salary: £90000/annum + Bonus & Benefits Package

scalability, and operational excellence across a complex, regulated environment. Key Responsibilities Lead the implementation of SRE best practices across cloud infrastructure Drive improvements in observability, alerting, and capacity planning (SLA/SLO/SLI) Identify and reduce operational toil through automation and remediation frameworks Build and enhance GitOps and Infrastructure … cloud environments (AWS/GCP) Strong scripting skills (Python, Ansible, or PowerShell) Experience with Infrastructure as Code and GitOps methodologies Hands-on knowledge of observability/APM tools (e.g. Grafana, Datadog, Dynatrace) Proven experience managing incidents, root cause analysis, and on-call support Understanding of SLA/SLO/ ...

Principal Cloud Architect

Hiring Organisation: TXP
Location: Southampton, Hampshire, South East, United Kingdom
Employment Type: Contract
Contract Rate: £550 - £600 per day

delivery teams. The successful candidate will provide manager-level technical leadership across DevOps, cloud platforms, Infrastructure as Code, CI/CD, networking, security, observability and reliability engineering. They will help shape enterprise-scale transformation, hybrid cloud strategy and platform services aligned to the Azure Well-Architected Framework, ensuring solutions … compute/storage design. Evaluate platform changes including major provider upgrades (AzureRM/Cloudflare), DR and high availability improvements, cost optimisation strategies, and observability frameworks. Lead technical designs for large-scale refactoring and provider upgrades, environment creation pipelines, secure container registry access, identity integration and Zero Trust patterns, and event ...

SRE DevOps Engineer

Hiring Organisation: WTW
Location: Surrey, United Kingdom
Employment Type: Full Time

product team to develop and support operationally resilient cloud infrastructure. The ideal candidate will have a track record in Microsoft Azure and Observability platforms in complex SaaS environments and have excellent communication skills. You will be joining our growing engineering organization building a wide range of market-leading InsurTech solutions … with focus on high cadence and cost effectiveness Implement infrastructure as code Support the team in infrastructure and networking related issues Maintain and configure observability platforms such as Datadog Proactively monitor production and other environments to ensure stability, availability, security and integrity Participate in incident response, troubleshooting, and root cause ...

Principal Site Reliability Engineer

Hiring Organisation: F5 consultants
Location: Reading, Berkshire, South East, United Kingdom
Employment Type: Permanent, Work From Home
Salary: £95,000

improve platform reliability across complex Kubernetes and OpenShift environments. You'll work within a modern cloud-native environment leveraging Kubernetes, OpenShift, GitOps, service mesh, observability tooling, and automation-first engineering practices. This is a technically hands-on role where you'll take a leading voice in platform stability, mentor others … Kubernetes and OpenShift (non-negotiable) Experience working in complex multi-cloud or hybrid environments Proficiency in service mesh technologies such as Istio Experience with observability stacks including Prometheus, Grafana, Loki, and Tempo Strong Infrastructure as Code experience using Kustomize or Helm, with scripting skills in Bash and/or Python ...

Forward Deployed AI Engineer

Hiring Organisation: WTW
Location: Greater London, United Kingdom
Employment Type: Full Time

enabled systems. You’ll bring deep expertise across modern full-stack technologies (.NET, Azure, SQL, React/Angular), along with experience in distributed systems, observability, and AI tooling such as LLMs, retrieval pipelines, and agentic workflows. Acting as a bridge between business and technology, you’ll work across product, data … orchestration, evaluation loops, and human-in-the-loop controls. Enterprise integration: Integrate AI solutions with enterprise systems, APIs, data platforms, document repositories, workflow tools, observability platforms, and identity and access management services. Production engineering: Ensure AI solutions meet enterprise standards for reliability, scalability, latency, maintainability, cost control, logging, monitoring ...

Software Engineering Manager

Hiring Organisation: 17918
Location: London, United Kingdom

best practice, reduce duplication, and promote maintainable, secure and performant systems. Enhance delivery capability through platform reliability and DevOps maturity - Continuously improve deployment pipelines, observability, alerting, incident handling, recovery procedures and operational readiness across Field Ops engineering teams. Manage stakeholders and ensure transparent communications - Build strong relationships across product, operations … decisions Funding for technical enablers Field Ops workflow design and data requirements Use of Data/Insight/Automation Uses engineering metrics, performance insights, observability data and AI-assisted diagnostics to guide decisions. Ensures human judgement remains central. Constraints Centrica architectural principles, engineering guardrails, data privacy/security policies ...

Software Engineering Manager

Hiring Organisation: Centrica - CHP
Location: Leicester, Leicestershire, East Midlands, United Kingdom
Employment Type: Permanent

Senior Site Reliability Engineer

Hiring Organisation: Experian Ltd
Location: Nottingham, Nottinghamshire, East Midlands, United Kingdom
Employment Type: Permanent, Work From Home

Perform detailed post-incident investigations to identify underlying causes. Document findings and share learnings to prevent recurrence. Implement preventive measures and continuous improvement processes. Observability: Champion monitoring, logging, and alerting strategies using tools like Prometheus, Grafana, ELK, and AWS CloudWatch. Build real-time dashboards to visualize system health and reliability … culture of shared responsibility for uptime and performance across engineering teams. Qualifications Deep expertise with various AWS services. Advanced knowledge of monitoring and observability tools. Strong leadership capabilities with a focus on setting clear direction, aligning team efforts with organizational goals, and maintaining high levels of motivation and engagement across ...

Software Engineering Manager - Tooling and Optimisations

Hiring Organisation: Centrica - CHP
Location: Windsor, Berkshire, South East, United Kingdom
Employment Type: Permanent

practice, reduce duplication, and support maintainable, secure and high-performing systems. Improve delivery capability through platform reliability and DevOps maturity Continuously strengthen deployment pipelines, observability, alerting, incident response, recovery procedures and operational readiness across Field Ops engineering teams. Manage stakeholders and maintain clear communication Build trusted relationships across product, operations … data modelling and data quality controls. Ability to produce both high-level and detailed design specifications. Experience leading DevOps practices, including CI/CD, observability, monitoring and incident management. Demonstrated capability leading multi-squad engineering delivery in a product-led organisation. Mindset & Ways of Working Comfortable working in iterative, outcome ...

AI Engineer

Hiring Organisation: Elsevier
Location: Greater London, United Kingdom
Employment Type: Full Time

within a defined problem, building and testing tool use, retrieval pipelines and agent workflows, integrating AI capabilities into enterprise systems, and contributing to evaluation, observability and guardrails. You will hold a high bar on code quality, flag risks and blockers early, and work alongside host-function stakeholders to make sure … agentic AI solutions to production standard within a defined technical approach. Implement and test tool use, retrieval pipelines, and agent workflows. Contribute to evaluation, observability and guardrails for agentic systems. Integrate AI capabilities into existing enterprise workflows and systems. Maintain high code quality and documentation so patterns can be reused. ...

QA Test Infrastructure Engineer

Hiring Organisation: Talent Locker
Location: Cheltenham, Gloucestershire, South West, United Kingdom
Employment Type: Contract

QA Test Infrastructure Engineer - Taunton, Onsite - Outside IR35 - Highest Security Clearance As a QA Test Infrastructure Engineer, you'll help design, build, and deliver secure digital solutions in highly secure environments. You'll work alongside ...

SRE Managing Consultant - Cloud Operating Model

Hiring Organisation: Capgemini
Location: Manchester, United Kingdom
Employment Type: Full Time

Budgets: Establish service measures and targets (SLIs/SLOs) and introduce Error Budgets to enable data-driven trade-offs between reliability and delivery velocity. Observability & Operational Insight: Shape observability approaches (metrics/logs/traces) and operational monitoring models that make reliability risks visible and actionable, improving operational decision-making. … large‐scale delivery contexts; associate‐level certifications are desirable but not mandatory. Design, establish, and evolve SRE‐led centres of excellence (e.g. Reliability, Observability, or Operational Excellence), setting enterprise‐level standards for SLIs/SLOs, incident management, observability, and continuous improvement across cloud and hybrid platforms. Exposure to modern observability ...

DevOps Engineer

Hiring Organisation: Oscar Associates (UK) Limited
Location: Manchester, North West, United Kingdom
Employment Type: Permanent
Salary: £70,000

scalable, reliable and cost-efficient as it moves into full production. Working closely with engineering teams, you'll drive automation, improve deployment pipelines, strengthen observability and ensure the platform performs under high-volume, real-time workloads. This is a hands-on position with genuine ownership and plenty of opportunity … enhancing CI/CD pipelines with blue/green deployments and automated rollback Driving platform reliability, resilience and scalability Developing monitoring, alerting and observability across the environment Managing cloud costs and implementing best FinOps practices Participating in a small production on-call rota Technology AWS ECS Fargate Terraform Aurora ...

Senior DevOps Engineer

Hiring Organisation: Method Resourcing
Location: Newcastle Upon Tyne, England, United Kingdom

evolving cloud-native infrastructure Building and improving Infrastructure as Code solutions Driving automation across deployment pipelines and operational processes Improving platform reliability, resilience and observability Implementing monitoring, alerting and operational best practice Contributing to cloud strategy and technical roadmaps Working closely with engineering teams to improve delivery and platform performance … Bicep or similar) CI/CD pipeline design and optimisation Scripting and automation (PowerShell, Bash or Python) Containers and cloud-native technologies Monitoring and observability tooling DevOps engineering best practices Secure, highly available cloud environments Experience within regulated industries such as financial services, insurance or utilities would be beneficial ...

Data Engineer-Must have strong GCP experience-Inside IR35

Hiring Organisation: Reed Technology
Location: London, United Kingdom
Employment Type: Temporary
Salary: £425/day POSSIBLY NEGOTIABLE

Standardise ingestion and transformation using configuration-driven frameworks Embed data quality checks by default (schema validation, completeness, freshness, thresholds, alerting) Improve pipeline resilience, monitoring, observability and recovery mechanisms Integrate AI/ML capabilities where appropriate (e.g. anomaly detection, intelligent monitoring) Support delivery of a wider Data Strategy programme, improving consistency … Cloud Run/App Engine Experience with CI/CD, automated testing and infrastructure as code Data Quality & Monitoring Experience implementing data quality frameworks, observability tooling and monitoring solutions Preferred Experience Building reusable pipeline frameworks for large, multi-domain platforms Delivery within enterprise data transformation programmes with strong SLAs Exposure ...

Site Reliability Engineers

Hiring Organisation: F5 consultants
Location: Reading, Berkshire, South East, United Kingdom
Employment Type: Permanent, Work From Home
Salary: £70,000

support, shared ownership, and continuous improvement. You'll work hands-on in a modern cloud-native environment leveraging Kubernetes, OpenShift, GitOps, service mesh, and observability tooling. There is genuine investment in your development through training, certifications, and the expertise of those around you. You'll also be part … Ability to work within complex multi-cloud or hybrid environments with a solid foundation in distributed systems Expertise in observability tooling such as Prometheus, Grafana, Loki, and Tempo Proficiency in IaC tools such as Kustomize and Helm, with scripting skills in Bash/Python Experience managing GitOps pipelines using Tekton ...

Site Reliability Engineer (SRE) - Cloud & Automation

Hiring Organisation: Spencer Rose Ltd
Location: London, United Kingdom
Employment Type: Permanent
Salary: GBP 60,000 - 70,000 Annual

implementation of SRE practices across the organisation, working closely with infrastructure teams to optimise deployment processes and embed automation and operational excellence. Enhance observability and reliability, defining and implementing SLAs, SLOs and SLIs to improve alerting, monitoring, and capacity planning. Identify and eliminate toil, developing frameworks to analyse recurring issues … beneficial). Experience supporting and building multi-environment, multi-region cloud platforms (AWS or GCP), using IaC and GitOps workflows. Hands-on experience with observability/APM tooling such as Grafana, Datadog or Dynatrace. Background working in regulated financial services or banking environments. Excellent troubleshooting, analytical and communication skills, able ...

Vice President, DevOps Production Services

Hiring Organisation: Jobleads-UK
Location: Manchester, England, United Kingdom

enterprise applications and ensure platform stability, resiliency, and availability. Monitor application health, system performance, batch jobs, interfaces, and alerts using enterprise monitoring and observability tools. Investigate, troubleshoot, and resolve production incidents within defined SLAs. Perform root cause analysis (RCA) for recurring issues and drive permanent fixes. Analyze production logs, identify … Cloud experience preferred. Knowledge of automation/scripting using Python, Shell, or PowerShell. Exposure to DevOps/SRE practices, CI/CD pipelines, and observability tooling. Strong communication skills with the ability to provide concise incident and executive status updates. ...

Site Reliability Engineer

Hiring Organisation: Connells Limited
Location: Milton Keynes, Buckinghamshire, South East, United Kingdom
Employment Type: Permanent, Work From Home

hands-on role in ensuring it is reliable, scalable, and observable. You will help establish and mature SRE practices, focusing on: Monitoring and observability Incident response Post-incident review Reliability testing and capacity planning Toil reduction Enabling development velocity We offer a hybrid working arrangement with one day per week … Build dashboards, alerts, and runbooks to improve visibility Automate repetitive tasks to reduce operational toil Collaborate with cross-functional teams to enhance reliability and observability Support performance testing and capacity planning Proactively identify and prioritise reliability improvements Experience & Skills Required: Hands-on experience with Azure Monitoring (Application Insights, Alerts, Action ...

Platform Engineer

Hiring Organisation: Axon Labs
Location: London Area, United Kingdom

Reliable systems for live trading, multi-venue market data ingestion, and research compute Deployment pipelines that ship strategy and model changes quickly and safely Observability across data quality, execution, strategy, and infrastructure Resilience: failover, disaster recovery, and operational readiness for systems that lose money when they’re down The path … cost, and you have supported researchers or traders. Required Skills Deep Linux and networking fundamentals Strong cloud experience, ideally AWS (compute, networking, IAM, storage, observability) Strong Python; C++, Rust, or Go for latency-critical paths is a plus Container orchestration with Kubernetes (or equivalent) Infrastructure as code (Terraform or equivalent ...