826 to 850 of 1,278 Observability Jobs

Client Service Delivery

Hiring Organisation
Accenture
Location
Birmingham, England, United Kingdom
Service Delivery Management Own full lifecycle service delivery across infrastructure and cloud environments, ensuring alignment to SLAs, KPIs, scope, and cost. Leverage AIOps and observability tools (e.g. Dynatrace, Datadog, New Relic, Elastic) to proactively monitor service health and performance. Utilise predictive alerting and anomaly detection to prevent incidents and optimise … infrastructure and cloud environments Strong understanding of IT Managed Services frameworks Hands-on experience with AIOps tools such as Dynatrace and ServiceNow Familiarity with observability tools (e.g. Datadog, New Relic, Elastic) Knowledge of event analytics tools such as Splunk IT Service Intelligence and Moogsoft Experience in stakeholder and client management ...

Lead Engineer - Kubernetes & CNI

Hiring Organisation
Jobleads-UK
Location
Greater London, England, United Kingdom
networking, cluster usage, and deployment patterns. Operational Excellence * Support troubleshooting and root-cause analysis across multilayered network and container environments. * Contribute to platform reliability and observability improvements through dashboards, run-books, and automation. * Evaluate new tools, technologies, and approaches and present findings to the wider engineering community. Ways of Working * Operate effectively … networking internals. * Working knowledge of Kubernetes deployment pipelines, Git-based workflows, and automated testing principles. * Strong recent Linux (especially Ubuntu) experience. * Exposure to monitoring, observability, and troubleshooting across containerised workloads. * Familiarity with CI/CD tools (GitLab, Jira, Confluence). * Experience with automation tools such as Terraform, Ansible, or scripting ...

Data Architect / Data Engineers

Hiring Organisation
Vaco LLC
Location
Cincinnati, Ohio, United States
Employment Type
Permanent
Salary
USD 100 Annual
data and analytics assets Data Engineering Design and implement metadata-driven batch ingestion frameworks Develop up to 10 production-grade data pipelines Implement monitoring, observability, and remediation processes Design dimensional data models optimized for analytics and reporting Modern Cloud Data Platforms Architect solutions using Azure data services (Microsoft Fabric, Synapse … Power BI report or dashboard to validate the solution Document reporting and analytics processes DataOps & MLOps Enablement Design CI/CD, testing, and observability frameworks across data pipelines Promote data quality, lineage, and reproducibility through modern DevOps practices Support AI-ready data architectures where applicable Enablement, Collaboration & Leadership Lead technical ...

QA Test Infrastructure Engineer

Hiring Organisation
Talent Locker
Location
Cheltenham, Gloucestershire, South West, United Kingdom
Employment Type
Contract
QA Test Infrastructure Engineer - Tauton, Onsite - Outside IR35 - Highest Security Clearance As a QA Test Infrastructure Engineer, you'll help design, build, and deliver secure digital solutions in highly secure environments. You'll work alongside ...

Network Monitoring & Observability Engineer - Fully remote

Hiring Organisation
Akkodis
Location
Derby, Derbyshire, United Kingdom
Employment Type
Contract
Contract Rate
£70000 - £75000/annum
successful delivery of this initiative. This is a hands-on engineering role where you'll be responsible for designing, implementing, and commissioning monitoring and observability solutions across newly deployed fibre infrastructure and network equipment. Working closely with Network Operations and Core Network teams, you'll ensure full visibility of critical … services from day one through modern monitoring technologies, streaming telemetry, and AI-driven analytics. Key Responsibilities Monitoring & Observability Design and implement end-to-end monitoring solutions across new fibre infrastructure deployments. Build and maintain streaming telemetry pipelines to provide real-time network visibility. Configure, optimise, and manage VictoriaMetrics environments, including ...

SRE Managing Consultant - Cloud Operating Model

Hiring Organisation
Jobleads-UK
Location
Manchester, England, United Kingdom
Budgets : Establish service measures and targets (SLIs/SLOs) and introduce Error Budgets to enable data‐driven trade‐offs between reliability and delivery velocity. Observability & Operational Insight: Shape observability approaches (metrics/logs/traces) and operational monitoring models that make reliability risks visible and actionable, improving operational decision‐making. … large‐scale delivery contexts; associate‐level certifications are desirable but not mandatory. Design, establish, and evolve SRE‐led centres of excellence (e.g. Reliability, Observability, or Operational Excellence), setting enterprise‐level standards for SLIs/SLOs, incident management, observability, and continuous improvement across cloud and hybrid platforms. Exposure to modern observability ...

Principal Engineer - Member Experience Platform

Hiring Organisation
Jobleads-UK
Location
Skipton, England, United Kingdom
Quality), and bar‐raising across squads: you shorten lead times, increase deployment frequency, hold change‐failure rate low, and improve MTTR through release‐linked observability - turning fast, safe flow into the default way of working. Operating at platform scale, you define cross‐cutting architecture and delivery standards (API/event contracts … resilience, observability, language/dependency baselines) and drive adoption through the Golden Path: policy‐as‐code CI/CD, progressive delivery (feature flags, canary/blue‐green), automated rollback/forward‐fix, ephemeral, data‐ready environments, and guardrails that make security and compliance by design. You partner with Platform Ownership ...

Database Reliability Engineer

Hiring Organisation
Jobleads-UK
Location
Manchester, England, United Kingdom
Cross-Cloud Portability: Use CNPG and cloud-native patterns to keep our database layer provider-agnostic, enabling seamless deployment across AWS and GCP. Evolve Observability & Monitoring: Build deep, proactive monitoring and alerting for our global database fleet to detect performance regressions and health issues before they impact our customers. Support … orchestration. Distributed Systems Enthusiast: Excited by challenges of multi‐tenant, multi‐region, and multi‐cloud environments while ensuring data integrity and mobility. Security & Observability Mindset: Prioritize security and build deep observability (Prometheus/Grafana/OpenTelemetry/Humio) with automated guardrails. Engineering via Code: Primarily deliver via code; use Java ...

Senior DevOps Engineer

Hiring Organisation
Stealth IT Consulting Limited
Location
Telford, Shropshire, West Midlands, United Kingdom
Employment Type
Contract
Contract Rate
£580 per day Inside IR35
Observability Engineer (SC Eligible) Rate: £580/day Inside IR35 Duration: 6 months Location: Mostly remote (Telford occasional onsite - 2 days/month) Clearance: SC Eligible Role Overview We are seeking an experienced Observability Engineer to design, implement, and support enterprise-grade monitoring and observability solutions across complex technology environments. … role focuses on improving service visibility, performance insight, and proactive incident detection. Key Responsibilities Design and implement end-to-end observability solutions across enterprise platforms Translate NFRs and monitoring requirements into Dynatrace configurations Deliver APM, log analytics, synthetic monitoring, and infrastructure observability Build and maintain dashboards, alerts, and performance visualisations ...

DevOps Engineer

Hiring Organisation
Infinity Quest
Location
Halifax, England, United Kingdom
Actions, Harness, Jenkins). • Networking & Security: Experience with GCP Cloud Armor, GCP Networking, and embedding secure-by-design controls from design to runtime. • Automation & Observability: Implementing actionable observability, performance tuning, and automation to reduce toil. Defining and operating against SLOs/SLIs. • Scripting & Tooling: Scripting in Bash, PowerShell, or Python. … Performance & Reliability: Define, monitor, and operate against service level objectives (SLOs/SLIs), ensuring high availability, performance, and fault tolerance. • Continuous Improvement: Drive automation, observability, and performance tuning to reduce manual effort and improve platform reliability. • Collaboration: Work closely with architecture and feature teams to evolve the cloud roadmap ...

OpenShift Administrator

Hiring Organisation
Coltech
Location
Sheffield, England, United Kingdom
hybrid cloud environments Monitor, troubleshoot, and optimise OpenShift workloads for performance and reliability Manage OpenShift networking, storage, security, patching, and upgrades Implement monitoring and observability solutions using tools such as Prometheus, Grafana, or ELK Stack Analyse system performance and recommend improvements for efficiency and cost optimisation Collaborate with DevOps, development … complex enterprise environments Desirable Skills Experience with hybrid cloud environments Exposure to automation tools such as Ansible or Terraform Knowledge of enterprise monitoring and observability platforms Experience working in large-scale regulated environments ...

Site Reliability Engineer II

Hiring Organisation
Jobleads-UK
Location
United Kingdom
design, build and evolve our infrastructure platform. You will develop Terraform modules, build CI/CD pipelines with GitHub Actions and deliver automation and observability improvements that keep our platform reliable, secure and easy for teams to adopt at scale. Responsibilities: Designing and developing reusable Terraform modules that enable teams … deliver reliable and repeatable infrastructure deployments Diagnosing and resolving complex infrastructure issues by identifying root causes across distributed cloud environments Developing automation and observability tooling to improve how infrastructure is operated at scale Implementing security and governance controls within modules and pipelines so teams inherit secure configurations by default Collaborating ...

Data Platform Engineer

Hiring Organisation
PRISM DIGITAL LIMITED
Location
Milton Keynes, Buckinghamshire, South East, United Kingdom
Employment Type
Permanent, Work From Home
Salary
£75,000
availability Own incident resolution, root cause analysis, and continuous improvement Collaborate with engineers and third-party providers to mature the platform Contribute to monitoring, observability, and cost optimisation strategies Support projects and business initiatives through robust platform delivery What They're Looking For: Microsoft Fabric experience Terraform experience Cloud platform engineering … delivery environments What You'll Work With: Microsoft Fabric Terraform (Infrastructure as Code) Azure cloud technologies SQL Server GitHub/CI/CD tooling Monitoring & observability tools Platform design patterns (scalability, resilience, cost control) Nice to Haves: GitHub Actions/CI/CD pipelines Zero Trust architecture Cloud cost monitoring & reporting ...

Lead Full Stack Engineer

Hiring Organisation
TXP
Location
Westminster, London, City of Westminster, United Kingdom
Employment Type
Contract
Contract Rate
£650 - £700/day
agents to refactor and move services at scale, including adding test coverage as part of the migration step Work with the platform team on observability, networking, and dependency management as services land in the cloud estate Required Experience Strong C# and .NET development experience (at least 8 years), including modern … cloud migrations, particularly off on-prem Windows estates Experience designing service templates or "golden paths" that other engineers adopt and build on Familiarity with observability tooling, including metrics, tracing, and structured logging across distributed services ...

Lead Full Stack Engineer

Hiring Organisation
TXP Technology x People
Location
South West London, London, England, United Kingdom
Employment Type
Contractor
Contract Rate
£650 - £700 per day
agents to refactor and move services at scale, including adding test coverage as part of the migration step Work with the platform team on observability, networking, and dependency management as services land in the cloud estate Required Experience Strong C# and .NET development experience (at least 8 years), including modern … cloud migrations, particularly off on-prem Windows estates Experience designing service templates or "golden paths" that other engineers adopt and build on Familiarity with observability tooling, including metrics, tracing, and structured logging across distributed services ...

Site Reliability Engineers

Hiring Organisation
F5 consultants
Location
Reading, Berkshire, South East, United Kingdom
Employment Type
Permanent, Work From Home
Salary
£70,000
support, shared ownership, and continuous improvement. You'll work hands-on in a modern cloud-native environment leveraging Kubernetes, OpenShift, GitOps, service mesh, and observability tooling. There is genuine investment in your development through training, certifications, and the expertise of those around you. You'll also be part … Ability to work within complex multi-cloud or hybrid environments with a solid foundation in distributed systems Expertise in observability tooling such as Prometheus, Grafana, Loki, and Tempo Proficiency in IaC tools such as Kustomize and Helm, with scripting skills in Bash/Python Experience managing GitOps pipelines using Tekton ...

Infrastructure / DevOps Engineer

Hiring Organisation
rmg digital
Location
England, United Kingdom
Managing and optimising AWS services, including ECS, Lambda, VPC, and Aurora Postgres Building and maintaining CI/CD pipelines using GitHub Actions Implementing monitoring, observability, and alerting using Datadog Supporting development teams with deployment, automation, and operational best practices Improving infrastructure security, scalability, reliability, and cost-efficiency Monitoring system performance … Infrastructure as Code tools such as Terraform and/or CDK Understanding of CI/CD pipelines and GitHub Actions Familiarity with monitoring and observability tooling, such as Datadog Knowledge of containerisation concepts and infrastructure best practices Some experience with TypeScript or JavaScript for scripting and CDK purposes Strong troubleshooting ...

Site Reliability Engineer, K8s

Hiring Organisation
Jobleads-UK
Location
United Kingdom
agreed targets Lead incident response for BI platform services — triage, resolve, and follow up with post-mortems that actually prevent recurrence Build and manage observability infrastructure: dashboards, alerts, and logging across GCP services Track GCP cloud spend and set up cost alerting to flag anomalies before they become problems Review … related field, or equivalent hands‐on experience Practical experience with GCP — Cloud Run, API Gateway, and BigQuery in particular Experience with monitoring and observability tooling (Cloud Monitoring, Datadog, or similar) Solid grasp of cloud security fundamentals — IAM, network controls, access management Proficiency with Git and version control in a team ...

Senior Software Engineer - Electronic Trading Shared Services

Hiring Organisation
Jobleads-UK
Location
City Of London, England, United Kingdom
trading workflows and products Design common frameworks and APIs that unify data exchange across applications and services Drive initiatives that enhance scalability, resilience, and observability across the platform Partner with engineering and product teams across asset classes to deliver shared solutions that power new trading capabilities Gain a deeper understanding … Linux environments Experience with streaming or messaging technologies, e.g., Kafka Knowledge of service‐oriented or microservices architectures Interest in performance optimization, reliability engineering, and observability Curiosity about financial markets and how technology drives trade automation and transparency If indicated, please note that years of experience are a guide; we will ...

Vice President Software Engineering

Hiring Organisation
Jobleads-UK
Location
City of Edinburgh, Scotland, United Kingdom
where 80–90% of code is AI‐generated, with a roadmap to 95%+. Embed modern engineering excellence (CI/CD, trunk‐based development, observability, and automated testing). Partner cross‐functionally across Product, DevOps, Security, and Platform teams. Build a high‐performance culture grounded in accountability, innovation, and continuous … where software is shipped to production frequently or daily. Expertise in modern practices including CI/CD pipelines, trunk‐based development, automated testing strategies, observability and system reliability. Proven ability to use engineering metrics to drive performance and continuous improvement. Organisational Design & Methodologies Experience designing and evolving engineering organisations using ...

Principal Full Stack Engineer & Architecture Lead

Hiring Organisation
Command Recruitment
Location
London, United Kingdom
Employment Type
Permanent
Salary
£100000 - £110000/annum
technical design decisions Define scalable, secure, and maintainable engineering standards Provide technical leadership across frontend, backend, APIs, infrastructure, and integrations Drive platform scalability, resilience, observability, and performance Partner with leadership teams to align technical strategy with business goals Act as the senior technical authority for complex engineering decisions Hands … Gateway, EventBridge, SQS, Step Functions, S3, CloudWatch, RDS) Backend Node.js, TypeScript Frontend React, Next.js, Tailwind CSS Data & Architecture PostgreSQL, Serverless, Event-Driven Microservices DevOps & Observability Terraform/AWS CDK, CI/CD, Monitoring & Logging About You We are looking for a technically strong and commercially minded engineering leader with: 8+ ...

Staff AI Platform Engineer

Hiring Organisation
Invoca
Location
Chicago, Illinois, United States
Employment Type
Permanent
Salary
USD Annual
also adoption, ergonomics, standardization, and long-term leverage. You know how to create opinionated golden paths that help teams move faster without sacrificing reliability, observability, or governance. What you'll have the opportunity to do: Build the AI platform foundations: Design and maintain the core APIs, SDKs, libraries, and services … control layers that sit between applications and model providers. You will help ensure low latency, high availability, cost efficiency, and strong production behavior. Drive observability and governance: Build and enforce the platform capabilities that make AI systems measurable, debuggable, and governable in production, including tracing, auditing, policy enforcement, and operational standards. Improve ...

Staff AI Platform Engineer

Hiring Organisation
Invoca
Location
Omaha, Nebraska, United States
Employment Type
Permanent
Salary
USD Annual
also adoption, ergonomics, standardization, and long-term leverage. You know how to create opinionated golden paths that help teams move faster without sacrificing reliability, observability, or governance. What you'll have the opportunity to do: Build the AI platform foundations: Design and maintain the core APIs, SDKs, libraries, and services … control layers that sit between applications and model providers. You will help ensure low latency, high availability, cost efficiency, and strong production behavior. Drive observability and governance: Build and enforce the platform capabilities that make AI systems measurable, debuggable, and governable in production, including tracing, auditing, policy enforcement, and operational standards. Improve ...

Staff AI Platform Engineer

Hiring Organisation
Invoca
Location
Detroit, Michigan, United States
Employment Type
Permanent
Salary
USD Annual
also adoption, ergonomics, standardization, and long-term leverage. You know how to create opinionated golden paths that help teams move faster without sacrificing reliability, observability, or governance. What you'll have the opportunity to do: Build the AI platform foundations: Design and maintain the core APIs, SDKs, libraries, and services … control layers that sit between applications and model providers. You will help ensure low latency, high availability, cost efficiency, and strong production behavior. Drive observability and governance: Build and enforce the platform capabilities that make AI systems measurable, debuggable, and governable in production, including tracing, auditing, policy enforcement, and operational standards. Improve ...

Staff AI Platform Engineer

Hiring Organisation
Invoca
Location
Pittsburgh, Pennsylvania, United States
Employment Type
Permanent
Salary
USD Annual
also adoption, ergonomics, standardization, and long-term leverage. You know how to create opinionated golden paths that help teams move faster without sacrificing reliability, observability, or governance. What you'll have the opportunity to do: Build the AI platform foundations: Design and maintain the core APIs, SDKs, libraries, and services … control layers that sit between applications and model providers. You will help ensure low latency, high availability, cost efficiency, and strong production behavior. Drive observability and governance: Build and enforce the platform capabilities that make AI systems measurable, debuggable, and governable in production, including tracing, auditing, policy enforcement, and operational standards. Improve ...