826 to 850 of 1,278 Observability Jobs

Client Service Delivery

Hiring Organisation: Accenture
Location: Birmingham, England, United Kingdom

Service Delivery Management Own full lifecycle service delivery across infrastructure and cloud environments, ensuring alignment to SLAs, KPIs, scope, and cost. Leverage AIOps and observability tools (e.g. Dynatrace, Datadog, New Relic, Elastic) to proactively monitor service health and performance. Utilise predictive alerting and anomaly detection to prevent incidents and optimise … infrastructure and cloud environments Strong understanding of IT Managed Services frameworks Hands-on experience with AIOps tools such as Dynatrace and ServiceNow Familiarity with observability tools (e.g. Datadog, New Relic, Elastic) Knowledge of event analytics tools such as Splunk IT Service Intelligence and Moogsoft Experience in stakeholder and client management ...

Lead Engineer - Kubernetes & CNI

Hiring Organisation: Jobleads-UK
Location: Greater London, England, United Kingdom

networking, cluster usage, and deployment patterns. Operational Excellence * Support troubleshooting and root-cause analysis across multilayered network and container environments. * Contribute to platform reliability and observability improvements through dashboards, run-books, and automation. * Evaluate new tools, technologies, and approaches and present findings to the wider engineering community. Ways of Working * Operate effectively … networking internals. * Working knowledge of Kubernetes deployment pipelines, Git-based workflows, and automated testing principles. * Strong recent Linux (especially Ubuntu) experience. * Exposure to monitoring, observability, and troubleshooting across containerised workloads. * Familiarity with CI/CD tools (GitLab, Jira, Confluence). * Experience with automation tools such as Terraform, Ansible, or scripting ...

Data Architect / Data Engineers

Hiring Organisation: Vaco LLC
Location: Cincinnati, Ohio, United States
Employment Type: Permanent
Salary: USD 100 Annual

data and analytics assets Data Engineering Design and implement metadata-driven batch ingestion frameworks Develop up to 10 production-grade data pipelines Implement monitoring, observability, and remediation processes Design dimensional data models optimized for analytics and reporting Modern Cloud Data Platforms Architect solutions using Azure data services (Microsoft Fabric, Synapse … Power BI report or dashboard to validate the solution Document reporting and analytics processes DataOps & MLOps Enablement Design CI/CD, testing, and observability frameworks across data pipelines Promote data quality, lineage, and reproducibility through modern DevOps practices Support AI-ready data architectures where applicable Enablement, Collaboration & Leadership Lead technical ...

QA Test Infrastructure Engineer

Hiring Organisation: Talent Locker
Location: Cheltenham, Gloucestershire, South West, United Kingdom
Employment Type: Contract

QA Test Infrastructure Engineer - Taunton, Onsite - Outside IR35 - Highest Security Clearance. As a QA Test Infrastructure Engineer, you'll help design, build, and deliver secure digital solutions in highly secure environments. You'll work alongside ...

Network Monitoring & Observability Engineer - Fully remote

Hiring Organisation: Akkodis
Location: Derby, Derbyshire, United Kingdom
Employment Type: Contract
Contract Rate: £70000 - £75000/annum

successful delivery of this initiative. This is a hands-on engineering role where you'll be responsible for designing, implementing, and commissioning monitoring and observability solutions across newly deployed fibre infrastructure and network equipment. Working closely with Network Operations and Core Network teams, you'll ensure full visibility of critical … services from day one through modern monitoring technologies, streaming telemetry, and AI-driven analytics. Key Responsibilities Monitoring & Observability Design and implement end-to-end monitoring solutions across new fibre infrastructure deployments. Build and maintain streaming telemetry pipelines to provide real-time network visibility. Configure, optimise, and manage VictoriaMetrics environments, including ...

SRE Managing Consultant - Cloud Operating Model

Hiring Organisation: Jobleads-UK
Location: Manchester, England, United Kingdom

Budgets : Establish service measures and targets (SLIs/SLOs) and introduce Error Budgets to enable data‐driven trade‐offs between reliability and delivery velocity. Observability & Operational Insight: Shape observability approaches (metrics/logs/traces) and operational monitoring models that make reliability risks visible and actionable, improving operational decision‐making. … large‐scale delivery contexts; associate‐level certifications are desirable but not mandatory. Design, establish, and evolve SRE‐led centres of excellence (e.g. Reliability, Observability, or Operational Excellence), setting enterprise‐level standards for SLIs/SLOs, incident management, observability, and continuous improvement across cloud and hybrid platforms. Exposure to modern observability ...

Principal Engineer - Member Experience Platform

Hiring Organisation: Jobleads-UK
Location: Skipton, England, United Kingdom

Quality), and bar‐raising across squads: you shorten lead times, increase deployment frequency, hold change‐failure rate low, and improve MTTR through release‐linked observability - turning fast, safe flow into the default way of working. Operating at platform scale, you define cross‐cutting architecture and delivery standards (API/event contracts … resilience, observability, language/dependency baselines) and drive adoption through the Golden Path: policy‐as‐code CI/CD, progressive delivery (feature flags, canary/blue‐green), automated rollback/forward‐fix, ephemeral, data‐ready environments, and guardrails that make security and compliance by design. You partner with Platform Ownership ...

Database Reliability Engineer

Hiring Organisation: Jobleads-UK
Location: Manchester, England, United Kingdom

Cross-Cloud Portability: Use CNPG and cloud-native patterns to keep our database layer provider-agnostic, enabling seamless deployment across AWS and GCP. Evolve Observability & Monitoring: Build deep, proactive monitoring and alerting for our global database fleet to detect performance regressions and health issues before they impact our customers. Support … orchestration. Distributed Systems Enthusiast: Excited by challenges of multi‐tenant, multi‐region, and multi‐cloud environments while ensuring data integrity and mobility. Security & Observability Mindset: Prioritize security and build deep observability (Prometheus/Grafana/OpenTelemetry/Humio) with automated guardrails. Engineering via Code: Primarily deliver via code; use Java ...

Senior DevOps Engineer

Hiring Organisation: Stealth IT Consulting Limited
Location: Telford, Shropshire, West Midlands, United Kingdom
Employment Type: Contract
Contract Rate: £580 per day Inside IR35

Observability Engineer (SC Eligible) Rate: £580/day Inside IR35 Duration: 6 months Location: Mostly remote (Telford occasional onsite - 2 days/month) Clearance: SC Eligible Role Overview We are seeking an experienced Observability Engineer to design, implement, and support enterprise-grade monitoring and observability solutions across complex technology environments. … role focuses on improving service visibility, performance insight, and proactive incident detection. Key Responsibilities Design and implement end-to-end observability solutions across enterprise platforms Translate NFRs and monitoring requirements into Dynatrace configurations Deliver APM, log analytics, synthetic monitoring, and infrastructure observability Build and maintain dashboards, alerts, and performance visualisations ...

DevOps Engineer

Hiring Organisation: Infinity Quest
Location: Halifax, England, United Kingdom

Actions, Harness, Jenkins). • Networking & Security: Experience with GCP Cloud Armor, GCP Networking, and embedding secure-by-design controls from design to runtime. • Automation & Observability: Implementing actionable observability, performance tuning, and automation to reduce toil. Defining and operating against SLOs/SLIs. • Scripting & Tooling: Scripting in Bash, PowerShell, or Python. … Performance & Reliability: Define, monitor, and operate against service level objectives (SLOs/SLIs), ensuring high availability, performance, and fault tolerance. • Continuous Improvement: Drive automation, observability, and performance tuning to reduce manual effort and improve platform reliability. • Collaboration: Work closely with architecture and feature teams to evolve the cloud roadmap ...

OpenShift Administrator

Hiring Organisation: Coltech
Location: Sheffield, England, United Kingdom

hybrid cloud environments Monitor, troubleshoot, and optimise OpenShift workloads for performance and reliability Manage OpenShift networking, storage, security, patching, and upgrades Implement monitoring and observability solutions using tools such as Prometheus, Grafana, or ELK Stack Analyse system performance and recommend improvements for efficiency and cost optimisation Collaborate with DevOps, development … complex enterprise environments Desirable Skills Experience with hybrid cloud environments Exposure to automation tools such as Ansible or Terraform Knowledge of enterprise monitoring and observability platforms Experience working in large-scale regulated environments ...

Site Reliability Engineer II

Hiring Organisation: Jobleads-UK
Location: United Kingdom

design, build and evolve our infrastructure platform. You will develop Terraform modules, build CI/CD pipelines with GitHub Actions and deliver automation and observability improvements that keep our platform reliable, secure and easy for teams to adopt at scale. Responsibilities: Designing and developing reusable Terraform modules that enable teams … deliver reliable and repeatable infrastructure deployments Diagnosing and resolving complex infrastructure issues by identifying root causes across distributed cloud environments Developing automation and observability tooling to improve how infrastructure is operated at scale Implementing security and governance controls within modules and pipelines so teams inherit secure configurations by default Collaborating ...

Data Platform Engineer

Hiring Organisation: PRISM DIGITAL LIMITED
Location: Milton Keynes, Buckinghamshire, South East, United Kingdom
Employment Type: Permanent, Work From Home
Salary: £75,000

availability Own incident resolution, root cause analysis, and continuous improvement Collaborate with engineers and third-party providers to mature the platform Contribute to monitoring, observability, and cost optimisation strategies Support projects and business initiatives through robust platform delivery What They're Looking For: Microsoft Fabric experience Terraform experience Cloud platform engineering … delivery environments What You'll Work With: Microsoft Fabric Terraform (Infrastructure as Code) Azure cloud technologies SQL Server GitHub/CI/CD tooling Monitoring & observability tools Platform design patterns (scalability, resilience, cost control) Nice to Haves: GitHub Actions/CI/CD pipelines Zero Trust architecture Cloud cost monitoring & reporting ...

Lead Full Stack Engineer

Hiring Organisation: TXP
Location: Westminster, London, City of Westminster, United Kingdom
Employment Type: Contract
Contract Rate: £650 - £700/day

agents to refactor and move services at scale, including adding test coverage as part of the migration step Work with the platform team on observability, networking, and dependency management as services land in the cloud estate Required Experience Strong C# and .NET development experience (at least 8 years), including modern … cloud migrations, particularly off on-prem Windows estates Experience designing service templates or "golden paths" that other engineers adopt and build on Familiarity with observability tooling, including metrics, tracing, and structured logging across distributed services ...

Lead Full Stack Engineer

Hiring Organisation: TXP Technology x People
Location: South West London, London, England, United Kingdom
Employment Type: Contractor
Contract Rate: £650 - £700 per day

Site Reliability Engineers

Hiring Organisation: F5 consultants
Location: Reading, Berkshire, South East, United Kingdom
Employment Type: Permanent, Work From Home
Salary: £70,000

support, shared ownership, and continuous improvement. You'll work hands-on in a modern cloud-native environment leveraging Kubernetes, OpenShift, GitOps, service mesh, and observability tooling. There is genuine investment in your development through training, certifications, and the expertise of those around you. You'll also be part … Ability to work within complex multi-cloud or hybrid environments with a solid foundation in distributed systems Expertise in observability tooling such as Prometheus, Grafana, Loki, and Tempo Proficiency in IaC tools such as Kustomize and Helm, with scripting skills in Bash/Python Experience managing GitOps pipelines using Tekton ...

Infrastructure / DevOps Engineer

Hiring Organisation: rmg digital
Location: England, United Kingdom

Managing and optimising AWS services, including ECS, Lambda, VPC, and Aurora Postgres Building and maintaining CI/CD pipelines using GitHub Actions Implementing monitoring, observability, and alerting using Datadog Supporting development teams with deployment, automation, and operational best practices Improving infrastructure security, scalability, reliability, and cost-efficiency Monitoring system performance … Infrastructure as Code tools such as Terraform and/or CDK Understanding of CI/CD pipelines and GitHub Actions Familiarity with monitoring and observability tooling, such as Datadog Knowledge of containerisation concepts and infrastructure best practices Some experience with TypeScript or JavaScript for scripting and CDK purposes Strong troubleshooting ...

Site Reliability Engineer, K8s

Hiring Organisation: Jobleads-UK
Location: United Kingdom

agreed targets Lead incident response for BI platform services — triage, resolve, and follow up with post-mortems that actually prevent recurrence Build and manage observability infrastructure: dashboards, alerts, and logging across GCP services Track GCP cloud spend and set up cost alerting to flag anomalies before they become problems Review … related field, or equivalent hands‐on experience Practical experience with GCP — Cloud Run, API Gateway, and BigQuery in particular Experience with monitoring and observability tooling (Cloud Monitoring, Datadog, or similar) Solid grasp of cloud security fundamentals — IAM, network controls, access management Proficiency with Git and version control in a team ...

Senior Software Engineer - Electronic Trading Shared Services

Hiring Organisation: Jobleads-UK
Location: City Of London, England, United Kingdom

trading workflows and products Design common frameworks and APIs that unify data exchange across applications and services Drive initiatives that enhance scalability, resilience, and observability across the platform Partner with engineering and product teams across asset classes to deliver shared solutions that power new trading capabilities Gain a deeper understanding … Linux environments Experience with streaming or messaging technologies, e.g., Kafka Knowledge of service‐oriented or microservices architectures Interest in performance optimization, reliability engineering, and observability Curiosity about financial markets and how technology drives trade automation and transparency If indicated, please note that years of experience are a guide; we will ...

Vice President Software Engineering

Hiring Organisation: Jobleads-UK
Location: City of Edinburgh, Scotland, United Kingdom

where 80–90% of code is AI‐generated, with a roadmap to 95%+. Embed modern engineering excellence (CI/CD, trunk‐based development, observability, and automated testing). Partner cross‐functionally across Product, DevOps, Security, and Platform teams. Build a high‐performance culture grounded in accountability, innovation, and continuous … where software is shipped to production frequently or daily. Expertise in modern practices including CI/CD pipelines, trunk‐based development, automated testing strategies, observability and system reliability. Proven ability to use engineering metrics to drive performance and continuous improvement. Organisational Design & Methodologies Experience designing and evolving engineering organisations using ...

Principal Full Stack Engineer & Architecture Lead

Hiring Organisation: Command Recruitment
Location: London, United Kingdom
Employment Type: Permanent
Salary: £100000 - £110000/annum

technical design decisions Define scalable, secure, and maintainable engineering standards Provide technical leadership across frontend, backend, APIs, infrastructure, and integrations Drive platform scalability, resilience, observability, and performance Partner with leadership teams to align technical strategy with business goals Act as the senior technical authority for complex engineering decisions Hands … Gateway, EventBridge, SQS, Step Functions, S3, CloudWatch, RDS) Backend Node.js, TypeScript Frontend React, Next.js, Tailwind CSS Data & Architecture PostgreSQL, Serverless, Event-Driven Microservices DevOps & Observability Terraform/AWS CDK, CI/CD, Monitoring & Logging About You We are looking for a technically strong and commercially minded engineering leader with: 8+ ...

Staff AI Platform Engineer

Hiring Organisation: Invoca
Location: Chicago, Illinois, United States
Employment Type: Permanent
Salary: USD Annual

also adoption, ergonomics, standardization, and long-term leverage. You know how to create opinionated golden paths that help teams move faster without sacrificing reliability, observability, or governance. What you'll have the opportunity to do: Build the AI platform foundations: Design and maintain the core APIs, SDKs, libraries, and services … control layers that sit between applications and model providers. You will help ensure low latency, high availability, cost efficiency, and strong production behavior. Drive observability and governance: Build and enforce the platform capabilities that make AI systems measurable, debuggable, and governable in production, including tracing, auditing, policy enforcement, and operational standards. Improve ...

Staff AI Platform Engineer

Hiring Organisation: Invoca
Location: Omaha, Nebraska, United States
Employment Type: Permanent
Salary: USD Annual

Staff AI Platform Engineer

Hiring Organisation: Invoca
Location: Detroit, Michigan, United States
Employment Type: Permanent
Salary: USD Annual

Staff AI Platform Engineer

Hiring Organisation: Invoca
Location: Pittsburgh, Pennsylvania, United States
Employment Type: Permanent
Salary: USD Annual