Remote 'Observability' Job Vacancies

601 to 625 of 692 Remote Observability Jobs

Cloud Platform Lead

United Kingdom
Hybrid / WFH Options
Tenth Revolution Group
as-Code using AWS CDK and TypeScript Oversee security, scalability, and cost optimisation of cloud environments Collaborate with product and engineering teams to align platform priorities Define and execute observability strategies including monitoring, logging, and alerting Design and maintain CI/CD pipelines and containerised deployments Key Requirements Proven experience designing, deploying, and managing AWS infrastructure for SaaS platforms Hands … TypeScript for infrastructure-as-code Track record of leading a team of 2–5 engineers in a scale-up or SaaS environment Strong understanding of modern DevOps tooling and observability stacks Experience with CI/CD, containerisation, and performance tuning Excellent communication skills and ability to collaborate across teams Security Clearance Requirement Applicants must be eligible for UK Security Clearance.

AWS Cloud Engineer

London, United Kingdom
Hybrid / WFH Options
ARM
manage and support a customer's AWS and Data platform To be technical hands on Provide Incident and problem management on the AWS IaaS and PaaS Platform Monitoring and observability of system and platform performance Collaboration with development and build teams on application and platform deployments and changes Involvement in the resolution of Incidents and problems in an efficient and … timely manner Actively monitor an AWS platform and components for technical issues Implement and improve on existing monitoring and observability solution To be involved in the resolution of technical incidents tickets Assist in the root cause analysis of incidents Assist with improving efficiency and processes within the team Examining traces and logs Escalate incidents and problems to the appropriate teams …
Employment Type: Contract
Rate: £450 - £480/day

Machine Learning Engineer

London, South East England, United Kingdom
Hybrid / WFH Options
Rightmove
scientists to take models from development to production-grade systems, ensuring scalability, reproducibility, and robustness. Automating feature engineering and data pipeline processes, ensuring reproducibility and auditability. Implementing monitoring and observability to detect drift, bias, and performance degradation, and setting up rollback/recovery processes. Using MLOps tools (e.g., Vertex Pipelines, Kubeflow, Weights & Biases) for experiment tracking, model registry, and automated … distributed systems). 3+ years of experience as an ML Engineer, MLOps Engineer, Data Engineer, or similar, in a larger-scale, production-focused environment. Hands-on with model monitoring, observability, and retraining pipelines. Exposure to feature stores, registries, and experimentation frameworks. Familiarity with business-driven metrics and experience balancing ML performance with commercial goals. Experience with generative AI and LLM …

Machine Learning Engineer

Hertfordshire, East Anglia, United Kingdom
Hybrid / WFH Options
Rightmove
scientists to take models from development to production-grade systems, ensuring scalability, reproducibility, and robustness. Automating feature engineering and data pipeline processes, ensuring reproducibility and auditability. Implementing monitoring and observability to detect drift, bias, and performance degradation, and setting up rollback/recovery processes. Using MLOps tools (e.g., Vertex Pipelines, Kubeflow, Weights & Biases) for experiment tracking, model registry, and automated … distributed systems). 3+ years of experience as an ML Engineer, MLOps Engineer, Data Engineer, or similar, in a larger-scale, production-focused environment. Hands-on with model monitoring, observability, and retraining pipelines. Exposure to feature stores, registries, and experimentation frameworks. Familiarity with business-driven metrics and experience balancing ML performance with commercial goals. Experience with generative AI and LLM …

Machine Learning Engineer

Buckinghamshire, South East England, United Kingdom
Hybrid / WFH Options
Rightmove
scientists to take models from development to production-grade systems, ensuring scalability, reproducibility, and robustness. Automating feature engineering and data pipeline processes, ensuring reproducibility and auditability. Implementing monitoring and observability to detect drift, bias, and performance degradation, and setting up rollback/recovery processes. Using MLOps tools (e.g., Vertex Pipelines, Kubeflow, Weights & Biases) for experiment tracking, model registry, and automated … distributed systems). 3+ years of experience as an ML Engineer, MLOps Engineer, Data Engineer, or similar, in a larger-scale, production-focused environment. Hands-on with model monitoring, observability, and retraining pipelines. Exposure to feature stores, registries, and experimentation frameworks. Familiarity with business-driven metrics and experience balancing ML performance with commercial goals. Experience with generative AI and LLM …

AppSec Lead

Central London, London, United Kingdom
Hybrid / WFH Options
Halian Technology Limited
A leading fintech company is seeking a Lead AppSec Engineer to join their established team. You'll be instrumental in embedding security into every stage of the software development lifecycle: guiding engineers, shaping best practices, and driving secure, scalable solutions across our …
Employment Type: Permanent, Work From Home

Senior Software Engineer

United Kingdom
Hybrid / WFH Options
Bezos
At Bezos, our vision is to Deliver Happiness: for our team, for the end consumers, for the e-commerce sellers, and for our logistics partners. Exciting times in e-commerce: E-commerce sales are driven by consumers who increasingly buy …
Employment Type: Permanent

Azure AI Engineer

United Kingdom
Hybrid / WFH Options
Cognitive Group | Part of the Focus Cloud Group
Functions, Logic Apps, and APIs to orchestrate data and AI workflows. Design and deliver retrieval-augmented generation (RAG) and Copilot-style assistants embedded into business and web applications. Embed observability and monitoring into AI and data pipelines, tracking performance, quality, and cost. Collaborate with data scientists, architects, and product teams to turn prototypes into enterprise-ready AI services. Stay … or equivalent for building and extending web or Power Apps solutions. Knowledge of Azure DevOps, CI/CD, and Infrastructure-as-Code (Bicep, Terraform). Deep appreciation of governance, observability, and secure design principles.

Founding Senior Software Engineer

Amsterdam, Noord-Holland, Netherlands
Hybrid / WFH Options
Bonnie
ll lead Technical direction & architecture - for our core platform across real time telephony, messaging, and AI workflows. Code quality at scale - set and enforce standards (testing, CI/CD, observability) that keep us fast and reliable. 0-to-1 initiatives - spot opportunities, prototype quickly, and ship features that move the business. Engineering culture - mentor, review, and act as a multiplier; help … workflows. Design - for performance, reliability, and maintainability; make key architectural decisions and define the technical roadmap. Deliver quality - write testable, production grade code; champion CI/CD, monitoring, and observability best practices. Solve real problems - from improving call routing to 10x traffic, to intelligent WhatsApp automation for complex bookings. Integrate systems - connect Bonnie to CRMs, reservation platforms, telephony APIs, and …
Employment Type: Permanent
Salary: EUR 80,000 Monthly

Senior DevSecOps engineer

England, United Kingdom
Hybrid / WFH Options
Seccl Technology Limited
handling, JWK publishing, and SSO connection setup. Utilising Infrastructure as Code (Terraform) and CI/CD (GitHub Actions) to manage Auth0 configuration and ensure safe, repeatable deployments. Implementing comprehensive observability for authentication paths with structured logs, monitoring dashboards, alerts, and SLOs. Collaborating closely with product, engineering, and support teams on migration timelines, communications, and incident response. This role's for … and identity configurations, including secure secrets management. Solid understanding of core AWS services relevant to modern authentication patterns, such as API Gateway, Lambda authorisers, and CloudWatch. A commitment to observability, with hands-on experience implementing structured logging, dashboards, and SLOs for critical services. Excellent collaboration skills, demonstrated through participation in design reviews, pairing, and writing clear technical documentation (e.g., runbooks …
Employment Type: Permanent

Senior DevSecOps engineer

Edinburgh, Midlothian, United Kingdom
Hybrid / WFH Options
Seccl Technology Limited
handling, JWK publishing, and SSO connection setup. Utilising Infrastructure as Code (Terraform) and CI/CD (GitHub Actions) to manage Auth0 configuration and ensure safe, repeatable deployments. Implementing comprehensive observability for authentication paths with structured logs, monitoring dashboards, alerts, and SLOs. Collaborating closely with product, engineering, and support teams on migration timelines, communications, and incident response. This role's for … and identity configurations, including secure secrets management. Solid understanding of core AWS services relevant to modern authentication patterns, such as API Gateway, Lambda authorisers, and CloudWatch. A commitment to observability, with hands-on experience implementing structured logging, dashboards, and SLOs for critical services. Excellent collaboration skills, demonstrated through participation in design reviews, pairing, and writing clear technical documentation (e.g., runbooks …
Employment Type: Permanent

Senior DevSecOps engineer

Bath, Somerset, United Kingdom
Hybrid / WFH Options
Seccl Technology Limited
handling, JWK publishing, and SSO connection setup. Utilising Infrastructure as Code (Terraform) and CI/CD (GitHub Actions) to manage Auth0 configuration and ensure safe, repeatable deployments. Implementing comprehensive observability for authentication paths with structured logs, monitoring dashboards, alerts, and SLOs. Collaborating closely with product, engineering, and support teams on migration timelines, communications, and incident response. This role's for … and identity configurations, including secure secrets management. Solid understanding of core AWS services relevant to modern authentication patterns, such as API Gateway, Lambda authorisers, and CloudWatch. A commitment to observability, with hands-on experience implementing structured logging, dashboards, and SLOs for critical services. Excellent collaboration skills, demonstrated through participation in design reviews, pairing, and writing clear technical documentation (e.g., runbooks …
Employment Type: Permanent

DevOps Engineer

United Kingdom, UK
Hybrid / WFH Options
Spectrum IT Recruitment
DevOps Engineer - AWS/Azure Government Transformation Projects (AWS/Azure/DevOps) Location: Winchester, Hampshire, Hybrid Our client is a cloud-first digital consultancy, founded over 10 years ago and trusted by government, policing, and public sector organisations to …
Employment Type: Part-time

DevOps Engineer

Winchester, Hampshire, South East, United Kingdom
Hybrid / WFH Options
Spectrum IT Recruitment Limited
DevOps Engineer - AWS/Azure Government Transformation Projects (AWS/Azure/DevOps) Location: Winchester, Hampshire, Hybrid Our client is a cloud-first digital consultancy, founded over 10 years ago and trusted by government, policing, and public sector organisations to …
Employment Type: Permanent
Salary: £75,000

Azure Consultant - Presales

United Kingdom
Hybrid / WFH Options
Hancock & Parsons Ltd
A well-established software company is seeking an Azure Consultant to join their team! This is a hybrid technical and pre-sales role that involves a mix of customer engagement and being hands-on with Azure. If you're someone …

DevOps Engineer

Southampton, South East England, United Kingdom
Hybrid / WFH Options
Spectrum IT Recruitment
DevOps Engineer - AWS/Azure Government Transformation Projects (AWS/Azure/DevOps) Location: Winchester, Hampshire, Hybrid Our client is a cloud-first digital consultancy, founded over 10 years ago and trusted by government, policing, and public sector organisations to …

DevOps Engineer

Colden Common, Hampshire, United Kingdom
Hybrid / WFH Options
Spectrum IT Recruitment
DevOps Engineer - AWS/Azure Government Transformation Projects (AWS/Azure/DevOps) Location: Winchester, Hampshire, Hybrid Our client is a cloud-first digital consultancy, founded over 10 years ago and trusted by government, policing, and public sector organisations to …
Employment Type: Permanent
Salary: GBP 60,000 - 75,000 Annual

Staff Data Engineer

Slough, South East England, United Kingdom
Hybrid / WFH Options
Fruition Group
Job Title: Staff Data Engineer Location: London, Hybrid Salary: c.£140,000 + bonus + share options Why Apply? This is a unique opportunity to take a leading role in shaping the data strategy of a fast growing Insurtech scale …

Staff Data Engineer

London, South East England, United Kingdom
Hybrid / WFH Options
Fruition Group
Job Title: Staff Data Engineer Location: London, Hybrid Salary: c.£140,000 + bonus + share options Why Apply? This is a unique opportunity to take a leading role in shaping the data strategy of a fast growing Insurtech scale …

Staff Data Engineer

London, United Kingdom
Hybrid / WFH Options
Fruition Group
Job Title: Staff Data Engineer Location: London, Hybrid Salary: c.£140,000 + bonus + share options Why Apply? This is a unique opportunity to take a leading role in shaping the data strategy of a fast growing Insurtech scale …
Employment Type: Permanent

Staff Site Reliability Engineer - Observability

City of London, London, United Kingdom
Hybrid / WFH Options
Motive Group
Senior/Staff Site Reliability Engineer - Observability | London (Hybrid) If you care deeply about building and operating world-class infrastructure for AI at scale, this one’s worth your time. We’re working with a company that builds the backbone powering some of the most demanding AI workloads on the planet. Think large-scale GPU clusters, global telemetry systems, and … distributed training environments used by leading research and enterprise teams. They’re looking for a Senior or Staff SRE with deep experience in observability at massive scale - someone who’s tuned Prometheus/Mimir, Loki, or Tempo clusters beyond 100M+ series or 10TB/day logs, and who thrives in highly technical, fast-moving environments. You’ll be working on … Designing and scaling observability for globally distributed GPU infrastructure Building automation that cuts operational toil and improves reliability Partnering with platform and infrastructure teams to deliver true visibility across complex AI systems If you’ve built or operated telemetry stacks for large-scale, GPU-heavy, or multi-tenant environments - and want to work on cutting-edge problems in a business …

Staff Site Reliability Engineer - Observability

London Area, United Kingdom
Hybrid / WFH Options
Motive Group
Senior/Staff Site Reliability Engineer - Observability | London (Hybrid) If you care deeply about building and operating world-class infrastructure for AI at scale, this one’s worth your time. We’re working with a company that builds the backbone powering some of the most demanding AI workloads on the planet. Think large-scale GPU clusters, global telemetry systems, and … distributed training environments used by leading research and enterprise teams. They’re looking for a Senior or Staff SRE with deep experience in observability at massive scale - someone who’s tuned Prometheus/Mimir, Loki, or Tempo clusters beyond 100M+ series or 10TB/day logs, and who thrives in highly technical, fast-moving environments. You’ll be working on … Designing and scaling observability for globally distributed GPU infrastructure Building automation that cuts operational toil and improves reliability Partnering with platform and infrastructure teams to deliver true visibility across complex AI systems If you’ve built or operated telemetry stacks for large-scale, GPU-heavy, or multi-tenant environments - and want to work on cutting-edge problems in a business …

Staff Site Reliability Engineer - Observability

London, South East England, United Kingdom
Hybrid / WFH Options
Motive Group
Senior/Staff Site Reliability Engineer - Observability | London (Hybrid) If you care deeply about building and operating world-class infrastructure for AI at scale, this one’s worth your time. We’re working with a company that builds the backbone powering some of the most demanding AI workloads on the planet. Think large-scale GPU clusters, global telemetry systems, and … distributed training environments used by leading research and enterprise teams. They’re looking for a Senior or Staff SRE with deep experience in observability at massive scale - someone who’s tuned Prometheus/Mimir, Loki, or Tempo clusters beyond 100M+ series or 10TB/day logs, and who thrives in highly technical, fast-moving environments. You’ll be working on … Designing and scaling observability for globally distributed GPU infrastructure Building automation that cuts operational toil and improves reliability Partnering with platform and infrastructure teams to deliver true visibility across complex AI systems If you’ve built or operated telemetry stacks for large-scale, GPU-heavy, or multi-tenant environments - and want to work on cutting-edge problems in a business …

Staff Site Reliability Engineer - Observability

London (City of London), South East England, United Kingdom
Hybrid / WFH Options
Motive Group
Senior/Staff Site Reliability Engineer - Observability | London (Hybrid) If you care deeply about building and operating world-class infrastructure for AI at scale, this one’s worth your time. We’re working with a company that builds the backbone powering some of the most demanding AI workloads on the planet. Think large-scale GPU clusters, global telemetry systems, and … distributed training environments used by leading research and enterprise teams. They’re looking for a Senior or Staff SRE with deep experience in observability at massive scale - someone who’s tuned Prometheus/Mimir, Loki, or Tempo clusters beyond 100M+ series or 10TB/day logs, and who thrives in highly technical, fast-moving environments. You’ll be working on … Designing and scaling observability for globally distributed GPU infrastructure Building automation that cuts operational toil and improves reliability Partnering with platform and infrastructure teams to deliver true visibility across complex AI systems If you’ve built or operated telemetry stacks for large-scale, GPU-heavy, or multi-tenant environments - and want to work on cutting-edge problems in a business …

Staff Site Reliability Engineer - Observability

Slough, South East England, United Kingdom
Hybrid / WFH Options
Motive Group
Senior/Staff Site Reliability Engineer - Observability | London (Hybrid) If you care deeply about building and operating world-class infrastructure for AI at scale, this one’s worth your time. We’re working with a company that builds the backbone powering some of the most demanding AI workloads on the planet. Think large-scale GPU clusters, global telemetry systems, and … distributed training environments used by leading research and enterprise teams. They’re looking for a Senior or Staff SRE with deep experience in observability at massive scale - someone who’s tuned Prometheus/Mimir, Loki, or Tempo clusters beyond 100M+ series or 10TB/day logs, and who thrives in highly technical, fast-moving environments. You’ll be working on … Designing and scaling observability for globally distributed GPU infrastructure Building automation that cuts operational toil and improves reliability Partnering with platform and infrastructure teams to deliver true visibility across complex AI systems If you’ve built or operated telemetry stacks for large-scale, GPU-heavy, or multi-tenant environments - and want to work on cutting-edge problems in a business …

Observability
10th Percentile: £56,250
25th Percentile: £67,500
Median: £80,000
75th Percentile: £103,750
90th Percentile: £131,500