Observability Jobs

Machine Learning Systems & Infrastructure Engineer

Hiring Organisation
Jobleads-UK
Location
Greater London, England, United Kingdom
Ship workloads with Docker and Kubernetes; maintain IaC (Terraform) for the surfaces you own, plus CI/CD pipelines including self-hosted GPU runners. Observability and reliability: monitoring, logging, and alerting for job performance, data-pipeline health, and cost (e.g., Prometheus/Grafana, OpenTelemetry); define SLOs and incident response … stores; and object storage with caching layers. Familiarity with ML workflow orchestration and experiment tracking (e.g., Kubeflow Pipelines, MLflow). Experience with monitoring and observability tooling (e.g., Prometheus/Grafana, OpenTelemetry) and CI/CD for infra and ML workflows (e.g., GitHub Actions). At SpAItial, we are committed ...

MLOps Architect - AWS

Hiring Organisation
Quantiphi
Location
United Kingdom
based systems. Serve as a technical authority across multiple internal and customer projects, contributing architectural patterns, best practices, and reusable frameworks. Enable observability, monitoring, drift detection, lineage tracking, and auditability across ML/LLM systems. Define and implement standards for model deployment, monitoring, governance, and automation to ensure production-grade … code (Terraform, Helm, CDK). Hands-on understanding of model drift detection, A/B testing, canary rollouts, and blue-green deployments. Familiarity with observability stacks (Prometheus, Grafana, CloudWatch, OpenTelemetry). SQL and data transformation experience using Snowflake, Databricks, and Spark. Ability to translate business goals into scalable AI/ ...

Network Analytics & Automation Leader with AI Platforms

Hiring Organisation
Jobleads-UK
Location
Chester, England, United Kingdom
Overview. Automation Technologies and AI/ML-Driven Platforms and Analytics Tools: in the realm of automation technologies and AI/ML-driven observability platforms and analytics tools, the following are essential: Terraform, Itential, NetDevOps, Splunk, Python, React JS, Django. Database Technologies: proficiency with database technologies is crucial, including: MySQL ...

Remote Principal Cloud Platform Engineer

Hiring Organisation
Jobleads-UK
Location
Cambridge, England, United Kingdom
cloud infrastructure, ensuring reliability and security of services for over 3 million users. Candidates should have substantial experience with Kubernetes, Infrastructure as Code, and observability tools. The position offers a remote work option and collaboration within a dynamic team committed to innovation. ...

Senior SRE & AI/ML Platform Engineer

Hiring Organisation
Jobleads-UK
Location
Greater London, England, United Kingdom
involves building scalable and resilient data solutions, coordinating incident management, and mentoring team members. The ideal candidate will have strong skills in site reliability, observability, and automation tools, and will play a key role in shaping a collaborative and innovative team culture. Competitive benefits, including comprehensive healthcare and retirement plans ...

Senior Site Reliability Engineer - Global Tech Ops

Hiring Organisation
Jobleads-UK
Location
Greater London, England, United Kingdom
major cloud platforms like AWS. The position emphasizes leadership and collaborative troubleshooting within the global technical operations team. Ideal applicants will bring expertise in observability tools and Infrastructure as Code practices. ...

Senior AI/ML Data Platform SRE Lead

Hiring Organisation
Jobleads-UK
Location
Greater London, England, United Kingdom
Greater London. In this role, you will develop scalable and resilient data solutions, manage incident resolution, and foster team collaboration. Your experience with observability tools and site reliability principles will be key, along with proficiency in Python or PySpark. The position offers the chance to drive strategic change ...

Platform Engineer (Cloud)

Hiring Organisation
Paragon Alpha - Hedge Fund Talent Business
Location
London Area, United Kingdom
this role, you would be responsible for designing, developing and managing platform APIs to automate cloud workflows, as well as contributing to platform observability including monitoring, logging and tracing. The role involves collaboration with teams across the firm, primarily including Cloud Engineering and Security. Stack: Python/Go, AWS, Kubernetes ...

Site Reliability Engineer

Hiring Organisation
Arrows
Location
City of London, London, United Kingdom
media platform supporting high-traffic, customer-facing systems used by millions daily 🌍 You’ll be working across: ☁️ Kubernetes ⚙️ Terraform & Automation 🚀 CI/CD pipelines 📊 Observability & platform reliability 🌐 Fastly/Akamai CDN platforms. Strong CDN experience is absolutely essential for this role. Hands-on Fastly or Akamai exposure is a core ...

Software Engineer

Hiring Organisation
Acceler8 Talent
Location
City of London, London, United Kingdom
Scale and optimise multi-cloud GPU clusters; build tooling for scheduling, remediation, and node health; debug GPU/NCCL performance at cluster scale; improve observability, storage, and infrastructure reliability. 🔧 What They’re Looking For: strong systems engineering background; deep Kubernetes + GPU infrastructure experience; strong coding ability; experience with NCCL ...

Senior Quant C++ Engineer

Hiring Organisation
Harrington Starr
Location
United Kingdom
Translating research models into production systems; profiling and performance optimisation across critical paths; multithreading, concurrency and lock-free programming; monitoring live systems and improving observability; working closely with core trading and infrastructure teams on performance improvements. They are looking for someone with: strong modern C++ experience (C++17+); background in performance ...

Microsoft SQL Database Site Reliability Engineer

Hiring Organisation
Jobleads-UK
Location
Knutsford, England, United Kingdom
level guidance on incidents, root cause, and long-term fixes. Drive automation and standardisation: reduce toil through scripting, configuration management, and platform engineering. Enhance observability: improve monitoring, alerting, telemetry, and reliability insights across the estate. Collaborate across engineering: work with product, engineering, and platform teams to deliver resilient, scalable database ...

Principal Platform Engineer

Hiring Organisation
Jobleads-UK
Location
United Kingdom
reap up to three paid Kubernetes certifications in their first year. Qualifications: Cloud (AWS at scale); Microservices (designing, migrating, and operating large-scale systems); Observability & Reliability (monitoring, alerting, resilience); Mentorship & Leadership (guiding teams and sharing best practices). In return, you’ll work on cutting-edge tech, lead talented teams ...

Product Engineer (NetSuite)

Hiring Organisation
Radley James
Location
City of London, London, United Kingdom
systems. This is not a traditional integrations role: you'll own the full lifecycle of key ERP connectors, from architecture and implementation through to observability, customer onboarding, and long-term scalability. What you'll be working on: • Deep NetSuite integrations (SuiteQL, SuiteTalk, SuiteScript) • ERP and accounting platform connectivity • API architecture ...

Cloud Network Engineer

Hiring Organisation
Summer Browning Associates
Location
City of London, London, United Kingdom
Employment Type
Contract
Contract Rate
£525 - £550/day competitive
securing networking for Kubernetes (GKE), including private control planes, network policies, and specialised alias IP ranges. In particular, Istio Service Mesh. Network Observability: Utilising advanced diagnostic tools and flow logs to monitor network health, visualise throughput, and perform deep-packet troubleshooting. What You'll Bring Proven experience in designing ...

Scala Developer (Remote)

Hiring Organisation
Stealth iT Consulting
Location
United Kingdom
Agile environment (Scrum/Kanban). Participate in code reviews, architecture discussions and pair programming. Troubleshoot and resolve production issues; contribute to reliability and observability (logging, metrics, alerts). Help define CI/CD pipelines and deployment processes (e.g., Jenkins/GitHub Actions/Concourse). Produce concise technical documentation ...

Senior Data Engineer

Hiring Organisation
Opus Recruitment Solutions Ltd
Location
London, South East, England, United Kingdom
Employment Type
Full-Time
Salary
£60,000 - £70,000 per annum
models across bronze/silver/gold layers. Lakehouse Engineering: own schema design, optimisation, partitioning and clustering in BigQuery. Pipeline Reliability: ensure data quality, observability and freshness across all pipelines. Architecture Contribution: help shape the evolving data platform as the product and dataset scale. In return ...

Founding Engineer | £100m Funding - No VC Money

Hiring Organisation
Tech Talent Network
Location
City of London, London, United Kingdom
decision-making and agent context ⚙️ Build backend systems for knowledge extraction, enrichment, and automation ⚙️ Contribute to DevOps practices across CI/CD, infrastructure, and observability. What we’re looking for: 👩‍💻 Deep full-stack JavaScript or strong backend Python expertise 🔥 High-agency mindset with end-to-end ownership 🎯 Strong track record ...

Scala Developer (Remote)

Hiring Organisation
Stealth iT Consulting
Location
City of London, London, United Kingdom
Agile environment (Scrum/Kanban). Participate in code reviews, architecture discussions and pair programming. Troubleshoot and resolve production issues; contribute to reliability and observability (logging, metrics, alerts). Help define CI/CD pipelines and deployment processes (e.g., Jenkins/GitHub Actions/Concourse). Produce concise technical documentation ...

Scala Developer (Remote)

Hiring Organisation
Stealth iT Consulting
Location
East London, London, United Kingdom
Agile environment (Scrum/Kanban). Participate in code reviews, architecture discussions and pair programming. Troubleshoot and resolve production issues; contribute to reliability and observability (logging, metrics, alerts). Help define CI/CD pipelines and deployment processes (e.g., Jenkins/GitHub Actions/Concourse). Produce concise technical documentation ...

Scala Developer (Remote)

Hiring Organisation
Stealth iT Consulting
Location
Bury, Greater Manchester, United Kingdom
Agile environment (Scrum/Kanban). Participate in code reviews, architecture discussions and pair programming. Troubleshoot and resolve production issues; contribute to reliability and observability (logging, metrics, alerts). Help define CI/CD pipelines and deployment processes (e.g., Jenkins/GitHub Actions/Concourse). Produce concise technical documentation ...

Scala Developer (Remote)

Hiring Organisation
Stealth iT Consulting
Location
Leeds, West Yorkshire, United Kingdom
Agile environment (Scrum/Kanban). Participate in code reviews, architecture discussions and pair programming. Troubleshoot and resolve production issues; contribute to reliability and observability (logging, metrics, alerts). Help define CI/CD pipelines and deployment processes (e.g., Jenkins/GitHub Actions/Concourse). Produce concise technical documentation ...

Rust Engineer

Hiring Organisation
REALM
Location
England, United Kingdom
working with multi-user distributed systems, and experience with CRDTs or similar consistency approaches is a real plus. Stack: Rust, AWS, Kubernetes, Terraform, with observability tooling and AI model integration. Day-to-day, you'd be working in short iterations of under four weeks, collaborating with Product to shape technical ...

Engineering Director

Hiring Organisation
SF Partners
Location
Nottingham, Stanton Gate, Derbyshire, United Kingdom
Employment Type
Permanent
Salary
£120,000 - £135,000/annum, bonus & share options & clear prog
with business impact to drive rapid growth. This Engineering Director will use their existing JavaScript and Azure stack knowledge to set delivery practices across observability, DevOps, agile and security, whilst using key skills in building LLM agents to drive a growth mindset around AI and spec-driven development. This role ...

Scala Developer - Remote Contract

Hiring Organisation
Stealth iT Consulting
Location
East London, London, United Kingdom
environment (Scrum/Kanban). Participate in code reviews, architecture discussions, and pair programming sessions. Troubleshoot and resolve production issues; contribute to reliability and observability (logging, metrics, alerts). Assist in defining CI/CD pipelines and deployment processes (e.g., Jenkins, GitHub Actions, Concourse). Produce concise technical documentation ...