Observability Jobs in London

Senior Platform Engineer Linux AWS HPC

Hiring Organisation
Client Server
Location
London, UK
premise and cloud (AWS) based services. You'll collaborate with internal teams to optimise HPC platforms, with a focus on improving performance, resilience and observability of the compute infrastructure. You'll also contribute to continuous improvement initiatives, automating wherever possible and sharing your expertise with the team. Location/ ...

Senior Generative AI Engineer

Hiring Organisation
Method-Resourcing
Location
London, United Kingdom
Employment Type
Permanent, Work From Home
databases, and prompt engineering frameworks. Proven record of delivering measurable ROI from AI projects. Solid engineering fundamentals: version control, CI/CD for ML, observability, API integration, and cloud infrastructure (AWS/Azure/GCP). Confident communicator able to explain complex trade-offs to non-technical stakeholders. UK-based ...

AI DevOps Engineer

Hiring Organisation
SThree
Location
London, UK
agent orchestration using AKS. You will create and maintain CI/CD pipelines for Azure services and Semantic Kernel agents, manage Kubernetes clusters, and integrate observability tools to monitor system health and performance. You’ll also ensure alignment with enterprise-grade security practices, including zero trust principles, identity-aware routing ...

Senior ML Engineer

Hiring Organisation
Method-Resourcing
Location
London, United Kingdom
Employment Type
Permanent, Work From Home
ship ML-powered features into production. Continuously assess and iterate on production models, balancing long-term ML strategy with tactical improvements. Champion code quality, observability, and resilience within their ML systems through reviews and hands-on contributions. Help shape their internal ML standards and practices, ensuring they stay ahead ...

Python Software Engineer

Hiring Organisation
Fundment
Location
London, UK
Shape technical direction, contribute to architectural decisions, and plan for scalability, performance, and reliability. Write clean, pragmatic code with a strong focus on maintainability, observability, and security. Investigate complex issues across distributed systems, identify root causes, and drive long-term solutions. Collaborate closely with Product, Operations, and fellow engineers ...

Principal Software Engineer

Hiring Organisation
BPP
Location
London, UK
Employment Type
Full-time
engineering team. Contribute to and evolve the internal software engineering practices and standards as the team scales. Drive continuous technical improvement through the analysis of observability metrics and user feedback. Stay up to date with industry best practices, new technologies, and emerging trends. What we're looking for: To be successful ...

Senior Director of Engineering (f/m/d) - unified web and mobile applications

Hiring Organisation
Epassi
Location
London, UK
Employment Type
Full-time
support for existing, fragmented client solutions. Modern Practices: Define and implement best-in-class engineering practices, including CI/CD, testing, code quality, and observability, specifically tailored to maximize velocity and reliability. Gen AI Adoption: Strategically identify and implement the use of Generative AI tools and techniques to accelerate feature ...

Software Engineering Manager - Price and Location

Hiring Organisation
Marks and Spencer
Location
London, UK
it. Tech Stack: M&S uses a variety of technologies, including Java, Spring, Spring Boot, Micronaut, React, Next.js, TypeScript, Angular, Azure Cloud, Kubernetes, Dynatrace (observability), SQL Server, MongoDB, Ignite and Redis. What’s In It For You: Working at M&S means being part of something bigger - helping to deliver quality, value ...

Founding Software Engineer

Hiring Organisation
Synkka LTD
Location
London, South East, England, United Kingdom
Employment Type
Full-Time
Salary
Competitive salary
work at the frontier of software engineering and AI. Backend: Python + FastAPI; Frontend: React; Infra: Azure (infrastructure-as-code, CI/CD, observability); AI Stack: LLM pipelines for agentic codegen, testing, and self-healing interfaces. What You'll Do: As one of our first Product Engineers ...

Staff Software Engineer

Hiring Organisation
Rivian
Location
London, England, United Kingdom
high throughput. Tackle concurrency challenges to ensure efficient resource utilization when processing MCAP files. Operational Excellence: Champion a DevOps culture. Define SLOs, implement comprehensive observability (metrics, distributed tracing, logging), and utilize Infrastructure as Code (Terraform) to ensure reproducible environments. Technical Leadership: Act as a technical multiplier and mentor. You will ...

Senior Software Engineer, Wikimedia Enterprise

Hiring Organisation
Wikimedia Foundation
Location
London, UK
Employment Type
Full-time
similar distributed event processing systems. Experience working with Node.js and Go applications. Comfortable with configuration management and orchestration tools (ECS, Kubernetes) and modern observability infrastructure (monitoring, metrics and logging). Aptitude for automation and streamlining of tasks. Comfortable with shell and scripting languages used in an SRE/Operations engineering context ...

Senior Software Engineer, Wikimedia Enterprise

Hiring Organisation
Wikimedia Foundation
Location
South London, UK
Employment Type
Full-time
similar distributed event processing systems. Experience working with Node.js and Go applications. Comfortable with configuration management and orchestration tools (ECS, Kubernetes) and modern observability infrastructure (monitoring, metrics and logging). Aptitude for automation and streamlining of tasks. Comfortable with shell and scripting languages used in an SRE/Operations engineering context ...

Senior Site Reliability Engineer

Hiring Organisation
Moneycorp
Location
London, UK
governance, resilience testing, and platform patterns, ensuring our systems meet the highest levels of operational resilience and regulatory compliance. Key Responsibilities: Reliability Engineering & Observability: Define and maintain SLOs/SLIs and error budgets for critical services. Build and improve observability pipelines (metrics, logs, traces). Maintain dashboards for golden signals. Develop … Actions for automated build, test, and deployment workflows. Hands-on experience with infrastructure as code (Terraform/Bicep), CI/CD pipelines, and automation. Observability & Monitoring: Hands-on with Prometheus, Grafana, OpenTelemetry, and log aggregation tools; building dashboards and alerting policies. Knowledge of observability and reliability engineering (SLOs, error budgets ...

AWS DevOps / Platform Engineer - Start Up

Hiring Organisation
Robert Walters
Location
London, South East, England, United Kingdom
Employment Type
Full-Time
Salary
£70,000 - £110,000 per annum
improvements, reducing single points of failure and enhancing autoscaling, high availability, and managed service utilisation. Collaborate with SRE, Security, and Engineering teams to improve observability, monitoring, and alerting using tools such as Prometheus, Grafana, and CloudWatch. Work closely with Security to embed best practices for IAM, secrets management … understanding of Kubernetes operations on AWS, including scaling, deployment automation, and monitoring. Solid background in Linux systems administration, networking, and cloud security. Familiarity with observability tools such as Prometheus, Grafana, and Loki, and structured alerting practices. Experience with database migrations, high-availability configurations, backups, and disaster recovery. Strong scripting ...

Senior Machine Learning Engineer (UK)

Hiring Organisation
TWG Global
Location
London, UK
Employment Type
Full-time
monitoring for performance, drift, and reliability. Collaborate with senior engineers to build internal ML engineering tools and infrastructure that improve training, testing, and observability workflows. Partner with Data Scientists to operationalize prototype models, ensuring they are scalable, robust, and cost-efficient in production. Work with large-scale datasets to enable … engineering fundamentals (pipelines, deployment, monitoring) and familiarity with data science workflows. Experience with MLOps tools such as MLflow, Weights & Biases, or equivalent. Exposure to observability/monitoring systems (Prometheus, Grafana, ELK, Datadog) is a plus. Proficiency in Python and familiarity with ML libraries (scikit-learn, XGBoost, TensorFlow, PyTorch). Strong ...

Senior Software Engineer (TLMT)

Hiring Organisation
Visa
Location
London, UK
management engine using Java and related technologies. Implement event-driven patterns where appropriate to support real-time decision-making. Ensure high availability, scalability, and observability of critical components. Collaborate with cross-functional teams to understand requirements and deliver solutions that meet organisational needs. Contribute to continuous improvement in code quality … next phase of growth, are written to 12-factor principles and fit into our microservices architecture Cloud-related tools, services, and distributed system observability to support these applications, such as Docker, Kubernetes, ElasticSearch, log management systems, and Datadog APM, to name but a few API specifications, conforming to the OpenAPI ...

Messaging Engineer

Hiring Organisation
Ncounter LTD
Location
East London, London, United Kingdom
Employment Type
Permanent
environments, working across development, infrastructure, and cloud teams to deliver a stable and well-governed messaging service. You will troubleshoot problems, refine configurations, improve observability, and help drive upgrades, automation, and improved resilience. Experience Needed: At least 1 year of hands-on experience configuring, administering, and troubleshooting Solace PubSub+. Strong ...

Solace Expert

Hiring Organisation
Ncounter
Location
East London, London, England, United Kingdom
Employment Type
Full-Time
Salary
£110,000 - £125,000 per annum
environments, working across development, infrastructure, and cloud teams to deliver a stable and well-governed messaging service. You will troubleshoot problems, refine configurations, improve observability, and help drive upgrades, automation, and improved resilience. Experience Needed: At least 1 year of hands-on experience configuring, administering, and troubleshooting Solace PubSub+. Strong ...

Senior Snowflake Data Engineer - Remote - £competitive

Hiring Organisation
Tenth Revolution Group
Location
London, United Kingdom
Employment Type
Permanent
Salary
£75,000 - £85,000 per annum
certifications (SnowPro Core or Advanced). Experience with dbt Cloud and custom macros. Exposure to real-time streaming (Kafka, Kinesis). Familiarity with data observability tools and BI integrations (Tableau, Power BI). What We Offer: Opportunity to work with modern data technologies and large-scale architectures. Professional development ...

Senior Snowflake Data Engineer - Remote - £competitive

Hiring Organisation
Tenth Revolution Group
Location
London, South East, England, United Kingdom
Employment Type
Full-Time
Salary
£75,000 - £85,000 per annum
certifications (SnowPro Core or Advanced). Experience with dbt Cloud and custom macros. Exposure to real-time streaming (Kafka, Kinesis). Familiarity with data observability tools and BI integrations (Tableau, Power BI). What We Offer: Opportunity to work with modern data technologies and large-scale architectures. Professional development ...

Senior Data Engineer

Hiring Organisation
Harnham - Data & Analytics Recruitment
Location
London, South East, England, United Kingdom
Employment Type
Full-Time
Salary
£70,000 - £80,000 per annum
pipelines across a modern cloud stack. Extend and improve a cutting-edge Data & Analytics Platform supporting mission-critical insurance products. Implement data quality checks, observability metrics and troubleshooting processes. Manage cloud resources via Infrastructure-as-Code. Ensure strong data security, access control, and governance. Work closely with commercial, analytics ...

Senior Network SRE

Hiring Organisation
NLB Services
Location
London, UK
Employment Type
Full-time
Palo Alto, Check Point, Mist, Aruba, A10, Netscaler, and F5. Security & Segmentation: Support network segmentation, policy enforcement, and VPN solutions (GlobalProtect, AnyConnect). Automation & Observability: Utilize tools like Grafana, Big Panda, ServiceNow, ITMP, syslog, Splunk, Salt, Ansible, and Prometheus to enhance monitoring and automation. Innovation Projects: Collaborate on wireless design ...

HPC Platform Engineer Linux - Trading

Hiring Organisation
Client Server
Location
London, UK
premise and cloud (AWS) based services. You'll collaborate with internal teams to optimise HPC platforms, with a focus on improving performance, resilience and observability of the compute infrastructure. You'll also contribute to continuous improvement initiatives, automating wherever possible and sharing your expertise with the team. Location/ ...

HPC Platform Engineer Linux - Trading

Hiring Organisation
Client Server
Location
South West London, London, United Kingdom
Employment Type
Permanent, Work From Home
premise and cloud (AWS) based services. You'll collaborate with internal teams to optimise HPC platforms, with a focus on improving performance, resilience and observability of the compute infrastructure. You'll also contribute to continuous improvement initiatives, automating wherever possible and sharing your expertise with the team. Location/ ...

Technical Development Lead - Enfield

Hiring Organisation
Crimson
Location
Enfield, Middlesex, England, United Kingdom
Employment Type
Full-Time
Salary
£65,000 - £80,000 per annum
CIAM flows, and adhering to ISO 27001 standards. Develop resilient architectures for retail and e-commerce systems, considering networking and SD-WAN performance. Configure observability tools for monitoring, logging, and performance metrics. Mentor and guide a small technical team, enforce coding standards, and apply Agile principles. Translate business objectives into ...