Observability Jobs in London

76 to 100 of 171 Observability Jobs in London

ML Engineer

London, South East England, United Kingdom
Hybrid/Remote Options
Vortexa
AWS services (SageMaker, S3, EC2, Lambda, etc.) Have experience with infrastructure as code tools (Terraform, CloudFormation) Have experience with Apache Kafka and real-time streaming frameworks Are familiar with observability principles such as logging, monitoring, and distributed tracing for ML systems Have experience with transformer architectures and generative AI applications in operational contexts Have experience with time series analysis and …

VP of Engineering

London, South East England, United Kingdom
JAAQ
with Product, Design, and Clinical to ensure our technology is not just beautiful and fast, but clinically safe and ethically sound. Establish strong architectural and operational patterns: testing automation, observability, continuous delivery, and platform-level reusability. Inspire through action: you'll code, mentor, and model how high-agency engineering leaders think and build. Sit at the core of a mission …

AI Solution Lead

London, South East England, United Kingdom
AND Digital
REST APIs, JSON, OAuth, and integration patterns with enterprise systems (e.g., CRM, ERP, data products and microservices) Experience applying Generative AI and prompting techniques. Strong understanding of AI governance, observability, and compliance frameworks. Proven ability to deliver secure, scalable, and responsible AI solutions. Excellent communication and presentation skills. Extensive experience working collaboratively with diverse colleagues and stakeholders. Knowledge of the …

AI Engineer

London, South East England, United Kingdom
Codurance
skills with demonstrated ability to optimise for various use cases A deep understanding of RAG architectures, including vector databases, embedding strategies, and retrieval optimisation Strong experience with evaluation and observability tools for AI systems Familiarity with Agentic frameworks such as OpenAI Agent SDK, LangChain, CrewAI Hands-on experience with AI-assisted software development tools such as Claude Code, GitHub Copilot …

Team Lead - Site Reliability Engineering

London, United Kingdom
Arbuthnot Latham
skills and expertise to automating manual tasks (TOIL) in such areas as incident management, problem management, change management, and release management tasks, and provides operational insights through monitoring and observability; and other aspects involved in preparing and optimising automated delivery solutions. To place the interests of customers at the centre of all activities, act in a way that is consistent … a root cause analysis to troubleshoot priority incidents. Implement automation to reduce probability and/or impact of problems recurring; possible options could include automated incident response, enhanced monitoring, observability initiatives, automation to change and release management. Identify, evaluate, and recommend monitoring and observability tools and diagnostic techniques to improve system observability and insights, including identification of requirements, nonfunctional … environments Experience of communicating complex issues to senior stakeholders and technical teams. Implementation of highly available and reliable systems, using multi-AZ and multiregional approaches Expertise with monitoring and observability tools (e.g. SolarWinds, Datadog, Azure/AWS native tools) Expertise with SLI/SLO management tools such as ServiceNow Expertise with Incident ticketing and change management systems such as (ServiceNow …
Employment Type: Permanent

Messaging Engineer

East London, London, United Kingdom
Ncounter LTD
and continuous improvement of Solace environments, working across development, infrastructure, and cloud teams to deliver a stable and well-governed messaging service. You will troubleshoot problems, refine configurations, improve observability, and help drive upgrades, automation, and improved resilience. Experience Needed At least 1 year of hands-on experience configuring, administering, and troubleshooting Solace PubSub+ Strong understanding of event-driven and …
Employment Type: Permanent

Solace Expert

East London, London, England, United Kingdom
Ncounter
and continuous improvement of Solace environments, working across development, infrastructure, and cloud teams to deliver a stable and well-governed messaging service. You will troubleshoot problems, refine configurations, improve observability, and help drive upgrades, automation, and improved resilience. Experience Needed • At least 1 year of hands-on experience configuring, administering, and troubleshooting Solace PubSub+ • Strong understanding of event-driven and …
Employment Type: Full-Time
Salary: £110,000 - £125,000 per annum

Senior Data Engineer

London, South East, England, United Kingdom
Harnham - Data & Analytics Recruitment
maintain scalable, automated, reliable data pipelines across a modern cloud stack Extend and improve a cutting-edge Data & Analytics Platform supporting mission-critical insurance products Implement data quality checks, observability metrics and troubleshooting processes Manage cloud resources via Infrastructure-as-Code Ensure strong data security, access control, and governance Work closely with commercial, analytics and engineering teams to deliver high …
Employment Type: Full-Time
Salary: £70,000 - £80,000 per annum

Technical Development Lead - Enfield

Enfield, London, United Kingdom
Crimson
security and compliance by implementing CIAM flows, and adhering to ISO 27001 standards. Develop resilient architectures for retail and e-commerce systems, considering networking and SD-WAN performance. Configure observability tools for monitoring, logging, and performance metrics. Mentor and guide a small technical team, enforce coding standards, and apply Agile principles. Translate business objectives into technical solutions for e-commerce …
Employment Type: Permanent
Salary: £80,000

Technical Development Lead - Enfield

Enfield, Middlesex, England, United Kingdom
Crimson
security and compliance by implementing CIAM flows, and adhering to ISO 27001 standards. Develop resilient architectures for retail and e-commerce systems, considering networking and SD-WAN performance. Configure observability tools for monitoring, logging, and performance metrics. Mentor and guide a small technical team, enforce coding standards, and apply Agile principles. Translate business objectives into technical solutions for e-commerce …
Employment Type: Full-Time
Salary: £65,000 - £80,000 per annum

Technical Development Lead - Hybrid

Enfield, Middlesex, England, United Kingdom
Hybrid/Remote Options
Crimson
security and compliance by implementing CIAM flows, and adhering to ISO 27001 standards. Develop resilient architectures for retail and e-commerce systems, considering networking and SD-WAN performance. Configure observability tools for monitoring, logging, and performance metrics. Mentor and guide a small technical team, enforce coding standards, and apply Agile principles. Translate business objectives into technical solutions for e-commerce …
Employment Type: Full-Time
Salary: £65,000 - £80,000 per annum

Enterprise Architect, Agentic AI Implementation

London, United Kingdom
DCV Technologies
agent systems, integrating them with core enterprise systems like SAP, Salesforce, and the ECOLAB3D™ platform. Define and enforce architectural standards and governance frameworks for the agent lifecycle, data lineage, observability, and interoperability. Technology Evaluation and Selection: Evaluate and select AI platforms, tools, and protocols, such as LangChain, AutoGen, or similar frameworks, ensuring they meet scalability, security, and performance requirements within …
Employment Type: Permanent

Staff Infrastructure Engineer

London, South East England, United Kingdom
aionex
Central Limit Order Book (CLOB), entirely on an EVM-compatible chain. You will develop and maintain bare-metal and cloud environments, service orchestration, network connectivity, databases, blockchain nodes and observability to the highest standards of reliability, performance and security. Although this project centres around a blockchain system, previous experience with blockchain is not a hard requirement but keen interest in …

Low Latency Network Engineer

London, United Kingdom
Millennium Management LLC
optimization, anomaly detection, and predictive analytics. Understanding of AI frameworks and libraries (e.g., TensorFlow, PyTorch, Scikit-learn) and their application in network automation and monitoring. Experience with telemetry and observability frameworks (e.g., Prometheus, Grafana) for real-time network monitoring and troubleshooting. Experience: Minimum of 7 years of experience in network engineering, operations, and support. Proven ability to work hands-on …
Employment Type: Permanent

Staff Software Engineer (Optimizely Analytics, Backend)

London, United Kingdom
Optimizely
will: Design and evolve the architecture of highly scalable, reliable, and secure distributed systems. Drive technical excellence across the engineering organization by setting standards for code quality, system design, observability, and operational best practices. Collaborate closely with Product, UX, and Application Engineering teams to deliver impactful features while ensuring architectural soundness and scalability. Mentor and guide senior and mid-level …
Employment Type: Permanent

Head of Integrations SaaS / Software

City of London, London, United Kingdom
RedTech Recruitment
other internal teams to fully understand client requirements and deliver tailored technical solutions. Design and implement scalable, future-proof architectures for new third-party connectors and integrations. Enhance system observability by improving diagnostics, logging, and tracing to aid technical support teams in resolving issues swiftly. Oversee the ongoing development and management of the public API, covering REST and event streaming …
Employment Type: Professional qualifications

Head of Integrations - SaaS / Software

London, South East, England, United Kingdom
REDTECH RECRUIT
other internal teams to fully understand client requirements and deliver tailored technical solutions. Design and implement scalable, future-proof architectures for new third-party connectors and integrations. Enhance system observability by improving diagnostics, logging, and tracing to aid technical support teams in resolving issues swiftly. Oversee the ongoing development and management of the public API, covering REST and event streaming …
Employment Type: Full-Time
Salary: Competitive salary

Site Reliability Engineer - London

London, United Kingdom
Hybrid/Remote Options
Valarian Technologies Limited
you thrive in a fast-paced environment where you can make a real difference, we want to hear from you! Required skills/expertise: Develop and implement a comprehensive observability strategy for self-hosted deployments, including infrastructure and tooling for monitoring, alerting, and troubleshooting. This will involve designing and implementing robust metrics and logging systems. Engineer the ACRA platform for …
Employment Type: Permanent

AI Engineer (Platform)

London, United Kingdom
Hybrid/Remote Options
nPlan limited
applications of AI for the construction domain, pushing the boundaries of what's possible. Build core infrastructure that allows us to build and ship LLM apps quickly - this includes observability and how we work with several LLM providers plus our own fine-tuned models. Work with other engineers in the product and research teams to bring new models and applications to …
Employment Type: Permanent

Lead Frontend Engineer

London, South East England, United Kingdom
JustPark
international markets Previous experience in the parking or mobility sector Experience with GraphQL and modern API integration patterns Knowledge of micro-frontend architectures Experience with advanced performance monitoring and observability tools Growth Opportunities Opportunity to shape the frontend strategy for a rapidly growing international company Increasing involvement in strategic technical decision-making Development of broader technology leadership skills Experience in …

Platform Engineer

City of London, London, England, United Kingdom
Revybe IT Recruitment Ltd
DevOps, infrastructure, and platform engineering. Tech Stack Cloud: AWS (EC2, RDS, S3, IAM, CloudWatch, Lambda) Infrastructure as Code: Terraform Containerisation & Orchestration: Docker, Kubernetes (EKS), Helm Configuration Management: Ansible Monitoring & Observability: Grafana, Prometheus CI/CD: GitHub Actions Automation & Scripting: Python, Bash, Go or Java What We’re Looking For Proven experience running AWS cloud infrastructure in a production or regulated … financial) environment. Hands-on experience managing Kubernetes clusters (preferably EKS). Strong understanding of Infrastructure as Code using Terraform. Familiarity with monitoring and observability stacks such as Prometheus and Grafana. Experience building and maintaining CI/CD pipelines (GitHub Actions or similar). Strong scripting or automation skills using Python, Bash, Go or Java. A collaborative mindset, comfortable working …
Employment Type: Full-Time
Salary: £65,000 - £80,000 per annum

AWS DevOps Engineer

City of London, London, England, United Kingdom
Revybe IT Recruitment Ltd
AWS (Core Services – EC2, RDS, S3, IAM, Lambda, CloudWatch) Infrastructure as Code: Terraform Containerisation & Orchestration: Docker, Kubernetes (EKS), Helm Configuration Management: Ansible CI/CD Pipelines: GitHub Actions Monitoring & Observability: Grafana, Prometheus Scripting/Automation: Python or Java What We’re Looking For Proven experience managing and scaling AWS cloud environments, ideally supporting live software products or high-traffic platforms. … Strong background in Terraform and Infrastructure as Code best practices. Practical experience with Kubernetes (EKS) in production. Familiarity with monitoring and observability tools such as Grafana and Prometheus. Hands-on experience building CI/CD pipelines (GitHub Actions, Jenkins, CircleCI, etc.). Solid scripting and automation experience using Python or Java. A collaborative engineer who enjoys working closely with …
Employment Type: Full-Time
Salary: £55,000 - £80,000 per annum

Cloud Infrastructure Engineer

London, South East England, United Kingdom
Hybrid/Remote Options
Black Pen Recruitment
tooling, systems design, and operational resilience. Their environment offers opportunities to work on everything from CI/CD pipelines and container orchestration to configuration management, infrastructure as code, and observability tooling. While you may bring experience in specific tools or platforms, you will be expected to contribute broadly across our infrastructure landscape. Our client's core product is a comprehensive … Solid Linux administration and general networking knowledge Understanding of infrastructure security best practices, including secure configuration, identity and access management, and compliance controls Experience with monitoring, alerting, and system observability Background in financial services infrastructure is advantageous but not required

Staff Machine Learning Engineer

London, South East England, United Kingdom
TWG Global AI
environments. Lead the development of production-ready pipelines, including feature stores, model registries, and scalable inference services. Champion MLOps best practices (CI/CD for ML, model versioning, monitoring, observability) to ensure models are reliable, reproducible, and cost-efficient. Partner with Data Scientists to operationalize experimental models, enabling scalability and generalizability across diverse business domains. Integrate emerging ML engineering techniques … GCP Vertex AI, Azure ML) and containerized deployments (Kubernetes, Docker). Hands-on experience with data and model pipelines (feature stores, registries, distributed training, inference scaling). Knowledge of observability and monitoring stacks (Prometheus, Grafana, ELK, Datadog) for ML system performance. Experience collaborating with cross-functional teams in regulated industries (finance, insurance, health) with compliance and governance needs. Exceptional communication …

Platform Engineer

London, United Kingdom
Movement8
pipelines, reducing deployment time and improving release reliability Strengthen system resilience through infrastructure improvements and scalability planning Work with Product Engineers to enhance developer experience Drive automation and observability Requirements: Strong GCP experience Deep understanding of Terraform CI/CD pipelines Containerisation (Kubernetes, GKE) If you're interested, get in touch ASAP
Employment Type: Permanent

Observability, London
10th Percentile: £62,500
25th Percentile: £73,750
Median: £90,000
75th Percentile: £120,000
90th Percentile: £157,500