Observability Jobs in London

51 to 75 of 374 Observability Jobs in London

DV Cleared Site Reliability / DevOps Engineer

London, United Kingdom
JAM Recruitment
Site Reliability/DevOps Engineer London - 5 Days Onsite Up to £550 per day (Umbrella, Inside IR35) 12-Month Contract Must hold live and transferrable DV Clearance Are you passionate about reliability, automation, and supporting mission-critical systems? Join this …
Employment Type: Permanent
Salary: GBP Annual

DV Cleared Site Reliability / DevOps Engineer

London, United Kingdom
JAM Recruitment Ltd
Site Reliability/DevOps Engineer London - 5 Days Onsite Up to £550 per day (Umbrella, Inside IR35) 12-Month Contract Must hold live and transferrable DV Clearance Are you passionate about reliability, automation, and supporting mission-critical systems? Join this …
Employment Type: Contract
Rate: GBP 500 - 550 Daily

Senior Site Reliability Engineer

London, United Kingdom
Hybrid / WFH Options
Randstad Technologies Recruitment
Job Title: Senior SRE - Site Reliability Engineering for Observability Location: London (Mostly Remote | 1 Day/Week in Office) Pay Rate: £50 - £62 per hour (Inside IR35) Contract Duration: Initial 12 Months Working Hours: 11:00 AM - 7:00 PM About the Role We're looking for a Senior Site Reliability Engineer (SRE) to join a high-impact Observability team … monitoring and logging platforms that ensure service reliability, performance, and visibility. If you're passionate about distributed systems, high-throughput data pipelines, and enabling engineering teams with top-tier observability tooling, this is the role for you. What You'll Be Doing Designing and operating observability platforms (logging, monitoring, alerting) at scale. Managing large, high-performance Elasticsearch clusters and Prometheus … deployments. Building scalable data pipelines using Kafka to process millions of events per second. Developing tools, APIs, and dashboards to enable self-service observability for engineering teams. Automating infrastructure using Terraform and configuration with Ansible. Participating in on-call rotations to ensure platform uptime and responsiveness. What We're Looking For 5+ years of experience in SRE/DevOps …
Employment Type: Contract
Rate: £50 - £62/hour

Site Reliability Engineer

London, United Kingdom
Hybrid / WFH Options
NinjaOne, LLC
SRE team in the Platform Engineering organization and help us scale our products to millions of end-users. We are looking for individuals with a passion for automation and observability, ensuring the quality and availability of our services. Location - We are flexible on remote working from home, if you are based in the UK or Germany. This is a fully … our 24x7 on-call rotation, SCRUM, and deployment planning Perform Root Cause Analysis (RCA) and provide recommendations for application teams Improve availability and reduce customer impact using industry-best observability tools Ensure best-practice and security-minded architecture by influencing design decisions Create and maintain technical documentation and SOPs Develop software, scripts, or tooling to improve efficiency and reduce … time of applications and infrastructure Other duties as needed About You 5+ years' experience in Site Reliability Engineer roles Expert+ level Linux administration, scripting, and troubleshooting Demonstrable knowledge of Observability tools (Prometheus/Grafana, New Relic, Splunk, DataDog) Comprehensive experience with AWS (Amazon Web Services) and its core capabilities (VPC, EC2, ECS, Route53, Fargate, ALB/NLB distributions, etc.) Extensive …
Employment Type: Permanent
Salary: GBP Annual

Head of Platform Engineering (London)

London, UK
Yolo Group
functions, championing a culture of proactive readiness, efficient release pipelines, robust incident response, and continuous infrastructure improvement. This role ensures maximum uptime, enables safe and frequent deployments, establishes comprehensive observability, and drives effective postmortem practices. They will work closely with Engineering, QA, and Security leadership to embed operational excellence across the software development lifecycle and support the platform’s growth … distributed team of DevOps engineers, SREs, and incident responders; Foster a culture of ownership, continuous improvement, and operational excellence; Define and execute the long-term strategy for system reliability, observability, performance, and incident management; Champion the adoption of modern tooling, technologies, and best practices to enhance resilience and agility; Own and continuously evolve incident response processes, including SLOs, SLAs, and …
Employment Type: Full-time

Director of Rates and Credit Reliability Engineering (London)

London, UK
Hybrid / WFH Options
Deutsche Bank
strategy across FIC Technology, aligning reliability goals with business priorities and regulatory expectations Lead the transformation of production support into a proactive, data-driven engineering discipline focused on automation, observability, and continuous improvement Stay close to the technology, reviewing architecture, contributing to tooling, and leading by example in incident response and root cause analysis Act as a trusted advisor to … proficiency in Linux/Unix systems, SQL, and programming languages such as C++, Java or Python. Strong understanding of distributed systems and low-latency architectures Hands-on experience with observability stacks (e.g., Prometheus, Grafana, Splunk, Geneos, OpenTelemetry) and infrastructure automation (e.g., Ansible, Terraform, CI/CD pipelines) Strong understanding of the trade lifecycle, market data, and fixed income products, FX …
Employment Type: Full-time

Senior Solution Architect (Product) - Relocation Abu Dhabi - Must have worked in top AI/ Big Tech... (London)

London, UK
ZipRecruiter
as a Solution Architect for SaaS or cloud-agnostic platforms. Experience with distributed systems, API design, and multi-tenant architecture. Familiar with data architecture, AI/ML integration, and observability tools. Certifications: AWS Solutions Architect, Google Cloud Architect, Azure Solutions Architect Expert. If you are interested in the above position, please apply below!
Employment Type: Full-time

Senior Kotlin Developer

London, United Kingdom
Hybrid / WFH Options
Halian Technology Limited
in the team Contribute to solution architecture and strategic technical direction Build, integrate, and maintain REST APIs and backend services Champion best practices in software quality, CI/CD, observability, and DevOps Collaborate with cross-functional teams including Product, QA, and DevOps Optionally take on people management responsibilities for engineers Stay updated with emerging backend and cloud technologies Key Skills …
Employment Type: Permanent, Work From Home
Salary: £90,000

Senior Full Stack Engineer Product & Engineering Belfast

London, United Kingdom
Hybrid / WFH Options
Kadence Limited
TypeScript) Database: MySQL (Aurora DB), Redis (ElastiCache), MongoDB (AWS DocumentDB) Cloud & DevOps: AWS (20+ services), Kubernetes (EKS), Docker, Infrastructure as Code (CloudFormation, Terraform), CI/CD (Jenkins, GitHub Actions), Observability (AWS, Grafana) Development tools: GitHub, Jira, Notion, ChatGPT, Gemini, LangChain, AI-native IDEs (Cursor, JetBrains), LLM-powered internal tools. WHAT WE OFFER YOU A front-row seat in a …
Employment Type: Permanent
Salary: GBP Annual

Senior Software Engineer (London)

London, UK
Visa Inc
on AWS are key to our next phase of growth, are written to 12-factor principles and fit into our microservices architecture Cloud-related tools, services, and distributed system observability to support these applications, such as Docker, Kubernetes, Elasticsearch, log management systems, and Datadog APM, to name but a few API specifications, conforming to the OpenAPI (Swagger) standard, provide a …
Employment Type: Full-time

Engineering Head - Public Cloud Infrastructure Services - Director

London, United Kingdom
Hybrid / WFH Options
Citigroup Inc
native infrastructure services across: Landing Zones & Projects/Accounts - AWS Control Tower, GCP Resource Manager, etc. Network - AWS Transit Gateway, GCP Shared VPC, AWS Route53, GCP Cloud DNS, etc. Observability - AWS OpenSearch, GCP Monitoring/Traces, OpenTelemetry, Grafana, Prometheus, etc. Automation Prowess: Hands-on experience with modern Infrastructure as Code (IaC) automation tools and frameworks (Terraform, Jenkins, Ansible, etc.). …
Employment Type: Permanent
Salary: GBP Annual

AI/ML Lead Engineer Product & Engineering Belfast

London, United Kingdom
Hybrid / WFH Options
Kadence Limited
Database: MySQL (Aurora DB), Redis (ElastiCache), MongoDB (AWS DocumentDB). Cloud & DevOps: AWS (20+ services), Kubernetes (EKS), Docker, Infrastructure as Code (CloudFormation, Terraform), CI/CD (Jenkins, GitHub Actions), Observability (AWS, Grafana), OpenAI in Azure. Development tools: GitHub, Jira, Notion, ChatGPT, Gemini, LangChain, AI-native IDEs (Cursor, JetBrains), LLM-powered internal tools. Test automation: Cypress (E2E), Postman (API), Jest …
Employment Type: Permanent
Salary: GBP Annual

Sr. Machine Learning Engineer

London, United Kingdom
Menlo Ventures
… Build evaluation pipelines to benchmark LLM performance and continuously monitor production accuracy and relevance. Work across the ML stack, from data preparation and model training to serving and observability, either independently or in collaboration with other specialists. Optimize model pipelines for latency, scalability, and cost-efficiency, and support real-time and batch inference needs. Collaborate with MLOps, DevOps, and …
Employment Type: Permanent
Salary: GBP Annual

Sr. Machine Learning Engineer (London)

Wandsworth, Greater London, UK
Menlo Ventures
… Build evaluation pipelines to benchmark LLM performance and continuously monitor production accuracy and relevance. Work across the ML stack, from data preparation and model training to serving and observability, either independently or in collaboration with other specialists. Optimize model pipelines for latency, scalability, and cost-efficiency, and support real-time and batch inference needs. Collaborate with MLOps, DevOps, and …
Employment Type: Full-time

Head of Infrastructure Engineering (London)

Wandsworth, Greater London, UK
Spendesk
AWS as our cloud compute platform Kubernetes (EKS) for container runtime and orchestration RDS (PostgreSQL, MySQL), Kafka, Redis Terraform for infrastructure as code Lambda and Step Functions Datadog for Observability GitHub Actions for CI/CD Frontend is React Backend services are developed in Node.js (TypeScript) As we are an international team, please submit your application and CV in English. About Spendesk …
Employment Type: Full-time

Lead Site Reliability Engineer

London, United Kingdom
Hybrid / WFH Options
Lloyds Bank plc
in critical detail to your mentees Production Kubernetes experience and debugging all services that run within the K8s ecosystem, including Istio service mesh SRE mentality (SLI, SLO & SLA) using Observability, Logging, Monitoring & Alerting (Dynatrace) Ideally coming from a software engineering or exceptional scripting skill background and have moved into SRE/DevOps while gaining a wider understanding of application ecosystems. …
Employment Type: Permanent
Salary: GBP Annual

Site Reliability Engineer

London, United Kingdom
AudioStack
creation by building world-class audio infrastructure for our customers. As a Site Reliability Engineer, you'll play a key role in improving our platform's developer operations, including observability, monitoring, and overall reliability. You will be part of a cross-functional team dedicated to implementing robust DevOps practices and enhancing infrastructure and site reliability engineering (SRE). A customer … focused mindset is essential, as the team collaborates closely with stakeholders to ensure solutions meet business and user needs. In addition to a focus on observability, you will contribute hands-on by developing features, automating workflows, and supporting the deployment of advanced machine-learning models. Strong communication skills are vital for working effectively with engineers, product teams, and stakeholders across … about CI/CD to these engineers Identifying and resolving security issues Automating tests and supporting our engineers on building great software Minimum qualifications: Strong experience with monitoring/observability tools (Grafana, Prometheus, or similar) Proficiency in Python, Docker, Kubernetes, and CI/CD pipelines Hands-on cloud experience (AWS or similar) A passion for designing and implementing scalable observability …
Employment Type: Permanent
Salary: GBP Annual

Senior Site Reliability Engineer

London, United Kingdom
Hybrid / WFH Options
NinjaOne, LLC
SRE team in the Platform Engineering organization and help us scale our products to millions of end-users. We are looking for individuals with a passion for automation and observability, ensuring the quality and availability of our services. Location - We are flexible on remote working from home, if you are based in the UK or Germany. On Call Requirements - Participate … our 24x7 on-call rotation, SCRUM, and deployment planning Perform Root Cause Analysis (RCA) and provide recommendations for application teams Improve availability and reduce customer impact using industry-best observability tools Ensure best-practice and security-minded architecture by influencing design decisions Create and maintain technical documentation and SOPs Develop software, scripts, or tooling to improve efficiency and reduce … experience in Site Reliability Engineer roles 3+ years' experience with an object-oriented language (preferably Java, .NET or C++) Expert+ level Linux administration, scripting, and troubleshooting Demonstrable knowledge of Observability tools (New Relic, Splunk, DataDog) Comprehensive experience with AWS (Amazon Web Services) and its core capabilities (VPC, EC2, ECS, Route53, Fargate, ALB/NLB distributions, etc.) Extensive experience with cloud …
Employment Type: Permanent
Salary: GBP Annual

Senior Data Platform Engineer - Azure (London)

London, UK
Hybrid / WFH Options
Bupa
as-code using Terraform, ensuring consistency and reusability across environments. Build and optimize CI/CD pipelines using Azure DevOps and GitHub Actions to support rapid, reliable deployments. Implement observability practices including logging, metrics, and alerting using observability tools. Collaborate with the Lead Engineer and Architects to align implementation with platform standards and patterns. Provide technical guidance and mentorship to … Azure ML, and Power BI/Fabric. Proven experience with infrastructure-as-code using Terraform and building CI/CD pipelines via Azure DevOps and GitHub Actions. Strong grasp of observability practices, including logging, metrics, alerting, and performance optimization. Deep understanding of cloud security, with experience applying secure-by-design principles in Azure (e.g., network isolation, IAM, data protection). Proficiency …
Employment Type: Full-time

Global IT Network Senior Director

London, United Kingdom
Boston Consulting Group
automation. Ensure end-to-end network automation to improve operational efficiency, agility, and reliability. Drive zero-trust network security principles, ensuring compliance and proactive threat mitigation. Establish a global observability and telemetry framework for real-time network insights. Align network strategies with business growth, cloud-first initiatives, and digital transformation. Network Infrastructure & Cloud Networking: Oversee global network architecture, spanning data … response using AI-driven network analytics. Ensure high availability, network resilience, and 24x7 operational support. Develop a follow-the-sun support model, ensuring global network performance optimization. Implement network observability and predictive analytics to proactively prevent outages. Security, Compliance & Risk Management: Drive zero-trust security frameworks, ensuring secure and resilient network access. Ensure adherence to ISO 27001, NIST, SOC … role, managing large-scale global network environments. Deep expertise in cloud networking (AWS, Azure, GCP), SD-WAN, and network automation. Proven track record in end-to-end network automation, observability, and self-healing networks. Experience in AI-driven networking, predictive analytics, and network telemetry. Strong understanding of zero-trust networking, compliance frameworks, and security policies. Excellent leadership, communication, and stakeholder …
Employment Type: Permanent
Salary: GBP Annual

Senior Data Platform Engineer - Azure

London, United Kingdom
Hybrid / WFH Options
Bupa
as-code using Terraform, ensuring consistency and reusability across environments. Build and optimize CI/CD pipelines using Azure DevOps and GitHub Actions to support rapid, reliable deployments. Implement observability practices including logging, metrics, and alerting using observability tools. Collaborate with the Lead Engineer and Architects to align implementation with platform standards and patterns. Provide technical guidance and mentorship to … Azure ML, and Power BI/Fabric. Proven experience with infrastructure-as-code using Terraform and building CI/CD pipelines via Azure DevOps and GitHub Actions. Strong grasp of observability practices, including logging, metrics, alerting, and performance optimization. Deep understanding of cloud security, with experience applying secure-by-design principles in Azure (e.g., network isolation, IAM, data protection). Proficiency …
Employment Type: Permanent
Salary: GBP Annual

Observability (London)
10th Percentile: £65,000
25th Percentile: £73,125
Median: £82,500
75th Percentile: £108,125
90th Percentile: £120,000