Observability Jobs in London

Data Platform Engineer

London, United Kingdom
Hybrid / WFH Options
Lyst
… able to: Contribute to every part of our system, ranging from code and tests to infrastructure changes. Ensure the stability of our system by implementing and improving monitoring and observability tools. Write resilient code that is well tested. Be curious - not just the code, but the architecture of our platforms and everything that enables the business to thrive. Gain expertise … the rest of the organisation, and almost all of Lyst engineering engages with us on a regular basis. We care about robustness and integrity in our pipelines and use observability tools to monitor. Experience in developing robust and secure software solutions and data pipelines. Effective communication skills, comfortable working with technical and non-technical individuals and teams. Proficiency in developing … within public cloud technologies and architecture (preferably AWS experience). Experience with containers (Docker) and container orchestration. Experience with Infrastructure as Code (we use Terraform). Experience utilising monitoring, observability and logging tools. Experience with Git, GitOps, and GitHub Actions. Exposure to or experience with cloud data warehouses/data platforms (we use Snowflake). Things that matter to us: You have a …
Employment Type: Permanent
Salary: GBP Annual

Director of Software Asset and Configuration Management

London, United Kingdom
Boston Consulting Group
… for the creation, implementation, and continuous improvement of BCG's modern, fully automated SACM function. As the beating heart of IT, this system will serve as the backbone for observability, service reliability, release and change management, and infrastructure management. The leader will drive the automation and governance of BCG's configuration management database (CMDB), integrating it with SRE, ITSM, and … Establish the CMDB as a real-time, trusted system of record for configuration items across cloud, on-prem, and hybrid environments. Embed SACM capabilities into core IT processes including observability, incident response, service management, and architecture governance. Champion automation, transparency, and traceability of all infrastructure, software, and asset relationships. Automation & Integration: Build and operate a fully automated CMDB with bi … reduce risk and accelerate safe deployments. Operational Excellence & SRE Alignment: Apply SRE principles to ensure reliability, performance, and resilience of the SACM platform. Embed SACM into 24x7 operations and observability platforms to support real-time decision-making. Support incident prevention, root cause analysis, and continuous improvement through data-driven insights. Define and enforce service level objectives (SLOs) and key performance …
Employment Type: Permanent
Salary: GBP Annual

Director of AWS Platforms

London, United Kingdom
Boston Consulting Group
… for the creation, implementation, and continuous improvement of BCG's modern, fully automated SACM function. As the beating heart of IT, this system will serve as the backbone for observability, service reliability, release and change management, and infrastructure management. The leader will drive the automation and governance of BCG's configuration management database (CMDB), integrating it with SRE, ITSM, and … Establish the CMDB as a real-time, trusted system of record for configuration items across cloud, on-prem, and hybrid environments. Embed SACM capabilities into core IT processes including observability, incident response, service management, and architecture governance. Champion automation, transparency, and traceability of all infrastructure, software, and asset relationships. Automation & Integration: Build and operate a fully automated CMDB with bi … reduce risk and accelerate safe deployments. Operational Excellence & SRE Alignment: Apply SRE principles to ensure reliability, performance, and resilience of the SACM platform. Embed SACM into 24x7 operations and observability platforms to support real-time decision-making. Support incident prevention, root cause analysis, and continuous improvement through data-driven insights. Define and enforce service level objectives (SLOs) and key performance …
Employment Type: Permanent
Salary: GBP Annual

Principal AWS Platform Engineer

London, United Kingdom
CACI Limited
… at scale, leveraging AWS Organizations, Landing Zones, and multi-account best practices. Develop and maintain Infrastructure as Code solutions using Terraform, CloudFormation, and AWS CDK. Champion security, compliance, and observability by integrating services like AWS Security Hub, GuardDuty, and Inspector. Design CI/CD pipelines to enable seamless deployments and self-service models for customers. Innovate with AWS Networking, KMS … Proficiency in Python, Go, or similar languages for automation and scripting. Expert-level knowledge of AWS Networking, TLS, and security best practices. Experience with container orchestration (Kubernetes, EKS) and observability tools (Grafana, ELK). A passion for innovation, problem-solving, and delivering high-impact solutions. Experience leading/managing junior engineers. Significant experience with Control Tower and deploying landing zones.
Employment Type: Permanent
Salary: GBP Annual

Principal AWS Platform Engineer

London, United Kingdom
Hybrid / WFH Options
Identity E2E Ltd
… at scale, leveraging AWS Organizations, Landing Zones, and multi-account best practices. Develop and maintain Infrastructure as Code solutions using Terraform, CloudFormation, and AWS CDK. Champion security, compliance, and observability by integrating services like AWS Security Hub, GuardDuty, and Inspector. Design CI/CD pipelines to enable seamless deployments and self-service models for customers. Innovate with AWS Networking, KMS … architectures and multi-account AWS setups. Extensive experience with AWS Organizations. Expert-level knowledge of AWS Networking, TLS, and security best practices. Experience with container orchestration (Kubernetes, EKS) and observability tools (Grafana, ELK). A passion for innovation, problem-solving, and delivering high-impact solutions. Working with Control Tower and Landing Zones. Why Work For Us? Competitive base salary up …
Employment Type: Permanent
Salary: GBP Annual

Lead Developer

London, United Kingdom
Hybrid / WFH Options
Experis
Lead Developer. 6 Months. Hybrid - 1/3 days a month in office, either London or Bristol. £750. Overview: Working within an agile digital delivery team developing and supporting a mission-critical application for the UK client, with instances hosted …
Employment Type: Contract
Rate: £600 - £750/day

Senior Java Engineer - B2B SaaS Product Engineering

London, United Kingdom
Burns Sheehan
Senior Java Engineer - Product Engineering B2B SaaS Insurtech Up to £110,000 per annum plus bonus and excellent pension London - 2 days a week Java Spring Boot AWS Kubernetes Event-Driven Architecture Senior Java Engineer - We have been exclusively engaged …
Employment Type: Permanent
Salary: GBP Annual

Data Science Tech Lead: GenAI

London Area, United Kingdom
Hybrid / WFH Options
Anecdote
… up and harden RAG pipelines (indexing, retrieval policies, grounding, guardrails) and agent frameworks. Take basic infra ownership on GCP (or AWS/Azure): networking, autoscaling, CI/CD, IaC, observability, and cost tuning. Participate in on‐call for your area and drive root‐cause analysis with crisp follow‐ups. 15% Collaborate Pair with back‐end & front‐end to wire extractors … evals; hands‐on with time‐series analysis (forecasting, change‐point, drift). Cloud & ops: Basic infra ownership on GCP (or AWS/Azure): networking, autoscaling, CI/CD, IaC, observability, and cost control. Communication: You explain results clearly, align stakeholders, and write crisp docs. Bonus points DevOps wizardry; GPU/accelerator experience. Multimodal pipelines (text + voice + screenshots).

Splunk ITSI Expert / Observability Engineer (Level 4)

City of London, London, United Kingdom
Randstad Technologies Recruitment
We are seeking a highly experienced Splunk ITSI Expert with 10+ years in observability to enhance our monitoring and analytics capabilities. Key Responsibilities: Design and implement advanced monitoring strategies using Splunk IT Service Intelligence (ITSI). Create service models, define KPIs, and build glass tables to visualize key business services. Utilize Splunk ES for security event monitoring and correlation searches. … Automate tasks and integrate systems using Python, Shell, or Perl scripting. Perform root cause analysis and anomaly detection by analyzing complex log data. Requirements: 10+ years' experience in observability, with deep expertise in Splunk, especially ITSI. Proficiency in scripting (Shell/PowerShell/Python). Strong understanding of load balancers such as F5, Netscaler, and AWS ELB. Hands-on experience …
Employment Type: Contract
Rate: £300 - £380/day

Lead DevOps Engineer (Data)

London, United Kingdom
Hybrid / WFH Options
LGBT Great
… a key role in scaling and supporting our data systems, which leverage a modern AWS stack and Snowflake. This is a high-impact role with direct influence over reliability, observability, and the DevOps maturity of our data engineering function. Key Responsibilities Platform Ownership Own and manage the data platform infrastructure built on AWS services (EventBridge, Lambda, EC2, MWAA, S3). … Snowflake, and support its integration into the broader data ecosystem. Infrastructure and System Reliability Ensure platform reliability, availability, and scalability across environments. Design and maintain robust monitoring, alerting, and observability frameworks to reduce MTTR and improve visibility. Lead and manage initiatives related to data lineage, platform health, and alert hygiene. CI/CD and Automation Enhance and expand our CI … and operating production data platforms within AWS. Strong understanding of AWS core services: EventBridge, Lambda, EC2, S3, and MWAA (Managed Workflows for Apache Airflow). Experience with infrastructure reliability, observability tooling, and platform automation. Solid experience with CI/CD pipelines, preferably Bitbucket Pipelines. Familiarity with Snowflake administration and deployment practices. Comfortable working through ambiguity and in cross-functional, collaborative …
Employment Type: Permanent
Salary: GBP Annual

Staff Quality Engineer (Waitrose Apps)

South West London, London, United Kingdom
Hybrid / WFH Options
John Lewis & Partners
… the team's checks, your role in the team will be to mentor others in testing practice; coach them to adopt and improve their quality approaches including deployment approaches and observability; review and contribute to the team's codebase and pipeline configuration; help the team with their system of work from first business need to monitoring services in production. At all times … performance, resource usage, variable bandwidth, device compatibility, accessibility etc.) and advising on how these risks should be mitigated. Understanding operational and non-functional requirements (such as resilience, performance and observability) and how solutions are implemented and tested. Desirable skills/experience you may have: Bitrise/GitLab CI, GraphQL, Backend for Frontend (BFF) patterns, Microservice Architectures, Experience of cloud infrastructure …
Employment Type: Permanent, Work From Home
Salary: £90,000

Machine Learning Engineer

London, United Kingdom
Hybrid / WFH Options
Ravelin
Who are we? Hi! We are Ravelin! We're a fraud detection company using advanced machine learning and network analysis technology to solve big problems. Our goal is to make online transactions safer and help our clients feel confident serving …
Employment Type: Permanent
Salary: GBP Annual

Senior Applied Scientist, Insights, Prime Video

London, United Kingdom
Amazon
… Prime Video wide processes. - Communicate results and insights to both technical and non-technical audiences, including through presentations and written reports. About the team Our team owns Prime Video observability features for development teams. We consume PBs of logs daily which feed into multiple observability features focussed on reducing the customer impact time. In 2025, we are expanding our remit … to consume data from more sources to provide more holistic observability for our development teams. BASIC QUALIFICATIONS - Master's degree in engineering, technology, computer science, machine learning, robotics, operations research, statistics, mathematics or equivalent quantitative field - Experience programming in Java, C++, Python or related language - Experience with neural deep learning methods and machine learning - Experience in building machine learning models …
Employment Type: Permanent
Salary: GBP Annual

Product Quality and Support Strategist, Alerting

London, United Kingdom
Coralogix, inc
Product Quality and Support Strategist, Alerting About The Position Coralogix is a modern, full-stack observability platform transforming how businesses process and understand their data. Our unique architecture powers in-stream analytics without reliance on expensive indexing or hot storage. We specialize in comprehensive monitoring of logs, metrics, traces, and security events with features such as APM, RUM, SIEM, Kubernetes … monitoring, and more, enhancing operational efficiency and reducing observability spending by up to 70%. We seek a Quality and Support Strategist professional who ensures that the Coralogix Alerting and Incident Management Platform and Process exceed the quality and reliability standards, establish a competitive edge, and prevent failures, profit loss, or work stoppages. You will be responsible for enhancing customer … team members are encouraged to challenge the status quo and contribute to our shared mission. If you thrive in dynamic environments and are eager to shape the future of observability solutions, we'd love to hear from you. Coralogix is an equal opportunity employer and encourages applicants from all backgrounds to apply.
Employment Type: Permanent
Salary: GBP Annual

Senior AI Engineer

London, United Kingdom
Colossyan
… to deployment and monitoring, balancing cutting-edge techniques with pragmatism to deliver measurable impact. • Apply strong software engineering principles, such as modularity, testing, code reviews, CI/CD and observability, to ensure AI systems are reliable, maintainable, production-ready and can be readily adapted to future developments. • Choose the right approach for the problem at hand, evaluating classical ML and … focused teams, collaborating with designers, engineers, and PMs, to scope and ship AI features iteratively • Ability to reason about system behavior end-to-end, including model performance, latency, and observability, and how these impact user experience. • Clear, structured communicator, comfortable documenting and defending architectural decisions and engaging in thoughtful technical debate. Not required, but it's a plus if you …
Employment Type: Permanent
Salary: GBP Annual

Lead Full Stack Engineer

London, United Kingdom
Hybrid / WFH Options
Fruition Group
Lead Full Stack Engineer London (Hybrid 1x Per week) Salary: Up to £100k + Benefits About Us Our client is an Insurtech unicorn looking to expand their engineering team. With the opportunity to work on existing products which have generated …
Employment Type: Permanent, Work From Home

AWS Engineer (OpenSearch)

London, United Kingdom
Hybrid / WFH Options
Ascendion
Job Title: AWS Engineer (OpenSearch) Work Location: London, UK (Hybrid) Job Description: We are looking for an experienced AWS Observability Engineer specializing in the Elasticsearch (ELS) Stack to design, implement, and optimize observability solutions across our cloud infrastructure. The ideal candidate will have hands-on experience in building OpenSearch infrastructure. Expertise in AWS services, log management, metric monitoring, and …/OpenSearch infrastructure, configuring platform and applications to send logs to OpenSearch. Expertise in the ELS Stack (Elasticsearch, Logstash, Kibana). Designing centralized logging, building Kibana dashboards, optimizing observability with AWS services (CloudWatch, CloudTrail, S3), and automating workflows using Python or Terraform. Configure platforms/applications to forward logs to OpenSearch. Optimize log ingestion pipelines and troubleshoot issues. Collaborate …
Employment Type: Permanent

Elastic Engineer

London, United Kingdom
Hybrid / WFH Options
Experis
… to support low latency applications. * Practical and working knowledge of IP networking and data flow within distributed systems. * Experience integrating ELK with packet capture/analysis tools to enhance observability of real-time systems. * Hands-on expertise in data pipeline creation, ingestion strategies, and performance tuning of Logstash and Beats for scalable telemetry. * Knowledge of Corvil and/or Pico … a deep understanding of the demands of high-frequency and algorithmic trading environments. * Working knowledge of security best practices, including RBAC, TLS, and audit logging in Elasticsearch. * Experience with observability platforms such as ITRS Geneos and their integration with ELK is a strong plus. * Comfortable with multi-site replication, cross-cluster search, and disaster recovery configurations for global deployment. * Strong … understanding of Linux systems, containers, and cloud-native observability stacks. * Organized, self-driven, and able to manage priorities in a dynamic, high-performance trading environment.
Employment Type: Contract
Rate: £400 - £430/day

Dynatrace Subject Matter Expert - Data Resilience

London, United Kingdom
Pontoon
… with the Enterprise Monitoring & Alerting (EMAS) team to deliver a transformative initiative aimed at maximising Dynatrace capabilities. We are looking for a skilled Dynatrace Admin/Consultant to enable observability across complex, hybrid cloud environments. Essential Skills: To thrive in this role, you must demonstrate extensive experience in designing and configuring within Dynatrace, including: Application Performance Monitoring Proficiency in Dynatrace … assets for monitoring. Work with EMAS to analyse Dynatrace coverage of these critical assets. Identify opportunities for enhancement in monitoring configurations across crucial applications. Review roles and responsibilities concerning observability and propose improvements focused on Operational Resilience. Contribute to establishing an automated end-to-end business flow for key business processes within the Dynatrace toolset. Ensure optimal alerting configurations in … collaboration with Application Stewards and SREs. Participate in workshops with third-party software suppliers to review observability standards. Bonus Points: Skills in correlating events across the full stack for root cause analysis. Key Attributes: Ability to manage competing priorities in a fast-paced environment. Flexibility and a pragmatic approach to problem-solving. A delivery-oriented mindset coupled with a can …
Employment Type: Contract

SRE & Service Lead

London, United Kingdom
Sanderson Recruitment
… + Bonus Are you a forward-thinking Engineering Leader with a deep understanding of software engineering, cloud infrastructure, and SRE principles? Do you have a sharp eye for automation, observability, and leading technical teams through digital transformation at scale? If so, this could be the perfect opportunity to elevate your career at the forefront of banking innovation. This is a … Experience building and leading teams of SREs and Engineers - both onshore and offshore Expertise in cloud migration to AWS, distributed architecture, and OpenTelemetry Strong exposure to SRE practices, observability frameworks, and automated monitoring Confident managing vulnerability, resiliency, and chaos engineering practices Track record of transforming large-scale platforms and delivering customer-centric tech Comfortable getting close to the code … hands-on involvement expected) What you'll be doing: Leading platform design, observability strategy, and automation-first service delivery Overseeing hosting migration of the Mortgages portfolio to the cloud Owning vulnerability and incident response models across critical services Collaborating with Architecture, Security, and Engineering Leaders to shape roadmaps Building and growing a high-performing team across critical banking domains Ensuring …
Employment Type: Permanent

Engineering Manager - London (Hybrid) - AWS TypeScript

London, United Kingdom
Hybrid / WFH Options
Transparent Technology
… this role combines technical leadership with hands-on engineering across a modern stack: AWS (Lambda, Step Functions, DynamoDB, Postgres, CDK), TypeScript, React, Next.js, Jest, Playwright, CI/CD and observability tools. You'll help establish and grow the Core Services team, building scalable architecture, developer tooling, and platform services that power multiple product squads. Expect around 70-80% coding and … Manager, Technical Lead, Senior Full Stack Engineer, Staff Engineer, AWS Lambda, AWS Step Functions, AWS DynamoDB, AWS Postgres, AWS CDK, TypeScript, React, Next.js, Jest, Playwright, CI/CD, DevOps, Observability, Monitoring, SaaS, HRTech, Hybrid Jobs London, Remote, Scale-up, Scalable Architecture.
Employment Type: Permanent, Work From Home

Senior Software Engineer Software Engineering

London, United Kingdom
Hybrid / WFH Options
Unlikely Artificial Intelligence Limited
… deliver impact. Write clean, testable, and maintainable code with a focus on developer-driven quality. Apply strong CS fundamentals to design scalable, reliable, and efficient systems. Contribute to monitoring, observability, and performance optimisation of production systems. Work closely with multidisciplinary teams, sharing knowledge and solving problems collaboratively. Adapt quickly to changing priorities, delivering high-quality results at pace. What We … at scale. Full-stack experience or deep expertise in backend development. Strong CS fundamentals and fluency in Python. Experience owning and operating complex systems in production. Clear grasp of observability, monitoring, and performance tuning. Commitment to writing high-quality, testable code and improving engineering practices. Relevant degree (e.g. Computer Science, Mathematics, Engineering or similar). Pragmatic, collaborative mindset with strong …
Employment Type: Permanent
Salary: GBP Annual

Observability, London
10th Percentile: £65,000
25th Percentile: £73,750
Median: £90,000
75th Percentile: £115,000
90th Percentile: £132,000