376 to 400 of 418 Observability Jobs in London

Staff Reliability Engineer (Full Stack)

Hiring Organisation
Feeld
Location
Greater London, United Kingdom
Employment Type
Full Time
Salary
100000 to 130000 GBP Annually
Native). Lead technical problem-solving during incidents: coordinate response, diagnose root causes, communicate status, and drive to resolution. Build and evolve monitoring/observability (dashboards, alerts, tracing, logging) that enables fast detection and diagnosis. Drive post‐incident reviews (blameless) and ensure learnings become durable fixes (tech changes, runbooks, automation … comfort working across services and APIs. Proven incident response leadership: on-call participation, triage, mitigation, and root-cause analysis (RCA) with follow-through. Solid observability skills: practical experience with logging/metrics/tracing and turning signals into actionable alerts and dashboards. Experience collaborating with mobile teams and understanding mobile ...

Application Support - Commodities Trading Firm - Up to £70k + Bonus - Hybrid

Hiring Organisation
Saragossa
Location
London Area, United Kingdom
Europe's leading commodity trading firms, and gain exposure to a technology landscape that's constantly evolving. From cloud platforms and Kubernetes to automation, observability, databases, and emerging AI initiatives, you'll be surrounded by the kind of technology and business exposure that can significantly accelerate your career. Your … years’ application support experience within financial services or another regulated environment. You’ll be technically confident with strong SQL skills, experience using monitoring and observability tools, and exposure to scripting or automation (such as Python or PowerShell). Any interest in or experience with AI would be a strong plus. ...

Senior Backend / Full-Stack Engineer (E5/E6 Level) – AI-Native Startup – Strong Comp + Equity

Hiring Organisation
Mondrian Alpha
Location
London Area, United Kingdom
preferred • Experience with modern frontend frameworks (React/Next.js) is a plus for full-stack candidates • Strong understanding of system design, reliability, scalability, and observability • Experience in startups or fast-paced product environments is highly desirable • AI-native mindset — comfortable leveraging AI tooling and rapid iteration workflows • Strong communication skills … React, Next.js • AI-native workflows and internal LLM tooling • Distributed systems and real-time infrastructure • OpenSearch, SingleStore, Trigger.dev, Axiom • Modern cloud-native infrastructure and observability stack What they offer: • Excellent compensation + meaningful equity • High-ownership environment with direct impact on product and architecture • Small, elite engineering team • Direct collaboration ...

Staff Engineer

Hiring Organisation
Stepstone UK
Location
South East London, London, United Kingdom
Employment Type
Permanent
engineering standards, best practices and reusable patterns while partnering with Enterprise Architecture and influencing technical direction Drive engineering excellence by improving code quality, testing, observability, reliability and operational practices Support end-to-end delivery by guiding teams through complex technical challenges, improving decision-making, and contributing to planning and risk … data lakes/lakehouse architectures, Iceberg or similar table formats, as well as batch and streaming processing Knowledge of data quality, governance, cataloguing and observability tools (e.g. Datadog), with DBT or AI-assisted engineering practices as a plus Additional Information Your benefits We're a community here that cares as much about ...

Senior Network Engineer, Cingularity

Hiring Organisation
IMG
Location
London Area, United Kingdom
specialised DTM (Dynamic Synchronous Transfer Mode) network—while strategically introducing automation to enhance resilience and performance. While you will assist with the development of observability tools, 24/7 monitoring is managed by our Technical Operations Centre (TOC), supported by a joint effort between Systems, Broadcast, and Network Engineering teams. … transmission paths. NetOps & Monitoring Refinement Internal Tooling: Build and refine monitoring techniques where the primary "customers" are our internal TOC and Event Engineering teams. Observability Design: Utilise and assist in the development of modern monitoring and logging systems (e.g., Prometheus, Grafana, ELK/OpenSearch) and the Netbox source of truth. ...

Group Head of Engineering

Hiring Organisation
Jobleads-UK
Location
Greater London, England, United Kingdom
club platforms, aligned to transformation priorities Set clear architectural direction and embed modern engineering standards (cloud-first, CI/CD, automated testing, observability, secure SDLC) Own end‐to‐end delivery outcomes, ensuring valuable increments are shipped frequently, safely, and predictably Drive operational excellence across reliability, resilience, performance, and security Establish … continuous improvement Experience Senior engineering leader with strong hands‐on technical credentials. Deep experience across cloud-first architectures, distributed systems, CI/CD, observability, and secure SDLC. Experience delivering AI-enabled capabilities into production environments. Proven track record of improving reliability and leading incident response and prevention. Experience scaling engineering ...

Principal Machine Learning Engineer

Hiring Organisation
Jobleads-UK
Location
Greater London, England, United Kingdom
Anaplan's platform and third-party integrations Optimise model inference pipelines for performance, cost, and scalability in production environments Implement monitoring, logging, and observability for GenAI systems to track usage, errors, and model behaviour Collaborate with data scientists to productionise ML models and forecasting algorithms Your Skills Extensive hands … Experience with A/B testing and experimentation frameworks for AI features Contributions to open-source ML projects or research publications Experience with model observability tools (LangSmith, W&B, MLflow) DEIB Our Commitment to Diversity, Equity, Inclusion and Belonging (DEIB) We believe attracting and retaining the best talent and fostering ...

Endpoint Engineer

Hiring Organisation
Fitch Group
Location
Greater London, United Kingdom
Employment Type
Full Time
virtual devices. The team delivers advanced engineering and management capabilities that ensure secure, reliable, and frictionless user experiences through modern device management, automation, observability, and Zero Trust principles. How You’ll Make an Impact: Deliver advanced incident and problem resolution with a strong focus on user experience and root‐cause … Deployment Toolkit (PSADT). Strong PowerShell automation expertise and familiarity with Git‐based workflows. Experience with vulnerability management tools and endpoint telemetry/observability platforms. Microsoft certifications related to Endpoint, Azure, or Security. Why Choose Fitch: Hybrid Work Environment: 2 to 3 days a week in office required based ...

Staff Frontend Engineer (React Native / Mobile)

Hiring Organisation
Feeld
Location
Greater London, United Kingdom
Employment Type
Full Time
Salary
80000 to 110000 GBP Annually
performance, stability/crash rates, startup time, build/release velocity, or app size. Increased confidence in production through better observability, incident response practices, and ownership. Enabled other FE engineers to move faster through documentation, pairing/mentorship, reviews, and reusable platform components. What … traffic/user counts, complex feature sets) with a focus on reliability and performance. Demonstrated production ownership: incident response, debugging complex issues, and improving observability (metrics/logs/traces). Experience improving delivery systems (CI/CD, automated testing strategy, release process) and keeping teams moving. Staff-level ...

Head of Analytics Engineering

Hiring Organisation
Jobleads-UK
Location
Greater London, England, United Kingdom
LSEG is seeking a Head of Analytics Engineering to lead the strategy, design, and delivery of Analytics products and capabilities underpinning our Cross Asset, Fund, Private Markets and Predictive Analytics offerings, reporting to the Data ...

MuleSoft Architect

Hiring Organisation
17918
Location
Westminster, West End, United Kingdom
Purpose of the role: Capita's MuleSoft Centre for Enablement & Excellence (C4EE) is expanding its architectural capability to support large-scale integration and API transformation programmes for major UK Government and private-sector clients. As ...

Junior Java Developer

Hiring Organisation
Global
Location
Greater London, United Kingdom
Employment Type
Full Time
This job is with Global, an inclusive employer and a member of myGwork – the largest global platform for the LGBTQ+ business community. Please do not contact the recruiter directly. Accepting applications until: 31 July 2026 ...

MuleSoft Architect

Hiring Organisation
Capita Shared Services Limited
Location
West London, London, United Kingdom
Employment Type
Permanent
Purpose of the role: Capita's MuleSoft Centre for Enablement & Excellence (C4EE) is expanding its architectural capability to support large-scale integration and API transformation programmes for major UK Government and private-sector clients. As ...

Engineering Manager - Networking & Observability

Hiring Organisation
Jobleads-UK
Location
Greater London, England, United Kingdom
Manager to lead and mentor a talented team. This role involves driving development for innovative networking in the Cilium project, focusing on security and observability solutions. Candidates should have a solid technical foundation in software engineering, significant management experience, and familiarity with Linux networking. The position offers a dynamic environment ...

Principal AI Observability Solutions Engineer

Hiring Organisation
Jobleads-UK
Location
Greater London, England, United Kingdom
products, and supporting the sales process with technical know-how. The ideal candidate will have over 10 years of experience, proficiency in modern Observability tools and cloud technologies, and excellent communication skills. This position plays a critical role in helping Snowflake redefine technology deployment. ...

Application Support SRE

Hiring Organisation
KBC Technologies Group
Location
City of London, London, United Kingdom
components (compute, network, storage). Database and performance issues. Collaborate with engineering, infrastructure and other technical teams to isolate and resolve issues efficiently. Monitoring & Observability Improve system health monitoring using observability tools and alerts. Identify gaps in alerting and contribute to improving quality of alerting and dashboards. Ensure proactive detection … anomalies using observability tools. Automation & Process Improvement Contribute to automation initiatives to reduce toil and errors. Identify repetitive operational tasks and drive improvements. Support implementation of DevOps best practices. Leverage AI-driven tools to improve monitoring, incident detection, and operational efficiency, enabling faster troubleshooting and reduced manual effort ...

Tetragon Engineering Manager - Isovalent

Hiring Organisation
Jobleads-UK
Location
Greater London, England, United Kingdom
passionate engineers. The Engineering Manager will play a pivotal role in driving the development of our innovative networking for the Cilium project, security and observability solutions. The ideal candidate will have a strong technical background in software engineering, excellent leadership skills, a proven track record of delivering high‐quality software … Rust (Tetragon is primarily written in Go) Experience in any of these areas: Linux systems, Kernel-level development BPF or eBPF programming for observability or security use cases Preferred Qualifications Experience integrating with observability platforms (e.g., Splunk) or SaaS security/analytics systems. Knowledge of eBPF for observability or security ...

Senior Product Manager

Hiring Organisation
Jobleads-UK
Location
Greater London, England, United Kingdom
About ITRS At ITRS, we make society's critical technology work. Our mission is to deliver automated and holistic IT observability solutions that safeguard critical applications and enable innovation. We are the only monitoring and observability platform designed for the most demanding and regulated industries — trusted by 90% of Tier … trading resilience and Market Data Observability. These workstreams sit at the heart of the Geneos and ITRS Analytics (IAX) product line, a monitoring and observability platform used by 90% of Tier 1 capital markets firms to ensure resilience of low-latency trading, core banking, payments, and market data infrastructure. This ...

AI Engineer

Hiring Organisation
McCabe & Barton
Location
City of London, London, United Kingdom
Employment Type
Contract
Contract Rate
Up to £800 per day
ROLE Design and build core AI platform components for a leading buy-side investor. You'll own the LLM gateway, MCP connector layer, observability tooling, and privacy proxy translating business use cases into governed, production-ready AI systems. WHAT YOU'LL DO Design and maintain LLM gateway, MCP connector … layer, observability, and privacy proxy Develop MCP connectors across data sources (M365, Salesforce, Kensho, Moody's, internal systems) Build AI-powered tools and workflows for business teams Integrate Databricks/Unity Catalog as data foundation for AI features Write clear specifications before building; ensure proper test coverage Support prompt engineering ...

AIOps / LLM Operations

Hiring Organisation
Diagonal recruitment
Location
London Area, United Kingdom
remain secure, observable, reliable and effective once in production. Role Overview Establish AI operational processes and standards Monitor model, agent and workflow performance Define observability, incident management and support frameworks Manage AI deployment, release and rollback processes Support security, compliance and operational resilience requirements Drive continuous improvement across AI systems … Ensure cost is controlled Tools & Technologies Required MLflow, Weights & Biases or similar Monitoring and observability platforms CI/CD pipelines Cloud infrastructure Logging and performance management tools Security and governance frameworks Nice to Have Site Reliability Engineering (SRE) background MLOps experience Cloud architecture knowledge Experience supporting regulated environments About ...

Performance and Monitoring Engineer

Hiring Organisation
Solus Accident Repair Centres
Location
North London, London, United Kingdom
Employment Type
Permanent
Salary
£50,000
talented Performance and Monitoring Engineer to help us strengthen the stability, reliability and performance of our systems. If you're passionate about monitoring, observability and using data to proactively improve service health, this is a great opportunity to make a real impact across a large, modern technology estate. Responsibilities … improve speed, accuracy and consistency Supporting major changes, deployments and post-incident reviews with data-driven evidence Qualifications Strong experience with monitoring and observability tools (LogicMonitor, Azure Monitor, App Insights, Log Analytics, Defender for Cloud) Excellent understanding of cloud performance, IaaS/PaaS, networking fundamentals, API performance and capacity modelling ...

AI Engineering Manager

Hiring Organisation
Gravitas Recruitment Group (Global) Ltd
Location
London Area, United Kingdom
business needs into technical deliverables. Drive agentic workflows and AI tooling adoption across the product development lifecycle to deliver tangible value. Establish robust evaluation, observability, and quality practices for AI systems, balancing speed with reliability. Guide teams through ambiguity and rapid change, making pragmatic decisions and removing blockers. Measure success … development. Hands-on experience with AI models, tools, and frameworks, including agent orchestration, prompt engineering, RAG pipelines, evaluation frameworks, LangChain, Codex, Claude, Gemini, and observability tools and best practices. Strong technical problem-solving skills and the ability to guide teams through ambiguous, fast-changing environments. Excellent communication skills across technical ...

Gen AI Architect - London, UK

Hiring Organisation
Capgemini
Location
Greater London, United Kingdom
Employment Type
Full Time
production-grade AI systems using Amazon Bedrock, retrieval-augmented generation (RAG), agentic workflows, and cloud-native AWS services. Drive architecture standards, model orchestration, governance, observability, and operational excellence across the GenAI lifecycle while collaborating with engineering, security, compliance, and business stakeholders Hybrid working: The places that you work from … customization, prompt orchestration, retrieval pipelines, and agentic workflows Design agentic AI systems incorporating tool use, workflow orchestration, memory management, and autonomous decision flows Implement observability for prompts, model responses, vector retrieval quality, and agent execution workflows Integrate GenAI capabilities into enterprise applications, APIs, workflow platforms, and data ecosystems Work with ...

Senior Software Engineer - Python/AWS

Hiring Organisation
Lunio
Location
City of London, London, United Kingdom
break down ambiguity, coordinate across Product/Design/Data to land outcomes. Raise the quality bar: define practical standards for testing, security, and observability, act as approver on critical PRs, model excellent reviews and pairing. Operate and improve production: own service performance targets for your area, lead incident response … simple. Technical Execution & Delivery: Leads execution across multiple stories/engineers, breaks down ambiguous problems, and delivers predictably with sensible trade-offs. Testing, Reliability & Observability: Bakes in testability, defines/uses service performance targets, alerts, logs, and traces, advocates for reliability alongside features. Security & Privacy: Applies secure-by-default patterns ...

Head of Platforms - Technology, Infrastructure and Operations

Hiring Organisation
Jobleads-UK
Location
Greater London, England, United Kingdom
code generation, testing, and automation. Drive adoption of AI‐enabled engineering practices. Ensure secure and efficient‐by‐default platform services through automation. Ensure reliability, observability, and cost efficiency of platform services. Define resilience, incident management, and operational models. Track and report on platform maturity and performance. Partner with Business Unit … developer experience. Demonstrated stakeholder influence across complex organizations. Experience leading distributed engineering teams. Familiarity with AI‐enabled engineering practices. Strong grounding in SRE, observability, and secure‐by‐design. Excellent communication and leadership skills. Success Measures Increased developer productivity and satisfaction. Adoption of platform capabilities across engineering teams. Reduction in toil ...