76 to 82 of 82 Observability Jobs in the City of London

Network Engineer

Hiring Organisation
Autonomai Recruitment
Location
City of London, London, United Kingdom
hyperscale infrastructure). They have experience building networks from 0→1 and are comfortable operating across everything from bare‐metal Linux to modern build, observability, and automation stacks. This role sits at the intersection of advanced networking, large‐scale ML/AI platforms, and high‐end automation. Network Engineer – Overview … distributed networks with expert knowledge of routing, switching, and multicast concepts, ensuring predictable performance under extreme load. Build and evolve network monitoring, alerting, and observability, integrating telemetry into operational and analytical data stores to support ML/AI and systematic strategies. Manage and optimise Kubernetes cluster networking and container orchestration ...

AI/MLOps Platform Engineer

Hiring Organisation
Barclays Bank PLC
Location
City, London, United Kingdom
Employment Type
Permanent
Salary
GBP Annual
Join Us in Shaping the Future of AI at Barclays. We're launching an exciting new initiative at Barclays to design, build, and scale next-generation platform components that empower developers - including Quants and Strats ...

Agentic Developer - Building guardrails for autonomous AI

Hiring Organisation
governr
Location
City of London, London, United Kingdom
level proficiency in Python, Rust, or Go (you write systems that can't fail) • Deep understanding of distributed systems, real-time data processing, and observability architectures • Production ML/AI experience: You've deployed models, debugged their failures, and built monitoring around them • System design mastery: You can architect … autonomous decision-making, goal-directed behaviour, tool use, memory systems • Familiarity with AI safety concepts: alignment, interpretability, robustness, adversarial examples • Experience with monitoring/observability: instrumentation, logging, tracing, alerting in complex systems Working Style: • You ship to production regularly and own what you deploy • You write documentation that others ...

Staff Backend Engineer (Python | AI Lab | £170,000)

Hiring Organisation
Paradigm Talent
Location
City of London, London, United Kingdom
Role: Staff Software Engineer (Python | Backend | Infrastructure) Location: Hybrid - 2-3 days in London Office Compensation: Up to £170,000 + equity We’re working with a frontier AI lab pushing the boundaries of computational ...

Site Reliability Program Manager

Hiring Organisation
HCLTech
Location
City of London, London, United Kingdom
experience, ideally managing complex cross-functional or globally distributed teams. Must have hands-on experience with packet capture analysis through tools like Observer and observability development using Splunk, ELK & Grafana. Must have domain experience in Payment Card real-time transaction processing, clearing & settlement, dispute and fraud management. Must work from … office minimum 4 days a week and be flexible for 5 days if necessary. Experience with PaaS/SaaS, cloud environments, distributed systems, observability tooling, on-call/incident management tools. Data-driven mindset: comfortable analysing metrics, generating reports, and driving improvements based on data. Familiarity with SRE principles: high ...

Engineering Manager (Python) - AI/ML SaaS Platform

Hiring Organisation
Creo Recruitment
Location
City of London, London, United Kingdom
performance. Run technical design reviews, guide architecture decisions, and support engineers in navigating trade-offs around performance, cost, and reliability. Champion operational excellence: strong observability, testing discipline, incident response, and SLO ownership. Collaborate with Product & Design to define technical requirements, prioritise roadmaps, and drive measurable outcomes. Tech Environment … quality software and scalable data pipelines with predictable velocity. Clear improvements in system reliability, throughput, and cost efficiency. Strong engineering discipline across design, testing, observability, and incident management. Improved technical foundations and reduced operational toil. Clear, thoughtful communication and alignment across engineering, product, and design. ...

Director of Artificial Intelligence

Hiring Organisation
Omnis Partners
Location
City of London, London, United Kingdom
multi-agent systems from scratch using frameworks such as ReAct, CoT loops, LangGraph, and MCP. Build and productionise agentic AI solutions with strong evaluation, observability, and orchestration. Scale deployments across diverse environments (SQL-based workflows, pandas, RAG pipelines, distributed compute). Define evaluation standards: Pass@N, multi-run testing, retriever … engineering systems in production. Strong track record of leading complex technical delivery while remaining hands-on. Solid software engineering foundations: deployment, observability, monitoring, memory orchestration, optimisation. Familiarity with agentic architectures (ReAct, CoT loops, LangGraph, tool orchestration). Excellent communication skills with the ability to run workshops and shape technical direction. ...