76 to 92 of 92 Observability Jobs in Central London

Full Stack Engineer

Hiring Organisation
develop
Location
City of London, London, United Kingdom
customer-facing web platform built on a Next.js stack. The organisation is investing heavily in platform quality, developer experience, CI/CD, testing, and observability to support long-term scalability. You will contribute to strengthening the platform that enables multiple product squads to deliver features reliably and release with confidence. … capabilities, and supporting reliable releases. You will collaborate closely with senior engineers while taking ownership of well-defined areas, helping improve testing, CI pipelines, observability, and overall developer workflows. The role suits an engineer who enjoys solving practical platform problems, building scalable web applications, and continuously improving how teams deliver ...

Senior Site Reliability Engineer

Hiring Organisation
Realm
Location
City of London, London, United Kingdom
High-growth infrastructure company focused on delivering large-scale compute, data centre capacity, and power solutions for advanced machine learning workloads. Platforms support leading research and industry teams requiring high-performance computing at significant scale. ...

SRE Observability Engineer

Hiring Organisation
Access Computer Consulting
Location
City of London, London, United Kingdom
Employment Type
Contract
Contract Rate
£350 - £450/day
recruiting for an SRE Observability Engineer to work in London 2-3 days a week, remaining time remote. The role falls inside IR35 so you will be required to work through an umbrella company for the duration of the contract. This is a 6 month contract which will transfer … permanent role after the initial contract term. You will be responsible for collaborating across various organisations within the client to understand and develop observability solutions for enterprise-wide deployment at scale. You will also manage the legacy monitoring stack across the Production Management organisation within the client. You must have ...

DevOps Engineer

Hiring Organisation
Autonomai Recruitment
Location
City of London, London, United Kingdom
performance and resilience. Build and extend network automation workflows to configure and manage trading infrastructure (routers, switches, security, and connectivity). Define and implement observability for services and infrastructure using metrics, logging, and alerting (e.g., Prometheus, Grafana, and related tooling). Key requirements Strong backend development experience with Python, including … experience building APIs (e.g., FastAPI or similar frameworks). Experience with Prometheus-style observability: metrics, alerting, and dashboards; familiarity with Grafana is a plus. Hands-on experience with ClickHouse or similar high-performance data stores is a strong advantage. Practical experience with network automation; Ansible or similar configuration-management tools ...

Data Reliability Engineer

Hiring Organisation
Ashdown Group
Location
City of London, London, United Kingdom
Employment Type
Permanent, Work From Home
work from home 2 days per week. This is a high-impact role focused on improving data quality, reducing incidents, and building scalable observability across a modern enterprise data platform. You’ll help ensure data across the organisation is accurate, reliable, and trusted for critical business decision-making. You’ll take ownership … style roles, with strong SQL and Python skills and experience working in modern cloud-based data environments. Hands-on experience with data observability tools such as Grafana, Monte Carlo, or Acceldata, and data governance/quality platforms like Informatica, Collibra or Microsoft Purview is highly desirable. Experience within the Azure ...

DevOps Engineer

Hiring Organisation
Few&Far
Location
City of London, London, United Kingdom
hands on DevOps/Infrastructure Engineer who thrives in early-stage environments and loves building from the ground up. You’ll own reliability, observability, incident response, and infrastructure automation across a modern AI-native platform. 🔥 Tech stack includes: *GCP *Terraform *Cloud Run/Kubernetes *GitHub Actions *Python & Kotlin *Temporal … particularly keen to speak with engineers who have: ✅ Strong Terraform & production infrastructure experience ✅ Deep observability & monitoring expertise ✅ Incident management/on-call experience ✅ Security-first mindset ✅ CI/CD pipeline expertise ✅ Startup or greenfield experience This is a brilliant opportunity to shape what “good” looks like in a fast-moving ...

Senior Software Engineer – AI / Agentic Systems

Hiring Organisation
MA (Montreal Associates)
Location
City of London, London, United Kingdom
grade AI platform. You’ll operate at the core of the product engineering function, designing systems that power autonomous agents, orchestrate workflows, and enable observability at scale. This is not just another backend role. You’ll influence architecture, mentor engineers, and help define the technical direction of a rapidly growing … Lead design and code reviews, ensuring high standards of quality and security Collaborate closely with AI research, product, and infrastructure teams Improve system reliability, observability, and scalability Mentor engineers and act as a technical multiplier across teams Champion best practices, tooling, and engineering excellence Proactively identify and resolve technical debt ...

Platform Engineer: £120k + Bonus/benefits (AI Trading)

Hiring Organisation
Hunter Bond
Location
City of London, London, United Kingdom
global trading platform. The successful candidate will be involved in every layer of the technology stack—from hardware and operating systems to automation and observability—while gaining exposure to how a world-class investment firm manages its technology infrastructure. Key Responsibilities Manage a distributed compute environment and several petabyte-scale … agile methodologies) Familiarity with infrastructure automation and configuration management tools (Chef, Puppet, or Ansible) Exposure to distributed storage systems and related protocols Experience with observability and monitoring tools (Elasticsearch, Logstash, Kibana, Datadog, Prometheus, Grafana) Strong written and verbal communication skills Demonstrated ability to learn quickly and adapt to evolving technologies ...

Lead Software Engineer

Hiring Organisation
5V Video
Location
City of London, London, United Kingdom
+ AWS (Lambda, API Gateway, S3, DynamoDB) Handling event-driven architectures (Kafka, SNS/SQS, etc.) Driving system design decisions across distributed systems Improving observability, reliability, and performance in production Debugging complex issues and leading resolution across teams Staying hands-on while setting technical direction and standards Tech Stack Python … Lambda, API Gateway, S3, DynamoDB, IAM) Event-driven systems (Kafka, SNS/SQS) CI/CD (Concourse, Git workflows) Databases (Postgres, DynamoDB, Couchbase) Observability (Prometheus, Grafana, CloudWatch) What You’ll Bring Strong backend engineering experience (Python preferred) Proven experience building distributed systems at scale Deep understanding of microservices + event ...

Founding Engineer

Hiring Organisation
Omnam Investment Group
Location
City of London, London, United Kingdom
environments Lead integrations with external systems and support early data onboarding Establish engineering standards, tooling, documentation, and technical processes from the start Set up observability, monitoring, and performance systems Jump in wherever needed, from quick scripts and data cleaning to debugging production issues What You Bring 5+ years of engineering … with backend frameworks (FastAPI, Django, Node.js, Rails, etc.) Strong SQL, data modeling, and database design knowledge Familiarity with IaC, containers, CI/CD, and observability tools Bonus: experience in ETL, or hospitality/Proptech/real-estate technology Why Join Us We work together in the heart of London ...

Principal AI Engineer – London (Hybrid) | Python, LLMs, AI Strategy

Hiring Organisation
Oliver Bernard
Location
City of London, London, United Kingdom
powered applications using LLMs and agent frameworks Drive adoption of tools such as Autogen, LangGraph, and modern AI orchestration frameworks Oversee AI governance, observability, scalability, and best engineering practices Partner with senior client stakeholders to align AI solutions with business goals Mentor Lead AI Engineers and help shape engineering standards …/LLM systems Strong Python engineering background Expertise with Azure or AWS AI ecosystems Experience with MLOps, Kubernetes, Docker, vector databases, and AI observability tooling Strong understanding of agentic AI frameworks and production AI infrastructure Comfortable operating in client-facing and strategic consulting environments 📍 London hybrid working (Tue–Thu onsite ...

Site Reliability Engineer (Bare Metal Infrastructure)

Hiring Organisation
Hunter Bond
Location
City of London, London, United Kingdom
multi-petabyte infrastructure Writing Python to automate anything manual or repetitive Working closely with engineers across the business to improve reliability and performance Enhancing observability, monitoring and system transparency Driving automation across config management and container environments What they’re looking for (or a willingness to learn the below … years’ experience working deeply with Linux Familiarity with monitoring/observability tooling (ELK, OpenTelemetry, VictoriaMetrics) Strong Python skills (automation/scripting) Experience with CI/CD tooling is a plus Exposure to Docker and container ecosystems is a plus Experience with Ansible, Chef, Puppet or similar Experience working with large ...

SRE Consultant

Hiring Organisation
Akkodis
Location
City of London, London, United Kingdom
Employment Type
Permanent
Salary
£90000 - £100000/annum
include: Define and embed SRE engagement models aligned to modern engineering and traditional ITSM/ITIL practices Establish SLIs, SLOs, and Error Budgets Shape observability strategies using metrics, logs, and traces Design incident response models and post-incident learning loops Reduce toil through automation and engineering excellence Deliver SRE capability … Looking For Extensive experience in SRE, cloud operations, or DevOps Proven consulting or advisory background Experience with AWS, Azure, or GCP Strong observability and incident management expertise Ability to obtain UK SC clearance Modis International Ltd acts as an employment agency for permanent recruitment and an employment business ...

AI Architect

Hiring Organisation
Tata Consultancy Services
Location
City of London, London, United Kingdom
operated safely over time. Key responsibilities: Architect and govern multi-agent and agent-swarm systems at enterprise scale. Define agent safety, governance, observability, and testing standards. Establish AI guardrails, frameworks, governance models, and safety controls. Design human-in-the-loop optimisation to balance autonomy, reliability, and performance. Own patterns … native and agent-based design principles. Design and govern enterprise-scale distributed systems with embedded AI capabilities. Architect and evolve agent orchestration platforms. Own observability, reliability, security, scalability, performance, and cost management (FinOps). Ensure platforms are production-ready, secure, auditable, and compliant. Partner with CTOs and senior leadership ...

CAF Ecosystem and Operations Manager (Not Specified)

Hiring Organisation
IAG Transform
Location
Kensington, Merseyside, UK
Employment Type
Full-time
model strategy) o Orchestration frameworks and event-driven architectures o Document AI and extraction (e.g., Berdock) o EVAL frameworks and LLM evaluation methodologies o Observability, monitoring and guardrails for AI agents Ensure scalable, secure, compliant and production-grade solutions. Champion best practices in AI governance, risk management, and model lifecycle … native architectures (preferably AWS) Agentic AI frameworks (LangChain, LangGraph, agent orchestration) Multi-LLM strategies and model selection EVAL frameworks for LLM/agent performance Observability, logging, guardrails and governance for AI agents Workflow orchestration and integration patterns Enterprise-grade security and compliance considerations for AI systems Desirable Experience within aviation ...

Dynatrace SME

Hiring Organisation
Adecco
Location
City of London, London, United Kingdom
Employment Type
Contract
Contract Rate
£560 - £800/day
Group's business-critical applications. We are seeking a skilled Dynatrace Admin/Consultant to play a key role in the enablement of observability across complex, hybrid cloud environments. The ideal candidate will have deep expertise in Dynatrace implementation (SaaS and On-Premises), monitoring configuration, and AI-driven insights … identify opportunities for enhancement to monitoring configuration and capabilities across critical applications. * Participate in the review of roles and responsibilities between teams for observability and make recommendations for improvement of the standards with an emphasis on Operational Resilience. * Play a key part in providing an automatically maintained ...

AI Software Engineer (London)

Hiring Organisation
Shop Circle
Location
City of London, London, United Kingdom
pipelines, and agent-based systems in production environments Implement guardrails (validation, error handling, fallbacks, human-in-the-loop flows) Set up monitoring and observability to continuously improve system performance Evaluate and optimize systems for accuracy, latency, and cost Improve prompts, retrieval strategies, and model behavior through structured experimentation Collaborate cross … language across the business Nice to Have Experience with evaluation frameworks and benchmarking Familiarity with vector databases (e.g. Pinecone, Chroma) Experience with monitoring and observability tools Frontend experience (JavaScript/TypeScript) Experience optimizing systems for latency and cost Exposure to business metrics and ROI-driven decision making Why Shop Circle ...