201 to 225 of 257 Observability Jobs in London

Senior Infrastructure Architect

Hiring Organisation
ALFA TECHNOLOGY RECRUITMENT LTD
Location
City of London, London, United Kingdom
Employment Type
Temporary
directly impact customer training workloads. This person will own network architecture across GPU fabric, InfiniBand, RoCE v2, Ethernet leaf spine, edge connectivity, peering, observability, deployment standards and operational handover. We are looking for someone who has: Deep GPU cluster or HPC deployment experience Strong InfiniBand production experience RoCE v2 experience ...

Back End Developer

Hiring Organisation
NearTech Search
Location
London Area, United Kingdom
backend initiatives end-to-end, from architecture to rollout • Strengthen testing strategy across unit and integration layers • Improve data and integration workflows with observability and resilience • Optimise Postgres (RDS) and MongoDB performance, modelling and migrations The role requires... • Strong commercial experience with Node.js and TypeScript • Deep API design expertise, including ...

BDR Language Speaker

Hiring Organisation
Pareto
Location
London, South East, England, United Kingdom
Employment Type
Full-Time
Salary
£30,000 - £35,000 per annum
must speak Filipino fluently to qualify for this role* Our client is a global data platform that helps turn data into action for Observability, IT, Security and more. Leaders in their field, our client is growing at an exciting rate and as such are now looking for new bi-lingual ...

Senior Frontend Developer

Hiring Organisation
SEEKR
Location
City of London, London, United Kingdom
bridges so builders can wire their products into hundreds of third‐party tools without hand‐rolling every integration. It handles managed auth, real‐time observability and connector sprawl so product teams can focus on great agent experiences instead of glue code. Your job is to make the surface they ...

Senior Software Engineer

Hiring Organisation
Harrington Starr
Location
City of London, London, United Kingdom
business-critical trading platform. The role combines software engineering with reliability engineering. You’ll be involved in designing and building internal tooling, improving observability, automating operations, supporting development teams, and helping ensure trading systems remain stable, scalable, and high performing. It would suit someone who enjoys solving technical problems … speed, resilience, and continuous improvement matter. What you will do Build tools, automation, and internal services that improve platform reliability Implement monitoring, telemetry, and observability standards across distributed systems Analyse performance across application, OS, and network layers to identify bottlenecks Help define and improve SLOs/SLAs for critical services ...

Site Reliability Engineer (Security Cleared)

Hiring Organisation
Profile 29
Location
South East London, London, United Kingdom
Employment Type
Permanent, Work From Home
Salary
£65,000
performant infrastructure that underpins critical public-sector services. You'll combine your background in DevOps, cloud engineering, and automation with a focus on reliability, observability, and scalability. You'll also work with event-driven technologies, identity and access management, and data platforms, ensuring our orchestration solutions are resilient, secure, and future-ready. … using Terraform Build and operate scalable infrastructure in Amazon Web Services (AWS) Design, implement, and maintain robust CI/CD pipelines Improve system reliability, observability, performance, and security Implement monitoring, logging, and alerting solutions Troubleshoot production incidents and perform root cause analysis Collaborate with development teams to improve application resilience ...

Engineering Manager (.NET) - Contract

Hiring Organisation
La Fosse
Location
City of London, London, United Kingdom
resource/capacity management and delivery ownership. - Experience writing executive updates and technical summaries for senior stakeholders. - Strong knowledge of CI/CD, automation, observability, and DevOps maturity models. - Evidence of driving adoption of new tools, frameworks, or processes across multiple teams. Technical Skills & Tools - Languages & Frameworks: C#/.NET … Framework and Core), React - Platforms & Infrastructure: Azure, AKS, Docker, on-prem Windows Server, SQL Server. - IAM and App Gateways: Okta, APIM, Apigee - Monitoring & Observability: Dynatrace, Application Insights - CI/CD & DevOps: Azure DevOps pipelines, SonarCloud, Github - Architecture & Patterns: Microservices, event-driven architecture, domain-driven design, modern scalable design principles ...

Site Reliability Engineer (SRE)

Hiring Organisation
Reading Industrial Pertemps
Location
London, South East, England, United Kingdom
Employment Type
Full-Time
Salary
£50,000 per annum
platform automation, CI/CD, and developer tooling. This is a hands-on role split between supporting engineers and building scalable infrastructure, automation, and observability solutions. You’ll work closely with the Head of Technology and engineering teams to improve reliability, developer experience, and platform performance. What You’ll Be Doing … Build reusable Terraform modules and manage infrastructure-as-code standards Develop internal tooling, automation scripts, self-service tooling, and platform improvements Own and improve observability across monitoring, dashboards, alerting, and runbooks Identify opportunities to automate manual processes and improve platform reliability Contribute to scalable, maintainable, and secure infrastructure practices What ...

Artificial Intelligence Engineer

Hiring Organisation
Soho Square Solutions
Location
London Area, United Kingdom
implement AI agents, including: ◦ Retrieval (RAG) ◦ Orchestration workflows ◦ Tool/function invocation ◦ Policy-based routing • Build evaluation frameworks for accuracy, latency, and reliability • Implement observability and monitoring for agent lifecycle AI Platform Integration • Integrate with AI providers (e.g., OpenAI, Anthropic, Google Vertex, open-source models) • Build abstraction layers to support … production (agents, RAG, orchestration) • Proficiency in Python, Java, or similar backend languages • Experience with: ◦ CI/CD pipelines ◦ Infrastructure as code ◦ Monitoring and observability tools • Hands-on experience with AI platforms (OpenAI, Claude, Vertex AI, or similar) Preferred Experience • Experience with agent frameworks (e.g., LangGraph, AutoGen, CrewAI) • Experience designing multi ...

Forward Deployed Engineers

Hiring Organisation
Randstad Technologies
Location
London, UK
Employment Type
Full-time
implement AI agents, including: Retrieval (RAG) Orchestration workflows Tool/function invocation Policy-based routing Build evaluation frameworks for accuracy, latency, and reliability Implement observability and monitoring for agent lifecycle AI Platform Integration Integrate with AI providers (e.g., OpenAI, Anthropic, Google Vertex, open-source models) Build abstraction layers to support … production (agents, RAG, orchestration) Proficiency in Python, Java, or similar backend languages Experience with: CI/CD pipelines Infrastructure as code Monitoring and observability tools Hands-on experience with AI platforms (OpenAI, Claude, Vertex AI, or similar) Preferred Experience Experience with agent frameworks (e.g., LangGraph, AutoGen, CrewAI) Experience designing multi ...

Forward Deployed Engineers

Hiring Organisation
Randstad Digital
Location
London, United Kingdom
Employment Type
Contract
Contract Rate
£450 - £500 per day + Inside IR35
implement AI agents, including: Retrieval (RAG) Orchestration workflows Tool/function invocation Policy-based routing Build evaluation frameworks for accuracy, latency, and reliability Implement observability and monitoring for agent lifecycle AI Platform Integration Integrate with AI providers (e.g., OpenAI, Anthropic, Google Vertex, open-source models) Build abstraction layers to support … production (agents, RAG, orchestration) Proficiency in Python, Java, or similar backend languages Experience with: CI/CD pipelines Infrastructure as code Monitoring and observability tools Hands-on experience with AI platforms (OpenAI, Claude, Vertex AI, or similar) Preferred Experience Experience with agent frameworks (e.g., LangGraph, AutoGen, CrewAI) Experience designing multi ...

Platform Engineer

Hiring Organisation
Albert Bow
Location
City of London, London, United Kingdom
preparation, turning compliance into a competitive advantage Build and maintain robust CI/CD pipelines across backend, frontend, and data services Establish company-wide observability — logging, metrics, tracing, alerting, and on-call culture Take ownership of cloud cost management, optimising spend without compromising performance Champion operational excellence across the engineering … What You'll Bring Technical Cloud & IaC: Azure (AWS a bonus), Terraform, AKS/Kubernetes, Docker, GitHub Actions Observability: Hands-on experience with logging, metrics, and distributed tracing frameworks Security: Secrets management, security scanning, and infrastructure hardening best practices Networking: VPCs, DNS, load balancers, VPNs, firewalls — you know your ...

Performance and Monitoring Engineer

Hiring Organisation
Solus Accident Repair Centres
Location
North London, UK
Employment Type
Full-time
talented Performance and Monitoring Engineer to help us strengthen the stability, reliability and performance of our systems. If you're passionate about monitoring, observability and using data to proactively improve service health, this is a great opportunity to make a real impact across a large, ...

DevOps Engineer ( Azure )

Hiring Organisation
Experis
Location
Wembley, England, United Kingdom
Responsibilities Observability & Monitoring Platform Design, implement, and own an Azure observability playbook, delivering comprehensive dashboards, alerting rules, and operational runbooks using Application Insights, Log Analytics, and Kusto Query Language (KQL). AIOps & Intelligent Automation Develop AI‐driven alerting and detection mechanisms to surface early‐warning signals, including IP reputation degradation … scale. Infrastructure as Code Expertise Deep proficiency in Terraform, including module design, remote state management, workspace strategies, and multi‐environment deployment patterns. Monitoring & Observability Expertise Advanced experience with KQL for Azure Log Analytics, with the ability to design and build custom Azure Monitor Workbooks for operational insight and reporting. Security ...

AI Deployment & Platform Engineer

Hiring Organisation
LEC AI
Location
London, England, United Kingdom
engineering team to deploy AI systems into live environments, manage runtime infrastructure, scale orchestration systems, optimise inference performance, and build the deployment pipelines and observability that keep everything running. This is a deeply hands-on engineering role for someone who enjoys building production infrastructure, solving operational problems, and making … inference infrastructure and deployment automation • Design scalable runtime environments for multi-agent systems Reliability and Scaling • Monitor system performance, latency, throughput, and uptime • Build observability, logging, and alerting systems • Manage autoscaling and infrastructure optimisation • Debug production failures and runtime bottlenecks Infrastructure Operations • Monitor model drift, data drift, and runtime quality ...

Head of Infrastructure

Hiring Organisation
Jobleads-UK
Location
Greater London, England, United Kingdom
platform and infrastructure strategy Design and evolve cloud architecture to support scale, resilience, and performance Set standards for infrastructure, CI/CD, environments, and observability Make architectural decisions and trade‐offs Developer Experience (DevEx) Provide infrastructure for the development team to code, test and deploy efficiently Advise during design sessions … growing company Ability to operate production systems under pressure Deep hands‐on experience with the AWS cloud platform Strong background in reliability, observability, and incident management Experience leading or mentoring engineers What we offer in return 💰 Competitive salary depending on experience 🏝️ 27 days of annual leave (including 3 days Christmas ...

Senior Site Reliability Engineer

Hiring Organisation
Realm
Location
City of London, London, United Kingdom
High-growth infrastructure company focused on delivering large-scale compute, data centre capacity, and power solutions for advanced machine learning workloads. Platforms support leading research and industry teams requiring high-performance computing at significant scale. ...

SRE Observability Engineer

Hiring Organisation
Access Computer Consulting
Location
City of London, London, United Kingdom
Employment Type
Contract
Contract Rate
£350 - £450/day
recruiting for an SRE Observability Engineer to work in London 2-3 days a week, remaining time remote. The role falls inside IR35 so you will be required to work through an umbrella company for the duration of the contract. This is a 6 month contract which will transfer … permanent role after the initial contract term. You will be responsible for collaborating across various organisations within the client to understand and develop observability solutions for enterprise-wide deployment at scale. You will also manage the legacy monitoring stack across the Production Management organisation within the client. You must have ...

Senior Software Engineer - Up to £100k - Hybrid working

Hiring Organisation
Creo Recruitment
Location
City of London, London, United Kingdom
delivery of complex, scalable systems across multiple services Making pragmatic architectural decisions in a fast-moving environment Driving engineering excellence across testing, security, and observability Owning services in production – improving reliability, performance, and resilience Mentoring engineers and elevating team capability Collaborating cross-functionally with Product, Data, and Design What they … Strong experience designing scalable, maintainable systems with clear ownership boundaries Proven ability to lead delivery across ambiguous, complex problem spaces Deep understanding of reliability, observability, and production systems A security-first mindset with experience mitigating risks and vulnerabilities Excellent communication skills with the ability to influence and mentor Python Experience ...

Senior Platform Engineer (Fully Remote) - GKE, GCP, Terraform

Hiring Organisation
Sanderson Recruitment
Location
London, United Kingdom
Employment Type
Permanent, Work From Home
manage workloads using Helm with strong isolation and configuration practices Own and improve CI/CD pipelines using Azure DevOps and GitOps Embed observability across the platform (monitoring, logging, alerting, tracing) Define and enforce platform standards, patterns and best practices Produce and maintain high-quality documentation, diagrams and runbooks Lead … expertise, particularly Azure DevOps Git-based workflows, GitOps and tools such as Argo CD Experience with service mesh technologies (e.g. Istio) Exposure to observability/APM tooling Confident technical leader with experience setting standards and mentoring others Comfortable working in shared platform environments Reasonable Adjustments: Respect and equality are core ...

Site Reliability Engineer

Hiring Organisation
EQUALS
Location
Greater London, England, United Kingdom
recommendation engine that matches people by musical taste. THE ROLE We're looking for a Site Reliability Engineer to own the infrastructure, observability, and operational health of the Equals platform. You'll be the person who monitors systems needs and health to provide a seamless user experience while providing traceability … 1B+ rows) - Manage Cloudflare (WAF, bot management, DNS, firewall rules) - Make cost-conscious infrastructure decisions - right-sizing instances, storage tiering, optimizing spend Monitoring & Observability - Own the Datadog APM setup: tracing, alerting, dashboards, log management - Maintain and tune alert channels integrated with Slack - Reduce alert fatigue by tuning thresholds, suppressing false ...

Principal Software Engineer

Hiring Organisation
BBC
Location
Greater London, United Kingdom
Employment Type
Full Time
Salary
£65,000 - £80,000 per annum
large-scale data ingestion platforms. Experience working across a broad range of technologies, platforms, and engineering domains within multi-team environments. Familiarity with observability, operational monitoring, CI/CD, and platform reliability practices. Experience with data technologies such as Airflow, Redshift, DynamoDB, MongoDB, or similar tooling. Interest in contributing … EventBridge SQL and NoSQL databases including Postgres, MongoDB, DynamoDB, and Timestream CI/CD and automation tooling including GitHub Actions, Jenkins, and CodePipeline Observability and visualisation tooling including Grafana and Tableau Our wider engineering ecosystem also includes web and mobile technologies, including TypeScript/JavaScript, Swift, and Kotlin, alongside ...

Senior SRE Lead

Hiring Organisation
Albany Beck
Location
London Area, United Kingdom
about capability build, technical excellence, and delivering meaningful change within complex enterprise environments. Role Overview Albany Beck is seeking a Senior SRE Lead/Observability SME to lead the establishment of a new enterprise Site Reliability Engineering (SRE) capability, with a primary focus on designing and implementing a modern observability … suite and operational resilience framework. This is a foundational build role, responsible for defining how reliability engineering and observability are structured, measured, and embedded across a complex global technology estate. The successful candidate will play a key role in shifting the organisation from reactive operational support to a metrics-driven ...

Data Reliability Engineer

Hiring Organisation
Ashdown Group
Location
City of London, London, United Kingdom
Employment Type
Permanent, Work From Home
work from home 2 days per week. This is a high-impact role focused on improving data quality, reducing incidents, and building scalable observability across a modern enterprise data platform. You'll help ensure data across the organisation is accurate, reliable, and trusted for critical business decision-making. You'll take ownership … style roles, with strong SQL and Python skills and experience working in modern cloud-based data environments. Hands-on experience with data observability tools such as Grafana, Monte Carlo, or Acceldata, and data governance/quality platforms like Informatica, Collibra or Microsoft Purview is highly desirable. Experience within the Azure ...

Forward Deployed Engineer

Hiring Organisation
Novatus
Location
London Area, United Kingdom
Novatus Global is a Series B scale-up RegTech SaaS provider and boutique advisory firm, helping financial institutions manage their most complex regulatory requirements. We combine deep consulting expertise with cutting-edge SaaS solutions, enabling ...