76 to 100 of 120 Observability Jobs in London

Site Reliability Engineer

Hiring Organisation
VIQU IT
Location
United Kingdom, Whitechapel, Greater London
Employment Type
Permanent
Salary
£40000 - £50000/annum
Engineer to help improve the reliability, scalability and automation of their AWS estate. This is a hands-on engineering role working across cloud infrastructure, observability, CI/CD and platform tooling, helping development teams deliver faster and more reliably. You’ll be joining a collaborative engineering environment with the opportunity … scalable AWS infrastructure. Develop and manage Infrastructure as Code using AWS CDK. Support CI/CD pipelines and deployment automation. Improve monitoring, logging and observability across distributed systems. Support incident management, root cause analysis and platform reliability improvements. Work closely with engineering and architecture teams to improve operational performance ...

Integration Developer FTC

Hiring Organisation
itecopeople
Location
London, United Kingdom
Employment Type
Permanent, Work From Home
Salary
£60,000
Build connectors, event-processing services, and data pipelines Design scalable integration patterns, schemas, and event flows Develop CDC pipelines and resilient messaging solutions Improve observability through logging, metrics, and tracing Deploy containerised services using Docker and Kubernetes Contribute to architecture, code reviews, and engineering standards Collaborate with developers, data engineers … design Agile development experience Strong communication and collaboration skills Desirable Skills Go and/or Python CDC pipeline development Azure cloud experience Observability tooling (Prometheus, Grafana, OpenTelemetry) Experience within regulated environments What's on Offer Hybrid working - 2 days per week in London Salary up to £60,900 Generous pension ...

Platform Engineer Microsoft Fabric Azure

Hiring Organisation
Client Server
Location
City of London, London, United Kingdom
Employment Type
Permanent, Work From Home
integrations Monitor cloud services, troubleshoot incidents and drive platform reliability and performance Develop dashboards and reporting solutions using Power BI and Fabric Improve automation, observability, scalability and operational efficiency across the environment Location: You'll join the team in the London office, with flexibility to work from home once ...

Networking Specialist

Hiring Organisation
Ncounter
Location
London, South East, England, United Kingdom
Employment Type
Full-Time
Salary
£180,000 - £200,000 per annum
essential, alongside confidence working with modern data centre technologies. Nice to Haves: • Experience with automation using Python, Ansible, or similar tools • Exposure to observability and monitoring platforms • Understanding of network security and secure routing design • Hands-on experience with Arista and/or Cisco in production environments • Industry certifications such ...

Bid Solution Architect - LONDON - PART TIME

Hiring Organisation
Reed
Location
Southwark, London, England, United Kingdom
Employment Type
Temporary
Salary
Salary negotiable
ready. Assure the full scheduling system architecture, focusing on performance and resilience. Validate integration assumptions, API patterns, data flows, and control mechanisms. Ensure system observability, failover, and peak-load behavior are credible and evidenced. Design or validate security controls across application, infrastructure, and operations. Ensure alignment of IAM, encryption, logging ...

Platform Engineer

Hiring Organisation
UA Consulting
Location
City of London, London, United Kingdom
Employment Type
Contract
Contract Rate
From £300 to £400 per day
Platform Engineer with strong site reliability principles to join our Platform team. You'll focus on maintaining and improving production reliability, automating operational tasks, and enhancing our observability stack. You'll work closely with SREs, support engineers, release managers, and incident managers to ensure our systems meet SLIs, SLOs, and SLA targets. Key Responsibilities Maintain and optimise production environments … production workloads (EKS, EC2, RDS/Aurora, S3, IAM). Infrastructure as Code with Terraform and configuration management with Ansible. Strong experience with observability tools (Grafana, Prometheus, Loki, Tempo). Understanding of SRE concepts (SLIs, SLOs, error budgets, capacity planning). Comfortable working in incident and problem management processes. Strong ...

Platform Engineer

Hiring Organisation
UA Consulting
Location
City of London, London, United Kingdom
Employment Type
Permanent
Salary
£75,000
Platform Engineer with strong site reliability principles to join our Platform team. You'll focus on maintaining and improving production reliability, automating operational tasks, and enhancing our observability stack. You'll work closely with SREs, support engineers, release managers, and incident managers to ensure our systems meet SLIs, SLOs, and SLA targets. Key Responsibilities Maintain and optimise production environments … production workloads (EKS, EC2, RDS/Aurora, S3, IAM). Infrastructure as Code with Terraform and configuration management with Ansible. Strong experience with observability tools (Grafana, Prometheus, Loki, Tempo). Understanding of SRE concepts (SLIs, SLOs, error budgets, capacity planning). Comfortable working in incident and problem management processes. Strong ...

AI Platform/ DevOps Engineer

Hiring Organisation
The Portfolio Group
Location
City of London, London, Castle Baynard, United Kingdom
Employment Type
Permanent
Salary
£70000 - £80000/annum + Benefits
Bedrock Knowledge Bases) and embedding pipelines Build and maintain CI/CD pipelines for inference services, retrievers, ingestion workflows, and RAG components Implement observability across AI workloads using CloudWatch, MLflow, and OpenTelemetry - covering latency, throughput, cost, and system health Apply secure-by-design principles including IAM, encryption, network controls … Terraform experience for infrastructure-as-code, provisioning and managing cloud infrastructure at scale Experience operating containerised services, managing CI/CD pipelines, and owning observability and reliability Familiarity with vector databases or search infrastructure (OpenSearch, Algolia) is a strong advantage Python proficiency for scripting, automation, and deploying production services Solid ...

Go Full Stack Developer

Hiring Organisation
itecopeople
Location
London, United Kingdom
Employment Type
Permanent, Work From Home
Salary
£60,000
event-driven services Contribute to CI/CD pipelines and cloud-native deployments Review code and champion engineering best practices Improve application performance, observability and reliability Collaborate within Agile delivery teams across multiple projects Support technical decision-making and continuous improvement Skills & Experience We are looking for candidates with strong … reviews, testing and engineering governance Experience with any of the following would be highly advantageous: Microsoft Azure Python GitOps tooling (Argo CD/Flux) Observability tooling (Prometheus, Grafana, OpenTelemetry) AI/LLM-enabled applications Event-driven architectures and messaging platforms What's on Offer Opportunity to work on cutting-edge ...

Site Reliability Engineer

Hiring Organisation
Pertemps London
Location
London, United Kingdom
Employment Type
Permanent
Salary
GBP 50,000 Annual
Jenkins, GitLab CI) Develop and maintain Terraform modules for infrastructure-as-code Build automation tools (CLI tools, scripts, GitHub Apps, self-service tooling) Own observability: dashboards, alerts, monitoring, and runbooks Continuously improve platform processes and reduce operational toil What We're Looking For Essential Skills & Experience 2-3 years … GitHub Actions, GitLab CI, Jenkins) Ability to write production-quality code in Python or Bash Solid networking fundamentals (DNS, load balancers, CDNs) Experience with observability tools (NewRelic, Datadog, Prometheus, Grafana) Comfortable participating in on-call rotations Experience using AI tools (e.g. ChatGPT, Copilot, Cursor) to enhance productivity Desirable Go, Ansible ...

AKS DevOps Engineer - Azure Kubernetes

Hiring Organisation
Reed
Location
London Gatwick Airport, Gatwick, West Sussex, England, United Kingdom
Employment Type
Full-Time
Salary
£70,000 per annum, Inc benefits
/CD pipelines using Azure DevOps with YAML. Implement and maintain secure networking patterns and apply cloud security best practices. Create and maintain platform observability using Azure Monitor, Analytics, and Application Insights. Collaborate with engineering teams to ensure service reliability on the platform. Promote best practice in cloud engineering … private endpoints, load balancing, etc. Scripting proficiency in Bash, PowerShell, or Python. Linux operating system knowledge and troubleshooting capability. Experience implementing monitoring, logging, and observability solutions in Azure. Ability to communicate platform issues like risk, platform health, cost etc to non-technical audiences. Desirable Skills: Experience contributing to architecture ...

Platform Engineer

Hiring Organisation
UA Consulting
Location
City, London, United Kingdom
Employment Type
Contract
Contract Rate
GBP 300 - 400 Daily
Platform Engineer with strong site reliability principles to join our Platform team. You'll focus on maintaining and improving production reliability, automating operational tasks, and enhancing our observability stack. You'll work closely with SREs, support engineers, release managers, and incident managers to ensure our systems meet SLIs, SLOs, and SLA targets. Click apply for full job details ...

Site Reliability Engineer - BACLJP00013172

Hiring Organisation
Huxley Associates
Location
Bromley, London, South Yorkshire, United Kingdom
Employment Type
Contract
Contract Rate
£600/day
Lead role within a banking/payments environment that I thought might be of interest. You'd lead SRE strategy, driving automation, observability, and reliability by design, with a focus on reducing incidents and improving recovery. Looking for someone with 8+ years' experience in SRE, strong resilience engineering background ...

Senior Network Architect, GPU Fabric and AI Infrastructure

Hiring Organisation
We Love Alfa
Location
London, United Kingdom
Employment Type
Permanent
Salary
GBP 180,000 - 240,000 Annual
directly impact customer training workloads. This person will own network architecture across GPU fabric, InfiniBand, RoCE v2, Ethernet leaf spine, edge connectivity, peering, observability, deployment standards and operational handover. We are looking for someone who has: Deep GPU cluster or HPC deployment experience Strong InfiniBand production experience RoCE v2 experience ...

Senior Infrastructure Architect

Hiring Organisation
ALFA TECHNOLOGY RECRUITMENT LTD
Location
City of London, London, United Kingdom
Employment Type
Temporary
directly impact customer training workloads. This person will own network architecture across GPU fabric, InfiniBand, RoCE v2, Ethernet leaf spine, edge connectivity, peering, observability, deployment standards and operational handover. We are looking for someone who has: Deep GPU cluster or HPC deployment experience Strong InfiniBand production experience RoCE v2 experience ...

BDR Language Speaker

Hiring Organisation
Pareto
Location
London, South East, England, United Kingdom
Employment Type
Full-Time
Salary
£30,000 - £35,000 per annum
must speak Filipino fluently to qualify for this role* Our client is a global data platform that helps turn data into action for Observability, IT, Security and more. Leaders in their field, our client is growing at an exciting rate and as such are now looking for new bi-lingual ...

Site Reliability Engineer (Security Cleared)

Hiring Organisation
Profile 29
Location
South East London, London, United Kingdom
Employment Type
Permanent, Work From Home
Salary
£65,000
performant infrastructure that underpins critical public-sector services. You'll combine your background in DevOps, cloud engineering, and automation with a focus on reliability, observability, and scalability. You'll also work with event-driven technologies, identity and access management, and data platforms, ensuring our orchestration solutions are resilient, secure, and future-ready. … using Terraform Build and operate scalable infrastructure in Amazon Web Services (AWS) Design, implement, and maintain robust CI/CD pipelines Improve system reliability, observability, performance, and security Implement monitoring, logging, and alerting solutions Troubleshoot production incidents and perform root cause analysis Collaborate with development teams to improve application resilience ...

Site Reliability Engineer (SRE)

Hiring Organisation
Pertemps Reading
Location
London, United Kingdom
Employment Type
Permanent, Work From Home
Salary
£45,000
platform automation, CI/CD, and developer tooling. This is a hands-on role split between supporting engineers and building scalable infrastructure, automation, and observability solutions. You'll work closely with the Head of Technology and engineering teams to improve reliability, developer experience, and platform performance. What You'll Be Doing Developer … Build reusable Terraform modules and manage infrastructure-as-code standards Develop internal tooling, automation scripts, self-service tooling, and platform improvements Own and improve observability across monitoring, dashboards, alerting, and runbooks Identify opportunities to automate manual processes and improve platform reliability Contribute to scalable, maintainable, and secure infrastructure practices What ...

AI Native Software Engineer

Hiring Organisation
Skilliantech Ltd
Location
London, United Kingdom
Employment Type
Contract
implement AI agents, including: Retrieval (RAG) Orchestration workflows Tool/function invocation Policy-based routing Build evaluation frameworks for accuracy, latency, and reliability Implement observability and monitoring for agent lifecycle AI Platform Integration Integrate with AI providers (e.g., OpenAI, Anthropic, Google Vertex, open-source models) Build abstraction layers to support … production (agents, RAG, orchestration) Proficiency in Python, Java, or similar backend languages Experience with: CI/CD pipelines Infrastructure as code Monitoring and observability tools Hands-on experience with AI platforms (OpenAI, Claude, Vertex AI, or similar) Preferred Experience Experience with agent frameworks (e.g., LangGraph, AutoGen, CrewAI) Experience designing multi ...

Forward Deployed Engineers

Hiring Organisation
Randstad Digital
Location
London, United Kingdom
Employment Type
Contract
Contract Rate
£450 - £500 per day + Inside IR35
implement AI agents, including: Retrieval (RAG) Orchestration workflows Tool/function invocation Policy-based routing Build evaluation frameworks for accuracy, latency, and reliability Implement observability and monitoring for agent lifecycle AI Platform Integration Integrate with AI providers (e.g., OpenAI, Anthropic, Google Vertex, open-source models) Build abstraction layers to support … production (agents, RAG, orchestration) Proficiency in Python, Java, or similar backend languages Experience with: CI/CD pipelines Infrastructure as code Monitoring and observability tools Hands-on experience with AI platforms (OpenAI, Claude, Vertex AI, or similar) Preferred Experience Experience with agent frameworks (e.g., LangGraph, AutoGen, CrewAI) Experience designing multi ...

Azure Site Reliability Engineer (Remote)

Hiring Organisation
Revybe IT Recruitment Ltd
Location
London, South East, England, United Kingdom
Employment Type
Full-Time
Salary
£40,000 - £55,000 per annum
Azure cloud infrastructure and services Infrastructure as Code using Terraform CI/CD pipelines with Azure DevOps and/or GitHub Actions Improving observability, monitoring, and incident response processes Driving automation to enhance system reliability and scalability Supporting production systems and participating in incident management What we’re looking … Azure Central London (Hybrid - a couple of days per month in office) £45,000 - £55,000 + Benefits Azure, Terraform, Azure DevOps, Github Actions, Observability, Monitoring ...

Principal Full Stack Engineer & Architecture Lead

Hiring Organisation
Command Recruitment
Location
London, United Kingdom
Employment Type
Permanent
Salary
£80000 - £90000/annum
technical design decisions Define scalable, secure, and maintainable engineering standards Provide technical leadership across frontend, backend, APIs, infrastructure, and integrations Drive platform scalability, resilience, observability, and performance Partner with leadership teams to align technical strategy with business goals Act as the senior technical authority for complex engineering decisions Hands … Gateway, EventBridge, SQS, Step Functions, S3, CloudWatch, RDS) Backend Node.js, TypeScript Frontend React, Next.js, Tailwind CSS Data & Architecture PostgreSQL, Serverless, Event-Driven Microservices DevOps & Observability Terraform/AWS CDK, CI/CD, Monitoring & Logging About You We are looking for a technically strong and commercially minded engineering leader with: 10+ ...

AI Engineer

Hiring Organisation
VIA MATCH LIMITED
Location
London, South East, England, United Kingdom
Employment Type
Full-Time
Salary
£80,000 - £110,000 per annum
Doing Designing and building production-grade AI systems that integrate LLMs, RAG pipelines, vector databases, and agentic frameworks Creating evaluation and observability frameworks to measure, monitor, and continuously improve system performance, accuracy, and reliability Implementing and maintaining retrieval systems, including ingestion pipelines, chunking strategies, and advanced techniques such as HyDE … with hands-on fine-tuning experience Familiarity with real-time streaming, multimodal models, or search technologies such as Elasticsearch Experience with model observability tools such as LangSmith or Weights & Biases Background in a regulated or specialised vertical (financial services, healthcare, energy, legal, retail), with an understanding of compliance, security ...

Data Reliability Engineer

Hiring Organisation
Ashdown Group
Location
City of London, London, United Kingdom
Employment Type
Permanent, Work From Home
Salary
£95,000
work from home 2 days per week. This is a high-impact role focused on improving data quality, reducing incidents, and building scalable observability across a modern enterprise data platform. You'll help ensure data across the organisation is accurate, reliable, and trusted for critical business decision-making. You'll take ownership … style roles, with strong SQL and Python skills and experience working in modern cloud-based data environments. Hands-on experience with data observability tools such as Grafana, Monte Carlo, or Acceldata, and data governance/quality platforms like Informatica, Collibra or Microsoft Purview is highly desirable. Experience within the Azure ...

Director - Principal Engineer (Java/Angular/AI)

Hiring Organisation
Robert Walters
Location
London, South East, England, United Kingdom
Employment Type
Full-Time
Salary
£140,000 - £170,000 per annum
volumes of financial and transactional data Contribute directly to architecture, system design, and hands-on software development Drive engineering best practices across automation, testing, observability, and performance Build resilient, production-grade systems with a strong focus on reliability and scalability Work across the full software development lifecycle from design through … scalability, and high-availability systems Experience building automated, production-grade platforms with minimal manual intervention Familiarity with cloud-native technologies, CI/CD, and observability tooling Strong engineering mindset with a hands-on approach to development Interest in modern engineering tooling, including AI-assisted development workflows Robert Walters Operations Limited ...