176 to 200 of 257 Observability Jobs in London

Founding Engineer

Hiring Organisation
Pharosyn
Location
London, England, United Kingdom
system: • The development environment, tooling, and workflows that let a small team move extremely fast without breaking production • Tight feedback loops between customer usage, observability, evaluation, and execution (tests, CI, code review) • Early architectural choices that scale to a $100m+ product with a lean team • Treating developer productivity, reliability ...

Software Architect

Hiring Organisation
BBC
Location
Greater London, United Kingdom
Employment Type
Full Time
Salary
72000 to 80000 GBP Annually
offs (e.g. coupling, cohesion, consistency, scalability) Familiarity with evolutionary architecture practices (e.g. fitness functions, incremental change) Experience with modern engineering practices (CI/CD, observability, cloud-native systems) Proven ability to influence without authority across multiple teams Strong facilitation skills - able to guide discussions and surface trade-offs Comfortable operating ...

Site Reliability Engineer

Hiring Organisation
Computappoint
Location
City Of London, England, United Kingdom
where reliability genuinely isn't optional. The role blends application support, platform engineering and SRE practice. It suits someone who leans toward automation and observability over reactive firefighting. Responsibilities: Managing OpenShift and Kubernetes clusters across physical, virtual, and containerised environments Operating observability stacks (Grafana, Prometheus, Splunk) and driving proactive monitoring … call rotation Key Requirements: Hands-on Kubernetes and/or OpenShift experience in production Scripting skills in Python, Bash, or PowerShell Familiarity with observability tooling and SRE principles SQL and database knowledge (MySQL, Oracle, or similar) Experience supporting .NET, Java, or microservices applications It would be great ...

Site Reliability Engineer

Hiring Organisation
VIQU IT Recruitment
Location
East London, London, United Kingdom
Employment Type
Permanent
Salary
£50,000
Engineer to help improve the reliability, scalability and automation of their AWS estate. This is a hands-on engineering role working across cloud infrastructure, observability, CI/CD and platform tooling, helping development teams deliver faster and more reliably. You'll be joining a collaborative engineering environment with the opportunity … scalable AWS infrastructure. Develop and manage Infrastructure as Code using AWS CDK. Support CI/CD pipelines and deployment automation. Improve monitoring, logging and observability across distributed systems. Support incident management, root cause analysis and platform reliability improvements. Work closely with engineering and architecture teams to improve operational performance ...

Dynatrace Expert

Hiring Organisation
BGTS LTD
Location
London, United Kingdom
Employment Type
Permanent
Salary
£65000 - £80000/annum
Microservices Integration The candidate will be responsible for integrating Dynatrace monitoring within our AWS cloud infrastructure and microservices ecosystem. This includes ensuring seamless observability across containerized environments (e.g., Kubernetes, Docker) and serverless architectures. The expert will collaborate closely with development and DevOps teams to embed monitoring best practices into … system administrators, and project managers. The ability to document monitoring strategies, root cause analyses, and best practices clearly is crucial for maintaining a robust observability culture within the organization. Preferred Qualifications Dynatrace Associate or Professional Certification. Experience with OpenTelemetry (OTEL) implementation. Familiarity with other monitoring and logging tools (e.g., Splunk ...

Lead Production Infrastructure Support/SRE

Hiring Organisation
Hunter Bond
Location
London Area, United Kingdom
Lead and coordinate a global infrastructure engineering function Heavy involvement in process improvement, automation, and tooling Partner with engineering and platform teams to improve observability, self-service, and incident response capabilities Remain hands-on with troubleshooting, outage management, and critical production incidents Mentor and develop engineers across multiple regions within ...

Networking Specialist

Hiring Organisation
Ncounter
Location
London, South East, England, United Kingdom
Employment Type
Full-Time
Salary
£160,000 - £175,000 per annum
essential, alongside confidence working with modern data centre technologies. Nice to Haves: • Experience with automation using Python, Ansible, or similar tools • Exposure to observability and monitoring platforms • Understanding of network security and secure routing design • Hands-on experience with Arista and/or Cisco in production environments • Industry certifications such ...

Founding AI Engineer

Hiring Organisation
SR2 | Socially Responsible Recruitment | Certified B Corporation™
Location
London Area, United Kingdom
feedback loops Working across retrieval, context management, agent workflows, and inference optimisation Deploying AI products into production with a strong focus on reliability and observability Collaborating closely with founders, product, and end users to rapidly iterate on features Helping establish engineering best practices, tooling, and technical direction early on What ...

Backend Engineer

Hiring Organisation
Neulinx
Location
City of London, London, United Kingdom
foundations behind their product. This is a high-ownership role in a small, fast-moving team. You’ll work across backend services, deployment infrastructure, observability, incident response, CI/CD, and release reliability, helping the team ship faster and more safely as the company enters its next stage of growth. ...

Bid Solution Architect - LONDON - PART TIME

Hiring Organisation
Reed
Location
Southwark, London, England, United Kingdom
Employment Type
Temporary
Salary
Salary negotiable
ready. Assure the full scheduling system architecture, focusing on performance and resilience. Validate integration assumptions, API patterns, data flows, and control mechanisms. Ensure system observability, failover, and peak-load behavior are credible and evidenced. Design or validate security controls across application, infrastructure, and operations. Ensure alignment of IAM, encryption, logging ...

Senior Data Platform Engineer

Hiring Organisation
ITSS Recruitment
Location
London, United Kingdom
Employment Type
Permanent
Salary
£70000 - £100000/annum Bonus + Fantastic benefits
consistency and reusability across environments. * Build and optimise CI/CD pipelines using Azure DevOps and GitHub Actions to support rapid, reliable deployments. * Implement observability practices including logging, metrics, and alerting using observability tools. * Collaborate with the Lead Engineer and Architects to align implementation with platform standards and patterns. * Provide … Fabric. * Proven experience with infrastructure-as-code using Terraform and building CI/CD pipelines via Azure DevOps and GitHub Actions. * Strong grasp of observability practices, including logging, metrics, alerting, and performance optimisation. * Deep understanding of cloud security, with experience applying secure-by-design principles in Azure and/ ...

Senior DevOps Engineer (Azure / Terraform)

Hiring Organisation
INTEC SELECT LIMITED
Location
London, UK
Employment Type
Full-time
Azure and Terraform expertise, who is comfortable operating in a hands-on capacity, while also mentoring others and driving improvements across CI/CD, observability, and security. Role & Responsibilities Design, build, and manage Azure infrastructure using Terraform, including modules, state management, and pipelines Develop and maintain CI/CD workflows (GitHub … Actions, Azure DevOps or similar) Improve platform reliability, observability, and security across environments Take ownership of infrastructure and deployment processes within a fast-moving delivery team Collaborate closely with engineers to embed DevOps best practices and scalable patterns Mentor team members on infrastructure, automation, and platform engineering principles Identify ...

Senior DevOps Engineer

Hiring Organisation
INTEC SELECT LIMITED
Location
London, UK
Employment Type
Full-time
Azure and Terraform expertise, who is comfortable operating in a hands-on capacity, while also mentoring others and driving improvements across CI/CD, observability, and security. Role & Responsibilities Design, build, and manage Azure infrastructure using Terraform, including modules, state management, and pipelines Develop and maintain CI/CD workflows (GitHub … Actions, Azure DevOps or similar) Improve platform reliability, observability, and security across environments Take ownership of infrastructure and deployment processes within a fast-moving delivery team Collaborate closely with engineers to embed DevOps best practices and scalable patterns Mentor team members on infrastructure, automation, and platform engineering principles Identify ...

Senior DevOps Engineer (Azure / Terraform)

Hiring Organisation
INTEC SELECT LIMITED
Location
City of London, London, England, United Kingdom
Employment Type
Contractor
Contract Rate
£600 - £650 per day
Azure and Terraform expertise, who is comfortable operating in a hands-on capacity, while also mentoring others and driving improvements across CI/CD, observability, and security. Role & Responsibilities Design, build, and manage Azure infrastructure using Terraform, including modules, state management, and pipelines Develop and maintain CI/CD workflows … GitHub Actions, Azure DevOps or similar) Improve platform reliability, observability, and security across environments Take ownership of infrastructure and deployment processes within a fast-moving delivery team Collaborate closely with engineers to embed DevOps best practices and scalable patterns Mentor team members on infrastructure, automation, and platform engineering principles Identify ...

Senior DevOps Engineer

Hiring Organisation
INTEC SELECT LIMITED
Location
London, South East, England, United Kingdom
Employment Type
Contractor
Contract Rate
£550 - £650 per day
Azure and Terraform expertise, who is comfortable operating in a hands-on capacity, while also mentoring others and driving improvements across CI/CD, observability, and security. Role & Responsibilities Design, build, and manage Azure infrastructure using Terraform, including modules, state management, and pipelines Develop and maintain CI/CD workflows … GitHub Actions, Azure DevOps or similar) Improve platform reliability, observability, and security across environments Take ownership of infrastructure and deployment processes within a fast-moving delivery team Collaborate closely with engineers to embed DevOps best practices and scalable patterns Mentor team members on infrastructure, automation, and platform engineering principles Identify ...

Platform Engineer

Hiring Organisation
UA Consulting
Location
City of London, London, United Kingdom
Employment Type
Contract
Contract Rate
From £300 to £400 per day
Platform Engineer with strong site reliability principles to join our Platform team. You'll focus on maintaining and improving production reliability, automating operational tasks, and enhancing our observability stack. You'll work closely with SREs, support engineers, release managers, and incident managers to ensure our systems meet SLIs, SLOs, and SLA targets. Key Responsibilities Maintain and optimise production environments … production workloads (EKS, EC2, RDS/Aurora, S3, IAM). Infrastructure as Code with Terraform and configuration management with Ansible. Strong experience with observability tools (Grafana, Prometheus, Loki, Tempo). Understanding of SRE concepts (SLIs, SLOs, error budgets, capacity planning). Comfortable working in incident and problem management processes. Strong ...

Platform Engineer

Hiring Organisation
UA Consulting
Location
City of London, London, United Kingdom
Employment Type
Permanent
Salary
£75,000
Platform Engineer with strong site reliability principles to join our Platform team. You'll focus on maintaining and improving production reliability, automating operational tasks, and enhancing our observability stack. You'll work closely with SREs, support engineers, release managers, and incident managers to ensure our systems meet SLIs, SLOs, and SLA targets. Key Responsibilities Maintain and optimise production environments … production workloads (EKS, EC2, RDS/Aurora, S3, IAM). Infrastructure as Code with Terraform and configuration management with Ansible. Strong experience with observability tools (Grafana, Prometheus, Loki, Tempo). Understanding of SRE concepts (SLIs, SLOs, error budgets, capacity planning). Comfortable working in incident and problem management processes. Strong ...

AI Platform/ DevOps Engineer

Hiring Organisation
The Portfolio Group
Location
City of London, London, Castle Baynard, United Kingdom
Employment Type
Permanent
Salary
£70000 - £80000/annum + Benefits
Bedrock Knowledge Bases) and embedding pipelines Build and maintain CI/CD pipelines for inference services, retrievers, ingestion workflows, and RAG components Implement observability across AI workloads using CloudWatch, MLflow, and OpenTelemetry - covering latency, throughput, cost, and system health Apply secure-by-design principles including IAM, encryption, network controls … Terraform experience for infrastructure-as-code, provisioning and managing cloud infrastructure at scale Experience operating containerised services, managing CI/CD pipelines, and owning observability and reliability Familiarity with vector databases or search infrastructure (OpenSearch, Algolia) is a strong advantage Python proficiency for scripting, automation, and deploying production services Solid ...

Go Full Stack Developer

Hiring Organisation
itecopeople
Location
London, United Kingdom
Employment Type
Permanent
Salary
£54000 - £61000/annum
event-driven services Contribute to CI/CD pipelines and cloud-native deployments Review code and champion engineering best practices Improve application performance, observability and reliability Collaborate within Agile delivery teams across multiple projects Support technical decision-making and continuous improvement Skills & Experience We are looking for candidates with strong … reviews, testing and engineering governance Experience with any of the following would be highly advantageous: Microsoft Azure Python GitOps tooling (Argo CD/Flux) Observability tooling (Prometheus, Grafana, OpenTelemetry) AI/LLM-enabled applications Event-driven architectures and messaging platforms What's on Offer Opportunity to work on cutting-edge ...

Site Reliability Engineer

Hiring Organisation
Pertemps London
Location
London, United Kingdom
Employment Type
Permanent
Salary
GBP 50,000 Annual
Jenkins, GitLab CI) Develop and maintain Terraform modules for infrastructure-as-code Build automation tools (CLI tools, scripts, GitHub Apps, self-service tooling) Own observability: dashboards, alerts, monitoring, and runbooks Continuously improve platform processes and reduce operational toil What We're Looking For Essential Skills & Experience 2-3 years … GitHub Actions, GitLab CI, Jenkins) Ability to write production-quality code in Python or Bash Solid networking fundamentals (DNS, load balancers, CDNs) Experience with observability tools (NewRelic, Datadog, Prometheus, Grafana) Comfortable participating in on-call rotations Experience using AI tools (e.g. ChatGPT, Copilot, Cursor) to enhance productivity Desirable Go, Ansible ...

Staff Software Engineer

Hiring Organisation
Visa
Location
London, UK
Employment Type
Full-time
standards Apply distributed systems principles including idempotency and safe retries, failure isolation and graceful degradation, schema and API versioning Build systems with clear SLAs, SLOs, and observability Maintain a strong security posture across services and data access Data-Intensive & Reporting Systems: Work closely with data engineering teams on canonical data models, regime-specific … more of Java, Python (or similar) Strong system design skills across API-driven architectures, Data-intensive services, Batch and event-driven workflows Deep understanding of reliability, observability, and operational excellence Data & Analytics Awareness: Strong understanding of data modelling concepts such as canonical models and dimensional models Experience working alongside modern data platforms (e.g. Snowflake, BigQuery, Redshift) Ability to reason about Data ...

Platform Engineer

Hiring Organisation
UA Consulting
Location
City, London, United Kingdom
Employment Type
Contract
Contract Rate
GBP 300 - 400 Daily
Platform Engineer with strong site reliability principles to join our Platform team. You'll focus on maintaining and improving production reliability, automating operational tasks, and enhancing our observability stack. You'll work closely with SREs, support engineers, release managers, and incident managers to ensure our systems meet SLIs, SLOs, and SLA targets ...

Service Architect- 6 month contract

Hiring Organisation
Opus Recruitment Solutions Ltd
Location
London, South East, England, United Kingdom
Employment Type
Contractor
Contract Rate
£400 - £420 per day
Design for a platform-based product delivery model that underpins hundreds of products Use of Dynamic CIs for product discovery and association Integration with observability tools such as Datadog Familiarity with AWS or Confluent Cloud an advantage Details: 6 months (likely extension) Outside IR35 Fully remote ...

Back End Developer

Hiring Organisation
Insight Global
Location
City of London, London, United Kingdom
development within an enterprise level organization Extensive experience coding and deploying features within AWS serverless environment Experience working with AWS Services Lambda, S3, DynamoDB Observability tools such as Datadog or NewRelic ...

Senior Network Architect, GPU Fabric and AI Infrastructure

Hiring Organisation
We Love Alfa
Location
London, United Kingdom
Employment Type
Permanent
Salary
GBP 180,000 - 240,000 Annual
directly impact customer training workloads. This person will own network architecture across GPU fabric, InfiniBand, RoCE v2, Ethernet leaf spine, edge connectivity, peering, observability, deployment standards and operational handover. We are looking for someone who has: Deep GPU cluster or HPC deployment experience Strong InfiniBand production experience RoCE v2 experience ...