Observability Jobs in England

476 to 500 of 687 Observability Jobs in England

Founding Engineer

City of London, London, United Kingdom
Generative
complex data ecosystem Design flexible data ingestion and transformation pipelines for financial market data and trading systems Build and maintain AI/ML infrastructure, including model serving, evaluation, and observability frameworks Collaborate directly with clients to ensure the platform meets real-world enterprise requirements Contribute to both strategic technical direction and hands-on implementation as part of a small, high More ❯

Senior React Native Developer

City of London, London, United Kingdom
Venator Recruitment
Integrate and extend React Native functionality using native modules in Swift or Kotlin when needed. Take ownership of app store releases, ensuring smooth submission, updates and maintenance processes. Manage observability and performance in production through tools like crash reporting, logging and analytics. Contribute to the architecture and tooling decisions, helping to shape the direction of our mobile stack from the …

Senior React Native Developer

London Area, United Kingdom
Venator Recruitment
Integrate and extend React Native functionality using native modules in Swift or Kotlin when needed. Take ownership of app store releases, ensuring smooth submission, updates and maintenance processes. Manage observability and performance in production through tools like crash reporting, logging and analytics. Contribute to the architecture and tooling decisions, helping to shape the direction of our mobile stack from the …

Strategy Manager

London (Chessington), South East England, United Kingdom
Hybrid/Remote Options
SoTalent
Resource Management: Oversee project budgets, allocate resources, and lead the monitoring engineering team to deliver on time and within scope. What You'll Bring Strong expertise in system reliability, observability, and monitoring strategy. Deep understanding of end-to-end video processing and broadcast workflows. Proven leadership skills with experience managing engineering teams. Strategic mindset with the ability to align monitoring More ❯

Network Engineer Team Lead

Leatherhead, England, United Kingdom
Cisilion
team meetings and performance reviews. Motivate, guide and coach team members to hit agreed targets via formal objectives and supporting development plans. Incident & Problem Management Proactively partner with the Observability Management function to establish trending and opportunities for Customer infrastructure optimisation. Be a process manager and advocate for Problem Management, ensuring root cause analysis takes place on major Incidents and More ❯

Solutions Manager

England, United Kingdom
Salt
TM Forum (eTOM/ODA) and ITIL 4. Strong delivery leadership in Agile/SAFe environments, combined with governance and risk management. Demonstrated success implementing Operational Readiness Reviews (ORR), observability, and support models. Excellent stakeholder management and ability to produce clear written artefacts for both technical and executive audiences. Education Bachelor’s or Master’s degree in Computer Science, Engineering More ❯

Chief Technology Officer

London Area, United Kingdom
Understanding Recruitment
end Scale ingestion and indexing for 30+ blockchains, including high-throughput chains Operate a secure fleet of full nodes and indexers with clear SLAs and cost controls Set SLOs, observability, incident management, and make on call boring Build and lead six plus squads. Org design, hiring, mentoring, standards, and SDLC Partner with product, compliance, and customers to turn outcomes into More ❯

Chief Technology Officer

City of London, London, United Kingdom
Understanding Recruitment
end Scale ingestion and indexing for 30+ blockchains, including high-throughput chains Operate a secure fleet of full nodes and indexers with clear SLAs and cost controls Set SLOs, observability, incident management, and make on call boring Build and lead six plus squads. Org design, hiring, mentoring, standards, and SDLC Partner with product, compliance, and customers to turn outcomes into More ❯

Datacenter & Virtualisation Engineer

London, South East England, United Kingdom
Rimes Technologies
cable management, hardware lifecycle planning, and environmental monitoring. Participate in capacity planning and performance tuning to support business growth and infrastructure scalability. Reliability & Monitoring Ensure high availability, security, and observability of systems through best practices in reliability and recoverability. Develop and maintain monitoring systems to ensure compliance with service level objectives. Lead and contribute to incident response, root cause analysis More ❯

AI Engineer - Infrastructure

City of London, London, United Kingdom
causaLens
business problems at scale. What you’ll bring: Expertise in the deployment of enterprise-grade AI solutions to cloud and on-premise customer environments with a focus on availability, observability and security. Proven track record with at least one of the major cloud providers and an understanding of DevOps best practices. Hands-on experience building production-grade solutions using LLMs More ❯

AI Engineer - Infrastructure

London Area, United Kingdom
causaLens
business problems at scale. What you’ll bring: Expertise in the deployment of enterprise-grade AI solutions to cloud and on-premise customer environments with a focus on availability, observability and security. Proven track record with at least one of the major cloud providers and an understanding of DevOps best practices. Hands-on experience building production-grade solutions using LLMs More ❯

Lead Data Architect (Governance)

London, United Kingdom
Hybrid/Remote Options
Booksy Inc
best practices. It will also help you to have: experience establishing and enforcing data governance standards through technical architecture (not just documentation); familiarity with data cataloging, metadata management, and observability tools; and a systems-thinking mindset: you understand the full data lifecycle and how to maintain integrity from source to dashboard. At Booksy, we believe in the power of well-structured …
Employment Type: Permanent
Salary: GBP Annual

Machine Learning Engineer - Pre-Training

England, United Kingdom
Hybrid/Remote Options
Futureshaper.com
training jobs to identify their bottlenecks, e.g. using NVIDIA Nsight Systems Design and implement efficiency improvements to maximise MFU, e.g. tensor parallelism, model compilation, mixed precision Design and implement observability tools, e.g. to track MFU Collaborate closely with Research teams to integrate training efficiency improvements and create a culture of performance optimization About you In order to set you up More ❯
Employment Type: Permanent
Salary: GBP Annual

Problem Manager

City of London, London, United Kingdom
Peregrine
ensure accuracy and quality are obtained collaboratively with our third-party suppliers. Assess and manage risks associated with services and recurring problems. Work across the ecosystem to continuously improve observability capabilities such as reporting, dashboarding and alerting, which will drive robust proactive problem management. Ensure these are communicated on a weekly basis and available for all of the team to …
Employment Type: Permanent

Senior Front End Engineer

Ringway, Altrincham, Cheshire, England, United Kingdom
The Hut Group
potential technical risks and develop strategies to mitigate them, ensuring that the application is secure, robust and reliable Champion performance optimisation across the frontend stack while ensuring accessibility and observability are baked into all solutions Deeply committed to crafting intuitive, impactful, and optimised user experiences that turn complex workflows into seamless, engaging journeys Share your knowledge within a democratic team More ❯
Employment Type: Full-Time
Salary: Competitive salary

Principal Product Manager - Technical

London Area, United Kingdom
IDC
platform. Define the data models, technical architecture, and platform interfaces that power intelligent, context-aware product experiences. Partner with engineering to design and deliver scalable APIs, system components, and observability layers that enable extensibility and reuse. Collaborate with AI/ML teams to integrate capabilities such as semantic search, LLM-powered assistants, personalization, and classification systems. Write PRFAQs and technical More ❯

Principal Product Manager - Technical

City of London, London, United Kingdom
IDC
platform. Define the data models, technical architecture, and platform interfaces that power intelligent, context-aware product experiences. Partner with engineering to design and deliver scalable APIs, system components, and observability layers that enable extensibility and reuse. Collaborate with AI/ML teams to integrate capabilities such as semantic search, LLM-powered assistants, personalization, and classification systems. Write PRFAQs and technical More ❯

Director, Infrastructure & Security Operations

Chelmsford, Essex, United Kingdom
Hybrid/Remote Options
Brooks Automation, Inc
infrastructure and security services, ensuring operational excellence and incident response readiness. Partner with the CISO to shape long-term strategy and roadmap for secure, resilient IT services. Drive automation, observability, and scalability across the infrastructure and security stack. Serve as a key escalation point for technical troubleshooting and security event resolution. Guide vendor selection, contract negotiations, and service-level adherence More ❯
Employment Type: Permanent
Salary: GBP Annual

DevOps Engineer

Birmingham, England, United Kingdom
Explore Group
next-generation AI products. You’ll join a small, experienced team developing an internal Kubernetes-based platform that enables AI innovation across the organisation automating everything from deployments to observability, and helping developers build smarter applications with confidence. What you’ll be doing: Designing, deploying, and maintaining Azure Kubernetes (AKS) environments Managing Infrastructure as Code with Terraform and improving GitOps … workflows (ArgoCD/GitHub Actions) Building observability and monitoring stacks using Prometheus, Grafana, and Loki Supporting AI workloads (LLMs, RAG, and document processing applications) running on Kubernetes Automating platform operations with Python, Go, and shell scripting Implementing security guardrails, PII compliance tooling, and best practices for production AI systems What you’ll need: 3+ years’ experience in DevOps or Platform … Engineering Strong background in Azure and Kubernetes Hands-on experience with Terraform, CI/CD, and container orchestration Familiarity with observability tools (Prometheus, Grafana, Loki) Scripting or programming skills in Python or Go Interest in AI infrastructure, LLMOps, or large language model deployment More ❯

Site Reliability Engineer

Wigan, Lancashire, England, United Kingdom
Hybrid/Remote Options
Searchability
As part of their continued investment in reliability and platform performance, they are now seeking an experienced Site Reliability Engineer to strengthen their engineering function and help evolve their observability and automation capabilities. THE BENEFITS Hybrid working model (office and remote) Opportunity to define and lead SRE strategy within a collaborative culture Exposure to modern cloud-native and containerised environments … and performance of complex online platforms supporting high-volume transactions. Working closely with operations and product teams, you'll monitor production systems, develop automation to improve uptime, and refine observability to provide real-time insight into platform health. You'll also play a key role in performance testing, system tuning and incident management to ensure smooth operation during critical events. … SITE RELIABILITY ENGINEER ESSENTIAL SKILLS At least 2 years' experience working as an SRE Deep understanding of system reliability, scalability and performance tuning Experience with observability tools (Grafana, Prometheus, OpenTelemetry) Proficiency in a programming language such as Go or .NET for automation and debugging Hands-on experience with AWS or another major cloud platform Knowledge of Kubernetes, Terraform, and Infrastructure More ❯
Employment Type: Full-Time
Salary: £40,000 per annum

Site Reliability Engineer

Wigan, Greater Manchester, United Kingdom
Hybrid/Remote Options
Searchability (UK) Ltd
As part of their continued investment in reliability and platform performance, they are now seeking an experienced Site Reliability Engineer to strengthen their engineering function and help evolve their observability and automation capabilities. THE BENEFITS Hybrid working model (office and remote) Opportunity to define and lead SRE strategy within a collaborative culture Exposure to modern cloud-native and containerised environments … and performance of complex online platforms supporting high-volume transactions. Working closely with operations and product teams, you'll monitor production systems, develop automation to improve uptime, and refine observability to provide real-time insight into platform health. You'll also play a key role in performance testing, system tuning and incident management to ensure smooth operation during critical events. … SITE RELIABILITY ENGINEER ESSENTIAL SKILLS At least 2 years' experience working as an SRE Deep understanding of system reliability, scalability and performance tuning Experience with observability tools (Grafana, Prometheus, OpenTelemetry) Proficiency in a programming language such as Go or .NET for automation and debugging Hands-on experience with AWS or another major cloud platform Knowledge of Kubernetes, Terraform, and Infrastructure More ❯
Employment Type: Permanent
Salary: £40,000 per annum

Site Reliability Engineer

Hereford, Herefordshire, England, United Kingdom
Hybrid/Remote Options
Hays Specialist Recruitment Limited
role focused on ensuring service availability, performance, and cost-efficiency across both cloud and on-prem infrastructure. You'll work closely with development and support teams to evolve infrastructure, enhance observability, and proactively mitigate reliability risks. Key Responsibilities: Collaborate with software engineers to improve reliability and performance. Automate operational tasks and reduce alert fatigue. Enhance monitoring and observability to pre-empt issues. Support development environments … protocols. Experience with cloud platforms, ideally AWS (EC2, RDS, S3, Lambda). Desirable: Coding experience in Java, Go, Python or similar. Knowledge of cross-domain technologies. Experience in service management environments. Practical application of observability patterns. Experience with Azure. Additional Information: Due to the nature of the work, successful candidates will be required to undergo security vetting. We welcome applications from all backgrounds and are committed to creating …
Employment Type: Contractor
Rate: £500 - £600 per day

Platform Engineer

City of London, London, United Kingdom
Humankind Global Recruitment
ll design and implement database services that can be consumed on demand — secure, compliant, and self-service. Working closely with Platform, SRE, and DevOps teams, you’ll bring automation, observability, and scalability to their database layer, enabling hundreds of developers to ship faster with confidence. What You’ll Do 💾 Design, build, and operate PostgreSQL and ElasticSearch clusters for production. ⚙️ Automate … provisioning, upgrades, and HA/DR with Terraform, Ansible, Helm, and Kubernetes Operators. 🌐 Embed databases into the Internal Developer Platform through APIs, GitOps workflows, and self-service tools. 📊 Implement observability with Prometheus, Grafana, and centralized logging. 🧠 Define and maintain SLOs for uptime and performance, embedding compliance and security controls. 🤝 Collaborate with development and platform teams to refine database automation standards … of Kubernetes and stateful workloads . ✅ Proficiency with Infrastructure as Code (Terraform, Ansible, Helm). ✅ Some development experience (Python, Go, or similar) for automation and API integration. ✅ Knowledge of observability tooling – Prometheus, Grafana, ELK, or Datadog. 🎁 Bonus: experience with ElasticSearch , MySQL , or SQL Server , plus exposure to AWS , GCP , or Azure . Why This Role ✨ Greenfield impact – build database-as More ❯

Platform Engineer

London Area, United Kingdom
Humankind Global Recruitment
ll design and implement database services that can be consumed on demand — secure, compliant, and self-service. Working closely with Platform, SRE, and DevOps teams, you’ll bring automation, observability, and scalability to their database layer, enabling hundreds of developers to ship faster with confidence. What You’ll Do 💾 Design, build, and operate PostgreSQL and ElasticSearch clusters for production. ⚙️ Automate … provisioning, upgrades, and HA/DR with Terraform, Ansible, Helm, and Kubernetes Operators. 🌐 Embed databases into the Internal Developer Platform through APIs, GitOps workflows, and self-service tools. 📊 Implement observability with Prometheus, Grafana, and centralized logging. 🧠 Define and maintain SLOs for uptime and performance, embedding compliance and security controls. 🤝 Collaborate with development and platform teams to refine database automation standards … of Kubernetes and stateful workloads . ✅ Proficiency with Infrastructure as Code (Terraform, Ansible, Helm). ✅ Some development experience (Python, Go, or similar) for automation and API integration. ✅ Knowledge of observability tooling – Prometheus, Grafana, ELK, or Datadog. 🎁 Bonus: experience with ElasticSearch , MySQL , or SQL Server , plus exposure to AWS , GCP , or Azure . Why This Role ✨ Greenfield impact – build database-as More ❯

DevOps Engineer – Global Multi-Strategy Hedge Fund – Industry Leading Comp Package

London Area, United Kingdom
Mondrian Alpha
in London. Working alongside software and cybersecurity engineers, you’ll help design, build, and automate a hybrid multi-cloud estate across AWS and Azure—enhancing CI/CD pipelines, observability, and developer experience. You’ll take ownership of business-critical infrastructure, shaping cloud strategy end-to-end and collaborating with global teams across the US and Europe to drive efficiency … CI/CD pipelines through tools such as Azure DevOps, GitHub Actions, or Octopus. You’ll also be adept at automating workflows in Python or PowerShell and implementing modern observability solutions including DataDog, OpenSearch, and LogicMonitor. This is a rare opportunity to join a high-performing, global hedge fund where technology and engineering directly drive investment performance and operational scale. More ❯

Observability salary percentiles (England)
10th Percentile: £56,250
25th Percentile: £67,500
Median: £80,000
75th Percentile: £105,000
90th Percentile: £146,000