551 to 575 of 658 Observability Jobs in the UK

Site Reliability Engineer

Hiring Organisation: Connells Limited
Location: Milton Keynes, Buckinghamshire, UK
Employment Type: Full-time

Job Description: We are seeking an experienced Site Reliability Engineer (SRE) to join our Group Technology Team in Milton Keynes. ConnellsX is Connells Group Technology's internal developer platform, built on Microsoft Azure. It simplifies cloud hosting ...

Field CTO EMEA

Hiring Organisation: Jobleads-UK
Location: Maidenhead, England, United Kingdom

Engineering, platform teams, and business stakeholders. Translate customer business goals into compelling transformation strategies powered by Dynatrace. Lead high-impact technical discovery and executive conversations around observability, cloud modernization, AI adoption, security, automation, and business outcomes. Shape account strategy with Sales and Solution Engineering teams for complex, multi-stakeholder deals. Develop board-level … executive-level narratives that connect platform capabilities to risk reduction, operational excellence, digital experience, and growth. Guide customers on modern observability and security operating models, including platform engineering, SRE, DevSecOps, and AI-assisted operations. Support large opportunities by validating architecture direction, differentiation, value realization, and long-term platform vision. Influence go-to-market ...

Lead Splunk Engineer

Hiring Organisation: Meritus
Location: London, United Kingdom
Employment Type: Contract
Contract Rate: £500 - £600/day

MERITUS are recruiting for a Splunk Lead Engineer to join a Consulting Organisation working into a Central Government Client supporting enterprise-wide observability and monitoring capabilities. This is a 12-month contract role based in London, paying £600 per day (Inside IR35), with 2 days per week required on-site. … SPLUNK LEAD ENGINEER - OBSERVABILITY & MONITORING - LONDON (HYBRID) - 12-MONTH CONTRACT - £600 PER DAY (INSIDE IR35) - SC CLEARANCE REQUIRED As a Splunk Lead Engineer, you will act as the technical authority for monitoring and observability, driving standards, automation, and scalable solutions across a complex enterprise environment. You will work closely with ...

Observability Engineering Manager

Hiring Organisation: Jobleads-UK
Location: Douglas, Northern Ireland, United Kingdom

interesting locations around the world, to align on strategy and execution. The company is founder‐led, profitable, and growing. We are hiring an Observability Engineering Manager who will lead the development of the distributed tracing or service mesh products as part of our Observability group. Engineering managers at Canonical … review and lead both architecture and code. They are astute judges of character, set expectations, and hold colleagues accountable. We are building an observability stack that is easy to deploy and operate on Kubernetes. This is part of a broader initiative to deliver the world's best suite of open ...

SRE - Site Reliability Engineer - Observability & Performance

Hiring Organisation: Sanderson Recruitment
Location: Bristol, Somerset, United Kingdom
Employment Type: Contract
Contract Rate: GBP 550 - 600 Daily

Observability and Performance. Up to £600 per day, outside IR35. 6-month initial contract. Bristol, largely remote. I'm currently working with a client who is looking for an SRE to implement and enhance observability across Java applications, middleware and Linux infrastructure using Grafana ...

GCP DevOps

Hiring Organisation: Pracyva ltd
Location: Bristol, City of Bristol, United Kingdom
Employment Type: Contract
Contract Rate: £400 - £425/day

Actions, Harness, Jenkins). Networking & Security: Experience with GCP Cloud Armor, GCP Networking, and embedding secure-by-design controls from design to runtime. Automation & Observability: Implementing actionable observability, performance tuning, and automation to reduce toil. Defining and operating against SLOs/SLIs. Scripting & Tooling: Scripting in Bash, PowerShell, or Python. ...

DevOps Engineer

Hiring Organisation: Fruition Group
Location: Leeds, West Yorkshire, Yorkshire, United Kingdom
Employment Type: Contract

Contract: Inside IR35. We're seeking an experienced Senior DevOps Engineer to join a small, highly skilled engineering team delivering a large-scale enterprise observability platform as they move away from Splunk. This is an opportunity to work on a critical cloud platform supporting the migration of numerous services onto … modern monitoring and logging solution. What you'll be doing:
* Support and enhance a large-scale observability platform.
* Help engineering teams onboard and migrate their services.
* Build and maintain dashboards, log pipelines and alerting.
* Develop and manage cloud infrastructure using Terraform across Azure and AWS.
* Produce technical documentation and operational ...

Data Reliability Engineer

Hiring Organisation: Ashdown Group
Location: City of London, London, United Kingdom
Employment Type: Permanent, Work From Home
Salary: £95,000

work from home 2 days per week. This is a high-impact role focused on improving data quality, reducing incidents, and building scalable observability across a modern enterprise data platform. You'll help ensure data across the organisation is accurate, reliable, and trusted for critical business decision-making. You'll take ownership … style roles, with strong SQL and Python skills and experience working in modern cloud-based data environments. Hands-on experience with data observability tools such as Grafana, Monte Carlo, or Acceldata, and data governance/quality platforms like Informatica, Collibra or Microsoft Purview is highly desirable. Experience within the Azure ...

Software Engineer, GPU Infrastructure - ChatGPT Engineering

Hiring Organisation: Jobleads-UK
Location: Greater London, England, United Kingdom

large-scale GPU infrastructure supporting ChatGPT inference. Build internal platforms, tooling, and AI-powered agents that automate fleet operations and reduce operational overhead. Improve observability, reliability, and operational efficiency across thousands of GPUs. Develop systems for capacity planning, scheduling, fleet health monitoring, and incident response. Identify infrastructure bottlenecks and implement … software that automates operational workflows rather than relying on manual processes. Have experience with Kubernetes, Linux systems, container orchestration, or distributed infrastructure. Understand infrastructure observability, monitoring, capacity planning, and incident management. Enjoy identifying cross-team pain points and building reusable platforms that improve developer productivity. Are comfortable working across software ...

Senior Python Backend Engineer - Fully Remote, UK

Hiring Organisation: Interact Consulting Limited
Location: South West London, London, United Kingdom
Employment Type: Permanent, Work From Home

APNs/FCM), user notification preferences, audience segmentation, and delivery tracking. Integrate with third-party data providers and external services, ensuring robust failure handling, observability, and system resilience. Design and support secure internal tooling APIs, including role-based access controls, audit trails, change history, and safe administrative workflows. Build … shape technical direction and the expectation to take real ownership of what you build. Scope: backend services, infra, event-driven systems, CI/CD, observability, all built for live-event traffic. Python-first, Postgres, Redis. You'll own your services fully: building them and keeping them running in production. Heavily ...

Director, Generative AI Experience Engineering (EMEA)

Hiring Organisation: Jobleads-UK
Location: Greater London, England, United Kingdom

contextual generation, retrieval and adaptive orchestration. Work across modern LLM ecosystems including foundation models, retrieval pipelines, vector storage, embeddings, multi‐agent systems and AI observability tooling. Design systems that support conversational rendering, dynamic content assembly, streaming UX and adaptive interfaces. Actively experiment with AI tooling, workflows and engineering methodologies, bringing … models; RAG pipelines, vector databases, embeddings and chunking; tool calling, agentic systems, orchestration and memory frameworks; prompt engineering, prompt management and context engineering; AI observability, evaluations, governance and guardrails. AI‐Augmented Engineering: AI‐assisted development tooling including Claude Code, Cursor and Codex; prompt‐driven engineering and AI pair‐programming workflows ...

AI Operations Engineer (Python)

Hiring Organisation: Jobleads-UK
Location: Greater London, England, United Kingdom

requirements evolve. Act as the point of contact for AI system issues, triaging, diagnosing, and resolving incidents while keeping stakeholders informed throughout. Own monitoring, observability, and quality across the AI estate, going beyond uptime to track health signals specific to agents, such as output quality, model and prompt regressions … understanding of Agile delivery in large‐scale enterprise environments. Experience supporting or operating production systems, ideally AI-driven or data-intensive, with strong monitoring, observability, and distributed systems diagnosis skills. Practical experience with cloud (particularly AWS), relational databases such as Postgres, and familiarity with container orchestration or PaaS such ...

Technical Lead Edge Platform

Hiring Organisation: VoCoVo
Location: Oxfordshire, United Kingdom
Employment Type: Full Time
Salary: 80000 to 85000 GBP Annually

MicroK8s). Experience with image build tooling and immutable OS concepts, familiarity with tools such as Kairos, OSTree is highly desirable. Practical exposure to observability at scale, including metrics, logging, alerting (Prometheus, Grafana, Loki) and hands-on experience with OpenTelemetry. Experience operating or building infrastructure to manage, monitor and update … implement secure, reliable over-the-air (OTA) update mechanisms for OS and workload delivery at scale. Take ownership of the edge platform's observability, reliability and security, including driving adoption of OpenTelemetry across the edge estate. Contribute to the technical roadmap, researching new approaches and producing demonstrations and proofs ...

Principal Observability & Cloud Platform Engineer

Hiring Organisation: 17918
Location: Cambridge, Cambridgeshire, United Kingdom

Principal Observability & Cloud Platform Engineer Most observability engineers run someone else's stack. This role is for the person who builds it. Our client is re-architecting observability and cloud infrastructure at a scale very few engineers ever touch: a 3,000-node Kubernetes estate, 50TB of logs ...

Network Monitoring & Observability Architect

Hiring Organisation: Pontoon
Location: Chester, Cheshire, United Kingdom
Employment Type: Contract

Join Our Team as a Network Monitoring & Observability Architect! Contract Length: 12 months. Location: Chester. Working Pattern: 3 days per week in the office, via an umbrella company. Are you ready to take your skills to the next level? We're looking for a talented Monitoring Architect to join our dynamic ...

Senior Infrastructure Engineer — Cloud, IaC & Observability

Hiring Organisation: Jobleads-UK
Location: Tipton, England, United Kingdom

through strong infrastructure practices. The ideal candidate has over 7 years of experience in platform and DevOps roles, strong skills in IaC, networking, and observability, and a passion for AI safety. ...

AI Platform Engineer: Scale, Security & Observability

Hiring Organisation: Jobleads-UK
Location: City Of London, England, United Kingdom

pipelines and production-grade AI products at scale. You will help design, build, and operate cloud-native infrastructure across AWS and Databricks, ensuring scalability, observability, security, and cost efficiency. You will collaborate with AI engineers to translate prototypes into production-ready platform components, enabling reliable deployment ...

Lead Product Manager AIOps

Hiring Organisation: Jobleads-UK
Location: City Of London, England, United Kingdom

About the Role: Grade Level (for internal use): 11. The AIOps team is responsible for modernizing IT operations through intelligent observability, event correlation, anomaly detection, predictive insights, and automation. They partner with AIOps vendors, IT Operations, infrastructure, platform engineering, SRE, service management, and application teams to reduce operational noise, improve … accelerate incident response across a complex enterprise environment. Responsibilities: Execute the enterprise AIOps strategy, operating model, and adoption plan across the organization. Build scalable observability and automation capabilities and lead cross‐functional teams to improve reliability, operational efficiency, and service outcomes. Execute the enterprise AIOps roadmap, aligning delivery plans ...

Principal Machine Learning Engineer

Hiring Organisation: Jobleads-UK
Location: Greater London, England, United Kingdom

data engineering teams to implement scalable data lakehouse oriented feature architectures and enterprise‐grade ML governance. Champion engineering standards for model quality, documentation, observability, and platform resilience. Feature Engineering & Data Architecture: Architect highly scalable, production‐ready feature pipelines within Lakehouse environments. Set the technical direction for fallback and resilience strategies … including scoring metrics, latency, error analytics, and SLOs. Partner with platform teams to optimise cost, scale, and reliability of inference endpoints. Monitoring, Drift Detection & Observability: Define observability standards for feature drift, concept drift, performance degradation, and data integrity. Lead the creation of dashboards, benchmarks, and automated alerting across ...

Director of Software Engineering - Executive Director

Hiring Organisation: Jobleads-UK
Location: City Of London, England, United Kingdom

reins and drive impact, we’ve got an opportunity just for you. As a Director of Software Engineering at JPMorgan Chase within the Engineer's Observability Platforms team, you lead a technical area and drive impact within teams, technologies, and projects firm-wide. Utilize your in-depth knowledge of software, applications, technical … teams and be a driver of AIOps innovation and solution delivery. Job responsibilities: Leads technology and process implementations to achieve functional technology objectives in the Observability Platforms space, providing essential services for Site Reliability Engineers, Operations and Engineers across the whole firm. Delivers technical solutions that can be leveraged across multiple ...

Site Reliability Engineer

Hiring Organisation: Lorien
Location: Edinburgh, Midlothian, Scotland, United Kingdom
Employment Type: Contractor
Contract Rate: Salary negotiable

production incidents, taking ownership through to resolution. Focus on incident response, service restoration and operational excellence (approximately 70% of the role). Improve system observability, monitoring and alerting capabilities. Work closely with development teams to enhance the reliability and operability of applications. Analyse production issues and identify opportunities for automation … Production Engineering or a similar operational engineering role. Strong hands-on experience supporting live production environments. Excellent troubleshooting and incident management skills. Experience with observability and monitoring platforms, including Grafana, OpenTelemetry and Splunk. Good understanding of cloud platforms (AWS experience preferred). Strong knowledge of APIs and API troubleshooting. Experience ...

Senior Engineering Manager, Developer Experience

Hiring Organisation: Jobleads-UK
Location: Greater London, England, United Kingdom

About the team: The Developer Experience team owns the internal platform that every engineer at the company touches daily: CI/CD pipelines, observability tooling, our developer portal, and an emerging AI platform. It's a high-visibility role: the work you lead directly shapes the productivity of hundreds of engineers … looks like at the company as we scale. What you'll do: Lead and develop a growing team of 5+ highly motivated engineers across observability, CI/CD, developer portal (Backstage), and FinOps tooling, setting clear priorities and establishing strong ways of working. Own and evolve the technical roadmap across ...

Platform Operations Director

Hiring Organisation: Jobleads-UK
Location: Greater London, England, United Kingdom

business continuity across the group. Internal IT & Systems: Manages internal IT and business systems administration (M365, NetSuite, SuccessFactors, SharePoint), covering infrastructure, integrations, and IAM. Ensures observability and SRE capability is fit for purpose across cloud, hosted, and end-user environments. Vendor & Cost Management: Drives cloud and vendor cost discipline; manages …/CD infrastructure requirements. Head of Infrastructure & Cloud (direct report): hosting strategy, cloud platform, and FinOps execution. Head of SRE (direct report): observability, on-call, and DR/BCP processes. Head of Internal Services (direct report): internal IT, business systems, and end-user support. Finance (direct report): cloud cost visibility ...

Platform Modernisation Lead

Hiring Organisation: Adecco
Location: London, United Kingdom
Employment Type: Contract
Contract Rate: £800 - £900/day

role demands strong leadership and a strategic mindset as you define and embed our cloud operating model, aligning it with change and release management, observability, monitoring, alerting, and support processes. Key Responsibilities: Lead the design and implementation of a hybrid multi-cloud container platform across Azure, AWS, and GCP. Ensure … corporate governance processes, standards, and tooling. Define and embed a robust cloud operating model that aligns with organisational change and release management. Develop observability, monitoring, and alerting strategies to ensure operational excellence. Maintain end-to-end accountability for platform production readiness, ensuring it meets enterprise standards. Support and enable ...

Senior Developer - Perm - Birmingham

Hiring Organisation: INFUSED SOLUTIONS LIMITED
Location: Birmingham, West Midlands, United Kingdom
Employment Type: Permanent
Salary: £80,000

recurring technical problems and implementing long-term solutions. Improving platform reliability, resilience, and overall product quality. Performing application profiling, performance tuning, and optimisation. Enhancing observability, monitoring, alerting, and diagnostic capabilities. Working with engineering teams to improve development practices and technical standards. Reducing technical debt and identifying opportunities for platform improvement. … Strong communication skills and the ability to collaborate effectively across engineering teams. Desirable: Experience working on SaaS platforms or cloud-based applications. Exposure to observability and monitoring tools. Experience with performance profiling and optimisation techniques. Knowledge of scalability, resilience, and reliability engineering principles. Familiarity with CI/CD pipelines ...