651 to 675 of 1,207 Permanent Observability Jobs

Lead Engineer (AI Native)

Hiring Organisation
Jobleads-UK
Location
Leeds, England, United Kingdom
ensuring decisions and changes are traceable and explainable. Build and coach quality from the start by applying AI to strong foundational techniques such as observability, verification and build automation. Help clients make deliberate AI‐focused technology and tooling choices that avoid unnecessary lock‐in and allow delivery approaches to evolve ...

Lead Engineer (AI Native)

Hiring Organisation
Jobleads-UK
Location
City of Edinburgh, Scotland, United Kingdom
ensuring decisions and changes are traceable and explainable. Build and coach quality from the start by applying AI to strong foundational techniques such as observability, verification and build automation. Help clients make deliberate AI‐focused technology and tooling choices that avoid unnecessary lock‐in and allow delivery approaches to evolve ...

Lead Engineer (AI Native)

Hiring Organisation
Jobleads-UK
Location
Greater London, England, United Kingdom
ensuring decisions and changes are traceable and explainable. Build and coach quality from the start by applying AI to strong foundational techniques such as observability, verification and build automation. Help clients make deliberate AI‐focused technology and tooling choices that avoid unnecessary lock‐in and allow delivery approaches to evolve ...

Staff Storage Engineer

Hiring Organisation
Crusoe
Location
San Francisco, California, United States
Employment Type
Permanent
Salary
USD Annual
premise data environments. Strong understanding of storage architectures (block, file, object) and I/O paths. Hands-on experience with performance benchmarking and observability tools (FIO, ElBencho, blktrace, nvme-cli, nfs-gaze, eBPF, etc.). Experience with SSDs, NVMe, RAID, caching, or distributed storage systems. Deep familiarity with enterprise flash ...

Site Reliability Engineer III (Tue - Sat)

Hiring Organisation
Jobleads-UK
Location
Belfast, Northern Ireland, United Kingdom
reliability initiatives, and act as a mentor to junior colleagues, helping to shape the team's technical direction. Key Responsibilities: Own Observability: Design, build, and refine monitoring, alerting, and observability solutions. Drive the continuous improvement of our SLIs & SLOs to enable faster issue detection and resolution. Drive Reliability Projects ...

Site Reliability Engineer

Hiring Organisation
Computappoint
Location
City Of London, England, United Kingdom
where reliability genuinely isn't optional. The role blends application support, platform engineering and SRE practice. It suits someone who leans toward automation and observability over reactive firefighting. Responsibilities: Managing OpenShift and Kubernetes clusters across physical, virtual, and containerised environments Operating observability stacks (Grafana, Prometheus, Splunk) and driving proactive monitoring … call rotation Key Requirements: Hands-on Kubernetes and/or OpenShift experience in production Scripting skills in Python, Bash, or PowerShell Familiarity with observability tooling and SRE principles SQL and database knowledge (MySQL, Oracle, or similar) Experience supporting .NET, Java, or microservices applications It would be great ...

Site Reliability Engineer

Hiring Organisation
VIQU IT
Location
United Kingdom, Whitechapel, Greater London
Employment Type
Permanent
Salary
£40000 - £50000/annum
Engineer to help improve the reliability, scalability and automation of their AWS estate. This is a hands-on engineering role working across cloud infrastructure, observability, CI/CD and platform tooling, helping development teams deliver faster and more reliably. You’ll be joining a collaborative engineering environment with the opportunity … scalable AWS infrastructure. Develop and manage Infrastructure as Code using AWS CDK. Support CI/CD pipelines and deployment automation. Improve monitoring, logging and observability across distributed systems. Support incident management, root cause analysis and platform reliability improvements. Work closely with engineering and architecture teams to improve operational performance ...

Senior Backend Engineer

Hiring Organisation
SecurityHQ
Location
London, England, United Kingdom
versioned and user-friendly API contracts. Participate in architecture design, code reviews and technical discussions, contributing to overall engineering quality and standards. Quality, Testing & Observability Build and maintain comprehensive test suites including unit, integration, contract and end-to-end testing. Ensure services are fully instrumented with logging, metrics and tracing … support observability in production. Treat testing, monitoring and CI signals as essential components of delivery. Agile Delivery & Continuous Improvement Contribute to agile ceremonies including refinement, estimation and retrospectives. Support continuous improvement across engineering practices, ways of working and use of AI-assisted development tools. Technical Experience & Skills Essential ...

Platform Engineer - Kubernetes / Azure

Hiring Organisation
Keystone Recruitment Partners Ltd
Location
United Kingdom
Employment Type
Permanent
Salary
GBP 450 - 550 Daily
operation of enterprise cloud-native services. The role will focus on Kubernetes-based platforms running on Microsoft Azure, including service mesh, application deployment, observability, security, and production support. Key responsibilities: Build, configure, and support Kubernetes environments on Azure, including AKS. Deploy and manage Spring Boot applications in containerized environments. Work … Ability to work independently and communicate effectively with both technical and non-technical stakeholders. Desirable experience: Terraform, Helm, GitOps, Azure DevOps, or GitHub Actions. Observability tools such as Prometheus, Grafana, OpenTelemetry, or Azure Monitor. Financial services or other regulated enterprise environments. ...

Platform Engineer - Kubernetes / Azure

Hiring Organisation
Keystone Recruitment Partners Ltd
Location
Nationwide, United Kingdom
Employment Type
Permanent, Contract
Salary
£450 - £550/day
operation of enterprise cloud-native services. The role will focus on Kubernetes-based platforms running on Microsoft Azure, including service mesh, application deployment, observability, security, and production support. Key responsibilities: Build, configure, and support Kubernetes environments on Azure, including AKS. Deploy and manage Spring Boot applications in containerized environments. Work … Ability to work independently and communicate effectively with both technical and non-technical stakeholders. Desirable experience: Terraform, Helm, GitOps, Azure DevOps, or GitHub Actions. Observability tools such as Prometheus, Grafana, OpenTelemetry, or Azure Monitor. Financial services or other regulated enterprise environments. ...

Staff Site Reliability Engineer

Hiring Organisation
Jobleads-UK
Location
Greater London, England, United Kingdom
find and analyze reliability problems across our stack, then design and implement software and systems to create step-function improvements. You will design robust observability solutions, lead incident response, automate operational tasks, and continuously improve our infrastructure's reliability, all while mentoring and educating the broader engineering team to make … reliability a core value at Replit. You Will Architect and Implement Observability: Design, build, and lead the implementation of comprehensive monitoring, logging, and tracing solutions. Create dashboards and metrics that provide real-time visibility into system health and performance, enabling proactive issue detection. Define and Drive Reliability Standards: Work with ...

Integration Developer FTC

Hiring Organisation
itecopeople
Location
London, United Kingdom
Employment Type
Permanent, Work From Home
Salary
£60,000
Build connectors, event-processing services, and data pipelines Design scalable integration patterns, schemas, and event flows Develop CDC pipelines and resilient messaging solutions Improve observability through logging, metrics, and tracing Deploy containerised services using Docker and Kubernetes Contribute to architecture, code reviews, and engineering standards Collaborate with developers, data engineers … design Agile development experience Strong communication and collaboration skills Desirable Skills Go and/or Python CDC pipeline development Azure cloud experience Observability tooling (Prometheus, Grafana, OpenTelemetry) Experience within regulated environments What's on Offer Hybrid working - 2 days per week in London Salary up to £60,900 Generous pension ...

Senior Azure Platform Engineer

Hiring Organisation
Rebel Recruitment
Location
Salford, Greater Manchester, North West, United Kingdom
Employment Type
Permanent, Work From Home
Salary
£90,000
about being part of that journey. You'll be working in an environment where tools like GitHub Copilot, OpenAI models, Claude, Gemini, AI-powered observability platforms, intelligent deployment workflows, and internal AI tooling are actively being explored and introduced to improve how engineering teams work day to day. This … designing and improving Azure infrastructure, evolving Kubernetes platforms within AKS, building reusable Infrastructure-as-Code patterns using Terraform and Crossplane, and helping improve reliability, observability, and security across the wider platform estate. You'll also spend time improving developer tooling and CI/CD processes, helping engineering teams deploy faster ...

Google Cloud Platform Infrastructure Engineer

Hiring Organisation
Adroit People Limited (UK)
Location
City of London, London, United Kingdom
reduce manual effort. Supporting incident response and learning good operational practices. Participating in agile ceremonies and contributing to continuous improvement. Building foundational knowledge in observability, security and DevOps culture. Essential skills & experience: 1–3 years’ experience in DevOps, SRE or cloud engineering. Basic understanding of cloud concepts and core … ns. Exposure to IaC tools, preferably Terraform. Enthusiasm, initiative and a desire to grow technically. Desirable skills: Awareness of Kubernetes/containerisation. Understanding of observability tooling (Prometheus, Dynatrace, etc.). Awareness of agile ways of working. ...

Senior DevOps Engineer

Hiring Organisation
Halian Technology Limited
Location
Reading, Berkshire, South East, United Kingdom
Employment Type
Permanent, Work From Home
Salary
£95,000
reliability, and availability Implement self-service tooling to empower development teams Drive DevOps best practices across the digital product lifecycle Develop and enhance monitoring, observability, and incident response processes Support global engineering teams delivering high-traffic platforms Key Requirements Proven experience supporting digital product delivery in a DevOps or platform … with Infrastructure as Code (Terraform, Ansible, Puppet or similar) Hands-on experience with Kubernetes, Docker, and cloud platforms (AWS preferred) Experience with monitoring/observability tools (Prometheus, Grafana, ELK, APM tools) Solid understanding of system performance, scalability, and resilience Strong collaboration and communication skills within cross-functional product teams Desirable ...

Principal Software Development Engineer

Hiring Organisation
Jobleads-UK
Location
Glasgow, Scotland, United Kingdom
Code, automation frameworks and database‐as‐code practices using tools such as Redgate Flyway. Take ownership of critical customer systems, ensuring operational resilience, observability, performance optimisation and rapid incident response. Collaborate closely with Product, Delivery, Operations and Commercial teams to shape technical solutions, delivery plans and strategic outcomes. Promote secure … Connect or Genesys Cloud. Proven ability to design and deliver secure, scalable and resilient cloud‐native solutions within complex enterprise environments. Strong understanding of observability, operational support, reliability engineering and end‐to‐end ownership practices. Knowledge of regulated financial services environments, including UK GDPR and FCA Consumer Duty requirements. Excellent ...

Senior Software Development Engineer in Test

Hiring Organisation
Jobleads-UK
Location
Greater London, England, United Kingdom
from traditional testing approaches towards a modern, engineering‐led quality strategy: unit testing, contract testing, component testing, integration testing, E2E flows, synthetics, and strong observability across our microservices. We’re looking for a Senior SDET who is hands‐on, highly technical, and passionate about setting teams up for long‐term … teams adopt best practices confidently. Collaborate with Engineering and DevOps to evolve CI/CD pipelines and embed automation earlier in the lifecycle. Improve observability around testing and reliability, integrating logs, traces, metrics, synthetics, and alerts to increase confidence in releases. Promote good testing principles and high‐quality engineering practices ...

AI Engineer

Hiring Organisation
Morgan McKinley
Location
Yorkshire and Humberside, England, United Kingdom
Employment Type
Full-Time
Salary
Salary negotiable
with Generative and Agentic AI patterns, including LLM integration, RAG architectures, prompt-driven workflows, and AI service orchestration. Integrate AI capabilities with enterprise systems, observability tooling, and security frameworks. Design and maintain CI/CD pipelines within cloud-native engineering environments. Support benchmarking, evaluation, experimentation, and cost optimisation … Skills Experience with Kong API Gateway, Kong Mesh, and Flux CD. RESTful API and microservices development. Terraform and GitOps workflows. Exposure to prompt evaluation, observability, or AI red-teaming tools. SQL and NoSQL database experience. Understanding of vector search technologies and Retrieval-Augmented Generation (RAG) patterns. About You A proactive ...

Lead AI Engineer

Hiring Organisation
Morgan McKinley
Location
Yorkshire and Humberside, England, United Kingdom
Employment Type
Full-Time
Salary
Salary negotiable
with Generative and Agentic AI patterns, including LLM integration, RAG architectures, prompt-driven workflows, and AI service orchestration. Integrate AI capabilities with enterprise systems, observability tooling, and security frameworks. Design and maintain CI/CD pipelines within cloud-native engineering environments. Support benchmarking, evaluation, experimentation, and cost optimisation … Skills Experience with Kong API Gateway, Kong Mesh, and Flux CD. RESTful API and microservices development. Terraform and GitOps workflows. Exposure to prompt evaluation, observability, or AI red-teaming tools. SQL and NoSQL database experience. Understanding of vector search technologies and Retrieval-Augmented Generation (RAG) patterns. About You A proactive ...

GenAI Python Developer

Hiring Organisation
EMBS Technology
Location
London Area, United Kingdom
services with cloud-native architectures across Azure and AWS. Build and maintain CI/CD pipelines aligned with engineering and security standards. Implement observability, monitoring and performance optimisation across GenAI services. Support benchmarking, experimentation and evaluation of LLM performance, accuracy and cost. Collaborate with Architects, Platform Engineers, Product Teams … Retrieval-Augmented Generation (RAG) architectures. Vector databases and semantic search technologies. Kong API Gateway and Kong Mesh. FluxCD and GitOps workflows. Prompt evaluation and observability tools such as Promptfoo and Arize. SQL databases including PostgreSQL and MySQL. NoSQL database technologies. Distributed systems and cloud-native application development. What Success Looks ...

Senior Developer

Hiring Organisation
Addition
Location
Watford, Hertfordshire, England, United Kingdom
Employment Type
Full-Time
Salary
£80,000 per annum
Doing: Designing, deploying and managing automation and monitoring platforms that support large-scale applications and services Building and maintaining monitoring, alerting and observability tooling across the platform Creating dashboards that translate complex technical data into meaningful insights for stakeholders Developing automation to integrate new systems using existing frameworks Managing … Docker) Strong Python development skills, including scripting and Lambda functions Experience building and managing CI/CD pipelines, ideally with GitHub Actions Monitoring and observability tooling such as AppDynamics, Grafana, InfluxDB, Graphite, Sensu or similar Experience working with serverless architectures (Lambda, API Gateway, DynamoDB, EventBridge) Solid understanding of Linux/ ...

DevOps Engineer

Hiring Organisation
WTW
Location
Surrey, United Kingdom
Employment Type
Full Time
Managed Identities, Azure networking and Microsoft Entra ID. • Integrate and support security tooling and quality gates, including Mend, Snyk, Invicti, Wiz and GitLeaks. • Improve observability and feedback across build, deployment and environment health using tools such as Datadog, Azure Monitor and Log Analytics. • Help development teams diagnose delivery, deployment … troubleshooting. • Experience embedding security, quality and compliance checks into delivery pipelines, including vulnerability scanning, container scanning, secrets scanning and release evidence. • Good understanding of observability practices, including logs, metrics, dashboards, alerts and environment health checks. • Strong troubleshooting skills across pipelines, deployment automation, Kubernetes workloads, cloud configuration and environment issues. • Ability ...

Principal Machine Learning Engineer

Hiring Organisation
Jobleads-UK
Location
City Of London, England, United Kingdom
evolve the ML Platform, ensuring it supports: Reusable and scalable deployment patterns CI/CD for machine learning Full model lifecycle management Monitoring, observability, and alerting Secure and compliant operation Shape platform standards and interfaces that enable consistent ML delivery across squads and value streams Lead technical spikes and proof … fundamentals (OOP, testing, design patterns). Deep experience building, deploying, and operating production ML systems, including: online and batch model serving, monitoring, alerting, and observability, retraining and lifecycle management. Strong understanding of core data science concepts, sufficient to: review and challenge modelling approaches, ensure models are production‐ready and correctly ...

DevOps Engineer

Hiring Organisation
WTW
Location
Surrey, United Kingdom
Employment Type
Full Time
infrastructure. Automate environment provisioning across development and production. Manage backend state, pipelines, and state-change detection integrations. Platform Engineering & SRE Own and improve reliability, observability, and performance of the platform. Implement SLOs, alerting, dashboards, and auto remediation where possible. Troubleshoot cluster level, networking, and workload deployment issues. Lead root cause … endpoints, Certificate/Secret management etc Strong debugging and operational experience (SRE mindset). Solid experience of DevSecOps architecture, processes & tooling Solid understanding of Observability Process & Tooling Logging, metrics, traces, dashboards Other highly desirable, but not essential skills are: Experience with: GitOps - ArgoCD or GitOps workflows Zero downtime deployments (blue ...

Principal Software Development Engineer

Hiring Organisation
Jobleads-UK
Location
Manchester, England, United Kingdom
/CD pipelines, Infrastructure as Code, automation frameworks, and database-as-code practices using Redgate Flyway. Take ownership of critical customer systems, ensuring operational resilience, observability, performance optimisation, and rapid incident response. Collaborate closely with Product, Delivery, Operations, and Commercial teams to shape technical solutions, delivery plans, and strategic outcomes. Promote secure … Connect or Genesys Cloud. Proven ability to design and deliver secure, scalable, and resilient cloud-native solutions within complex enterprise environments. Strong understanding of observability, operational support, reliability engineering, and end-to-end ownership practices. Knowledge of regulated financial services environments, including UK GDPR and FCA Consumer Duty requirements. Excellent communication and stakeholder management ...