901 to 925 of 1,213 Permanent Observability Jobs

Head of Infrastructure

Hiring Organisation: Jobleads-UK
Location: Greater London, England, United Kingdom

cloud architecture, operational resilience, developer experience and infrastructure team leadership. You will be responsible for shaping the long term infrastructure roadmap, improving reliability and observability, strengthening incident response and ensuring the platform can support a growing customer base and increasingly critical product suite. This is a role for someone … platform strategy Design and evolve the AWS cloud architecture to support scale, resilience and performance Set standards across infrastructure, CI/CD, environments and observability Lead production reliability, uptime, incident response and post incident reviews Improve monitoring, alerting and on call practices to ensure they are effective and sustainable Partner ...

DevOps Technical Lead

Hiring Organisation: Data Careers
Location: South East London, London, United Kingdom
Employment Type: Permanent, Work From Home

optimise CI/CD pipelines Improve deployment reliability and reduce rollback frequency Standardise release processes across engineering teams Implement progressive delivery practices Reliability & Observability Define and track SLIs/SLOs Enhance monitoring, alerting and incident response processes Lead post-incident reviews and root cause analysis Drive reduction of operational toil … Lambda) Proven Infrastructure-as-Code experience (Terraform preferred) CI/CD tooling experience (GitHub Actions, GitLab CI, Jenkins) Experience operating production SaaS environments Strong observability tooling knowledge (Datadog, Prometheus, ELK etc.) Incident management and root cause analysis experience Experience in regulated or security-conscious environments is highly desirable ...

Platform Storage Engineer

Hiring Organisation: Ncounter
Location: East London, London, England, United Kingdom
Employment Type: Full-Time
Salary: £160,000 - £190,000 per annum

vendor storage tooling into a unified platform • Improve storage throughput, data locality and platform efficiency for research workloads • Collaborate closely with compute, networking and observability teams across the wider platform estate • Support troubleshooting, tuning and reliability engineering for production storage systems What we’re looking for: • Strong backend or systems … Rust, C++ or Java • Experience building or supporting distributed systems at scale • Strong Linux knowledge and an interest in infrastructure engineering • Exposure to observability tooling such as Prometheus, Grafana, Datadog or ELK • Understanding of cloud and infrastructure automation, ideally AWS, GCP or Terraform • Any experience with Ceph, MinIO, JuiceFS, FUSE ...

Site Reliability Engineering Manager

Hiring Organisation: F5 Consultants
Location: Reading, England, United Kingdom

engineering standards, operational maturity, and long-term platform stability. You’ll work within a modern cloud-native environment leveraging Kubernetes, OpenShift, GitOps, service mesh, observability tooling, and automation-first engineering practices. This is a highly influential role where you’ll lead and mentor high-performing SRE teams while remaining technically … uptime genuinely matter. Skills Required Strong expertise in Kubernetes and OpenShift (non-negotiable) Experience with multi-cloud and hybrid architectures Hands-on experience with observability platforms Strong Infrastructure as Code and GitOps experience Proven experience with CI/CD automation and reliability-focused engineering Demonstrated ability to lead and mentor ...

SRE Technical Lead

Hiring Organisation: F5 Consultants
Location: Berkshire, South East, United Kingdom
Employment Type: Permanent, Work From Home

engineering standards, operational maturity, and long-term platform stability. You'll work within a modern cloud-native environment leveraging Kubernetes, OpenShift, GitOps, service mesh, observability tooling, and automation-first engineering practices. This is a highly influential role where you'll lead and mentor high-performing SRE teams while remaining technically … time genuinely matter. Skills Required Strong expertise in Kubernetes and OpenShift (non-negotiable) Experience with multi-cloud and hybrid architectures Hands-on experience with observability platforms Strong Infrastructure as Code and GitOps experience Proven experience with CI/CD automation and reliability-focused engineering Demonstrated ability to lead and mentor ...

Software Engineer (Monitoring Platform)

Hiring Organisation: SRT Marine Systems PLC
Location: Bristol, United Kingdom
Employment Type: Permanent
Salary: £55,000 - £75,000 per annum

Engineer (Monitoring Platform) here at SRT, you will be part of a small team responsible for designing, building, and maintaining our productised monitoring and observability platform. This platform is deployed across geographically distributed on-premises sites worldwide, serving clients with varying infrastructure and WAN capabilities. Rather than simply using Prometheus … monitoring platform consistent, maintainable, and scalable across dozens of deployments. You as a Software Engineer (Monitoring Platform) will work closely with a lead observability engineer who oversees the platform's architecture, and you will have the authority to architect monitoring solutions and specify changes to be implemented by other development ...

Software Engineer (Monitoring Platform)

Hiring Organisation: SRT Marine Systems PLC
Location: Birmingham, West Midlands (County), United Kingdom
Employment Type: Permanent
Salary: £55,000 - £75,000 per annum

Cloud Operations Engineer

Hiring Organisation: Anson Mccade
Location: Cheltenham, Gloucestershire, South West, United Kingdom
Employment Type: Permanent

strong hands-on experience required) Kubernetes (deployment, troubleshooting, and platform support) Infrastructure as Code (Terraform or similar tools) Cloud-native networking and system troubleshooting Observability and monitoring tools APIs and integration services Secure, restricted, air-gapped cloud environments Required Experience Strong experience working with Linux-based systems in production environments … operate within highly secure cloud architectures Desirable Experience Kubernetes administration or advanced troubleshooting experience Infrastructure as Code experience (Terraform or similar) Exposure to observability and monitoring platforms Experience working in 24/7 operational environments Prior experience coordinating shifts or leading small technical teams deep expertise in secure cloud operations ...

Storage DevOps Engineer (Weka/Ceph) - Up to £180k + Industry Leading Bonus

Hiring Organisation: Hunter Bond
Location: City of London, London, United Kingdom

including NFS, GPFS, WEKA, object storage, and cloud-native solutions Optimise storage environments for performance-critical compute and research workloads Drive automation, monitoring, and observability improvements across storage services Perform capacity planning, forecasting, and lifecycle management at scale Troubleshoot complex storage and performance issues across distributed systems Work closely with … Experience with infrastructure automation tools such as Ansible or Chef Knowledge of cloud storage platforms across AWS and/or GCP Strong understanding of observability and monitoring within distributed systems Familiarity with modern software engineering practices including CI/CD and version control Understanding of storage hardware, NVMe/ ...

Backend Engineering Team Lead

Hiring Organisation: Jobleads-UK
Location: Bristol, England, United Kingdom

software lifecycle from design ideation through to production and eventual decommissioning. Our engineering teams work under a true DevOps culture - with infrastructure as code, observability, automated testing, and continuous delivery treated as first-order concerns, not afterthoughts. You’ll set architectural direction, partner closely with your Product Manager counterpart … Technical Environment Languages & frameworks: C#/.NET Cloud: Azure Architecture: Event-driven systems and microservice development (Service Bus) Engineering culture: DevOps, infrastructure as code, observability and monitoring, automated testing across all environments including production, continuous delivery Our Engineering Approach Full ownership: Teams own their solutions end-to-end - from inception ...

DevOps Engineer (AWS)

Hiring Organisation: Nigel Wright Group
Location: Newcastle Upon Tyne, Tyne and Wear, England, United Kingdom
Employment Type: Full-Time
Salary: £45,000 - £65,000 per annum

improving Infrastructure as Code Supporting CI/CD pipelines, deployment processes, and automation Contributing to security practices, controls, and compliance activities Improving monitoring, observability, and incident response Troubleshooting issues with a focus on long-term solutions Collaborating with engineering teams to ensure smooth, efficient delivery Supporting improvements that reduce risk … Knowledge of CI/CD pipelines and modern deployment practices Strong problem-solving skills and the ability to work in evolving environments Desirable experience: Observability and monitoring tools (e.g. Datadog) Networking, DNS, or cloud networking knowledge Exposure to SRE practices or reliability engineering Experience with migrations, scaling environments, or infrastructure ...

Site Reliability Engineer

Hiring Organisation: Wave Talent
Location: Greater London, England, United Kingdom

next hires are walking into genuine distributed systems problems, not a greenfield rebuild or a dashboard feature. What you'll be working on Owning observability across the platform: OpenTelemetry, metrics, logs, traces, and making them genuinely useful at 3am Designing and operating distributed systems primitives under real production load: queues … first principle across a cloud-native footprint Running on-call practice: SLOs, runbooks, blameless postmortems, paging hygiene What they're looking for Strong observability background: production experience with OpenTelemetry, Prometheus or equivalent Distributed systems experience: you've designed or operated systems with non-trivial failure modes Strong in TypeScript ...

Platform and Cloud Engineer (DevOps & Azure)

Hiring Organisation: iQ HealthTech
Location: England, United Kingdom

suits someone with strong Azure infrastructure, Kubernetes, CI/CD, and DevOps capability, combined with a practical, service-oriented mindset. You will improve reliability, observability, deployment maturity, security posture, and infrastructure consistency, while working closely with developers to reduce operational friction and support more frequent, lower-risk releases. … infrastructure Own and improve our Azure environment architecture, configuration, maintenance, and operational health. Support and evolve our cloud platform to improve stability, resilience, observability, security, and cost control. Manage and support Kubernetes-based workloads, including deployments, configuration, troubleshooting, scaling, and operational reliability. Maintain and improve core infrastructure services, environments ...

Senior Vice President, Full-Stack Engineer

Hiring Organisation: BNY
Location: Manchester, North West, United Kingdom
Employment Type: Permanent

underlying workflow engine (e.g., Camunda) to enable extensibility, portability, and enterprise-scale orchestration. Drive delivery excellence across workflow and decisioning platforms, embedding observability, resilience, auditability, and performance at scale, while establishing engineering standards across CI/CD, testing, security, and data architecture. To be successful in this role, we're seeking … platform design and optimisation. Proven track record of delivering production-grade platforms, embedding engineering excellence across test automation, CI/CD, observability, resilience, and traceability while driving continuous improvement of SDLC practices at scale. Hands-on technical leader who can actively contribute to solution design and critical builds, while defining ...

Technical Lead

Hiring Organisation: VIKASO® | Robotics 4.0
Location: Buckinghamshire, UK

releases, measurable team velocity, and a culture of ownership. Responsibilities Architecture & Direction Define and document the reference architecture for the platforms (APIs, services, data, observability, security). Own cross-cutting concerns: Auth, API standards, schema evolution. Lead technical design reviews; make clear, pragmatic decisions. Delivery Leadership Translate product requirements into … Qualifications 7+ years in software engineering; 2+ years in a Tech Lead/Staff role. Proven experience architecting and shipping cloud services (APIs, data, observability). Strong backend skills (C++ and Python; PostgreSQL). Working knowledge of frontend engineering (React or Vue) sufficient to mentor and set standards. DevOps literacy ...

Software Technical Lead

Hiring Organisation: VIKASO® | Robotics 4.0
Location: Buckinghamshire, England, United Kingdom

Software Developer

Hiring Organisation: Transunion
Location: Alderley Edge, Cheshire, United Kingdom
Employment Type: Permanent

build reliable backend systems and infrastructure tooling Use TDD to write high-quality, maintainable code and build out automated test suites Own reliability, observability, and performance of key services Collaborate with clients to understand requirements, debug issues, and propose solutions Drive improvements to system architecture, automation, and deployment processes Mentor … Desirable Skills & Experience: Experience owning backend systems in production environments Experience with Cloud Platforms AWS or GCP Infrastructure-as-code, CI/CD, and observability tooling Experience scaling systems under sustained load Contributions to internal tooling or open source Experience with large datasets and machine learning models Impact ...

Cloud Security and Platform Engineer

Hiring Organisation: RealityMine
Location: Trafford Park, Greater Manchester, UK

mainly focused on AWS, with growing involvement in other cloud and SaaS platforms. You’ll improve existing environments—managing identity and access, governance, security, observability, and lifecycle—by reducing risks, eliminating unsafe configurations, validating ownership, and ensuring the cloud estate is clearly governed and auditable. You will take an active … role in improving RealityMine’s security posture by improving and operating security scanning, improving monitoring and observability, and ensuring risks, vulnerabilities, and end of life components are identified and addressed in a timely and pragmatic way. You will also develop automation used to support security and operational hygiene, reducing manual ...

Cloud Security and Platform Engineer

Hiring Organisation: RealityMine
Location: Trafford Park, England, United Kingdom

Partner Integration Engineer

Hiring Organisation: Jobleads-UK
Location: Greater London, England, United Kingdom

need and resolving technical issues as they arise. You will split your time between building partner-facing product features, developing integrations between systems, improving observability and reliability, and resolving complex production issues when they arise. What You'll Do Build software at the interface of OEM and Wayve systems APIs … Frontend development experience. Experience working with automotive or mobility industry partners. Background in SRE, DevOps or cloud operations. Experience building or operating monitoring or observability tooling. This is a full‐time role based in our office in London. At Wayve we want the best of all worlds so we operate ...

Enterprise Head of AI Engineering (Founding)

Hiring Organisation: Pyxos
Location: City of London, London, United Kingdom

position You will own the technical direction of our agent surface, the proprietary build environment behind it (our Agentic Studio), and the evaluation, observability, and safety layers that make the system trustworthy enough for regulated enterprise deployment. We build with AI: agentic development tooling is core to how Pyxos ships … then-execute patterns, output validation, tool-use restrictions, policy enforcement. • Production engineering rigor. Strong Python; cloud fluency (AWS, GCP, or Azure); CI/CD, observability, cost attribution. • Engineering leadership at startup pace. You have hired, managed, and grown teams — not just been an individual contributor. Nice to have: regulated-industry ...

Software Engineer, Frontend

Hiring Organisation: Side
Location: San Francisco, California, United States
Employment Type: Permanent
Salary: USD Annual

debug complex frontend bugs and UI regressions, and support operational excellence. Identify technical debt and suggest pragmatic improvements. Follow engineering best practices for security, observability, data integrity, and application performance. Leverage AI-assisted engineering tools to improve productivity across prototyping, debugging, task management, documentation, and code review, while applying strong … including AI-assisted development tools. Nice to Have Experience with design systems or component library work (e.g., MUI, Radix, or similar). Experience with observability and production monitoring tools. Familiarity with cloud platforms and modern CI/CD practices. Experience contributing to open-source projects. Familiarity with building internal tools ...

Senior AI-Native Software Engineer

Hiring Organisation: Vaco LLC
Location: Tempe, Arizona, United States
Employment Type: Permanent
Salary: USD Annual

harden AI-generated code for production use Contribute to scalable engineering patterns, integration standards, and platform architecture decisions Build with strong attention to testing, observability, maintainability, and security Required Skills & Experience 4+ years of professional software engineering experience Strong background in application, platform, or full stack engineering Experience building production … experience using AI-assisted development workflows in real-world software delivery Strong systems thinking and product mindset Experience with CI/CD, automated testing, observability, and secure engineering practices Preferred Experience Agentic AI frameworks such as LangGraph, LangChain, Semantic Kernel, AutoGen, CrewAI, or similar MCP, tool-calling, or function-calling ...

Data Reliability Engineer: Build Trusted Data & Observability

Hiring Organisation: Jobleads-UK
Location: England, United Kingdom

Data Reliability Engineer for its growing data team in London. This hybrid role emphasizes improving data quality and reducing incidents while building scalable observability across data platforms. Strong SQL and Python skills are essential, along with experience in data observability tools and cloud-based environments. Proactive problem-solving skills ...

Observability Engineer - Dynatrace

Hiring Organisation: eTeam
Location: Telford, England, United Kingdom

Recruitment specialist that provides support to clients across EMEA, APAC, US and Canada. We have an excellent job opportunity for you. Job Title: Observability Engineer - Dynatrace Duration: 6 months Location: Telford - 2 days min per month Rate: 581 GBP/day (Inside IR35) Role Description: As an Observability Engineer … insight, and proactive incident management. Key Responsibilities: Translate high-level monitoring and non-functional requirements (NFRs) into actionable configurations in Dynatrace. Deliver full-stack observability solutions, including application-aware network performance monitoring (NPM), synthetics, log analytics, and infrastructure metrics. Collaborate with architects and project teams to integrate monitoring into solution ...