Observability Jobs


Site Reliability Engineer

Hereford, Herefordshire, West Midlands, United Kingdom
Hybrid / WFH Options

Twinstream Limited

Work Scheme Key Responsibilities of the Site Reliability Engineer: Partner with developers to improve performance and reliability across systems Automate toil and reduce unnecessary alerts with smart tooling Evolve observability so we can prevent issues before they become incidents Improve CI/CD pipelines and support development teams in delivering quality faster Explore new technologies, tools, and services that improve … plus) Experience with Terraform and modern IaC practices Hands-on with Docker and orchestration tools (Kubernetes, OpenShift, or Docker Swarm) CI/CD experience (Jenkins or equivalent) Monitoring/observability tools: Grafana, Prometheus, or InfluxDB Event-driven messaging: RabbitMQ or similar Strong Linux skills, scripting, and understanding of network security protocols Experience with AWS: EC2, S3, RDS, Lambda Desirable: Experience … coding in Python, Java, or Go Exposure to cross-domain solutions Experience in a service management environment Observability best practices and metric-driven reliability improvement Security Requirements Due to the sensitive nature of our work, candidates must be eligible for Developed Vetting (DV) clearance. All offers are subject to security screening. Ready to Engineer Systems That Matter? If you're a More ❯

Employment Type: Permanent, Work From Home

Posted: 14 hours ago

Site Reliability Engineer

Chesterfield, England, United Kingdom
Hybrid / WFH Options

JR United Kingdom

automation and internal tools for deployment, monitoring, and incident response Tune performance across OS, network, and cloud layers — this role is hands-on and detail-oriented Improve system resilience, observability, and security in a high-stakes production environment Requirements: Fluent in Linux — not just using it, but understanding how it works under the hood Advanced terminal skills — manipulating systems efficiently … time environments Hands-on with Docker (Kubernetes is a plus), infrastructure-as-code, and CI/CD tooling Strong scripting and automation experience in Python and Bash Familiarity with observability stacks (Prometheus, OpenTelemetry, eBPF) Cloud infrastructure experience (AWS/GCP/Azure), with attention to IAM and software supply chain security Curious, persistent, and comfortable experimenting at the lowest levels More ❯

Posted: Today

Senior Site Reliability Engineer

Manchester, Lancashire, United Kingdom
Hybrid / WFH Options

Embarcaderomediagroup

ll sit at the heart of our engineering operations, bringing together SRE principles and modern platform engineering practices. This includes combining principles of SRE - such as service-level reliability, observability, incident response - with platform engineering practices like GitOps, Infrastructure as Code, DevSecOps automation, and self-service enablement, to help development teams ship faster, safer, and more cost-efficiently. What you … ll be doing: Designing and operating highly reliable, scalable, and secure Azure-based platforms Applying SRE principles like SLOs, observability, and incident management to drive service reliability Building Infrastructure as Code using Terraform (v1.7+) and GitOps workflows Enabling teams through platform tools, reusable Terraform modules, and self-service infrastructure Enhancing CI/CD pipelines (Azure DevOps, YAML-based) with security … knowledge (AKS, Functions, SQL, Cosmos DB, etc.) Strong Infrastructure as Code skills with Terraform (v1.7+) Experience with CI/CD pipelines, GitOps, and automation tools (PowerShell, Bash) Familiarity with observability and incident tools like Datadog, ELK, and synthetic monitoring Solid understanding of networking (TCP/IP, Load Balancing, DNS, Routing) Good knowledge of DevSecOps practices - including security scanning, IAM, and More ❯

Employment Type: Permanent

Salary: GBP Annual

Posted: 2 days ago

Site Reliability Engineer

Sheffield, England, United Kingdom
Hybrid / WFH Options

JR United Kingdom

Posted: Today

Senior Principal Platform Engineer with Security Clearance

Newport News, Virginia, United States

Clarity Innovations

workflows. Build infrastructure automation using tools like Terraform to ensure consistent provisioning of cloud and on-prem resources. Manage and evolve CI/CD systems, focusing on deployment standardization, observability, and integration with platform tools. Implement and maintain secrets management practices using tools like Vault across environments. Develop self-service tooling to enable development teams to manage application deployments and … configurations. Partner with teams to implement platform observability and system monitoring using tools such as ELK and Prometheus. Contribute to platform documentation, knowledge sharing, and developer onboarding for platform tooling. Required Qualifications: Bachelor's degree in Computer Science, Engineering, or related field, or equivalent practical experience. 10+ years of experience in platform engineering, DevOps, or infrastructure roles. Expert with Kubernetes … Experience with Vault for secure secrets management. Proficiency with scripting or programming languages (e.g., Python, Go, Bash). Experience with Terraform or other Infrastructure as Code tools. Familiarity with observability tools (e.g., ELK stack, Prometheus). Elastic experience is a plus. Strong collaboration and communication skills. Must have or be able to obtain SEC+ certification within three months of hire. More ❯

Employment Type: Permanent

Salary: USD Annual

Posted: 3 days ago

Platform Engineer

Crewe, Cheshire, United Kingdom
Hybrid / WFH Options

Manchester Digital

platform security, reliability, and performance across systems deployed in Canada, the UK, and AWS cloud environments Contribute to key projects, platform optimizations, and ongoing maintenance initiatives Help drive scalability, observability, and operational excellence If you're passionate about infrastructure, cloud, and systems engineering-and want to help shape the future of mobility-we want to hear from you! Requirements We … configurations (Azure AD, Ory, Cognito, Firebase) - Understanding of Site Reliability Engineering and key concepts - Proficient in Infrastructure as Code pipeline deployments and pipeline version control within Terraform or CloudFormation. - Observability Systems, e.g., Nagios, New Relic - Able to troubleshoot/work under pressure, meet deadlines. - Previous experience in a cloud engineering role. - AWS certified as SysOps Administrator/Solutions Architect/… understanding of Infrastructure as Code principles and related tech such as Terraform or CloudFormation - Enhanced experience of AWS cloud technologies, e.g., ECS, EC2, VPC, Lambda, CFS. Ideally AWS certified. - Observability Systems, e.g., New Relic, CloudWatch, SquadCast - ITIL Qualified or awareness of the framework. Bonus Qualifications: - Experience with Linux system administration and troubleshooting. - Basic knowledge of AWS cloud technologies such as More ❯

Employment Type: Permanent

Salary: GBP Annual

Posted: 19 days ago

Head of Technical Services UK&I

London, England, United Kingdom

NCR Corporation

and market demands. • Vendor Management & Cloud Governance: Engage with external vendors, drive cloud governance initiatives, and make critical build vs. buy decisions to support platform scalability and operational efficiency. • Observability & Automation: Develop and execute a comprehensive observability and automation strategy that aligns with business objectives and enhances platform reliability. • Financial Management: Implement best practices for financial operations and cost governance … build and image deployments. • Hands-on experience with classic hosting technologies (e.g. Kubernetes, AWS) • Familiarity with telephony technologies such as SIP, session border controllers, and related components. • Familiarity with observability tools such as Prometheus, Grafana, and Loki. • Strong Experience in Microsoft technology stack • Proficiency in tools such as GitLab, Docker, Terraform, CI/CD, and various deployment architectures. • Strong understanding More ❯

Posted: Yesterday

Senior DevOps Engineer - Search & Services - (f/m/x)

Berlin, Germany
Hybrid / WFH Options

AUTO1 Group SE

infrastructure and system issues, as well as log ingestion and communication issues. Design and develop scalable, robust, and high-performance data pipelines and data storage solutions. Develop and maintain observability frameworks using tools like Kibana, Grafana, or similar Work with cross-functional teams to define observability and search requirements. Scale, script and maintain our development and production platform foundation with More ❯

Employment Type: Permanent

Salary: EUR Annual

Posted: Yesterday

Lead Machine Learning Engineer (Agentic Infrastructure)

London, England, United Kingdom
Hybrid / WFH Options

ZipRecruiter

with the founding team to integrate models into internal and external user flows Write clean, production-ready code - often improving or refactoring existing prototypes Think holistically about agent lifecycle, observability, failure handling, and scalability Help define the tech stack and architecture for core components of the platform Contribute to novel research and publish at top conferences when opportunities arise What …/LLM libraries (e.g., Transformers, LangChain, LangGraph, OpenAI APIs) Experience with cloud platforms (AWS, GCP, or Azure), deployment, and CI/CD pipelines Familiarity with containerization (Docker, Kubernetes) and observability (e.g., Prometheus, Grafana) A builder mindset: you're comfortable with ambiguous specs, early-stage infrastructure, and iterating fast Excellent communication and self-management skills Nice To Have Familiarity with agentic More ❯

Posted: Today

Python Developer

Northern Ireland, United Kingdom
Hybrid / WFH Options

Ocho

cross-functional teams to design and deliver full-featured software components • Drive a “security-first” mindset across development practices, including OAuth2 and IAM policies • Lead operational efforts using modern observability frameworks to monitor and debug production systems • Mentor junior engineers and contribute to a culture of continuous improvement Essential Criteria: • Strong commercial experience in Golang and Python • Proven track record … secure application design principles • Hands-on experience designing and consuming RESTful and GraphQL APIs • Strong SQL skills and familiarity with data warehouses like Snowflake • Day-2 operations experience including observability, debugging, and triage Desirable Skills: • Experience with Auth0, AWS Cognito, or similar identity platforms • Familiarity with Helm, Prometheus, Grafana, or OpenTelemetry • Exposure to other cloud platforms (GCP, Azure) • CI/ More ❯

Posted: 2 days ago

Head of Technical Services UK&I

London, England, United Kingdom

NCR Voyix

and market demands. Vendor Management & Cloud Governance: Engage with external vendors, drive cloud governance initiatives, and make critical build vs. buy decisions to support platform scalability and operational efficiency. Observability & Automation: Develop and execute a comprehensive observability and automation strategy that aligns with business objectives and enhances platform reliability. Financial Management: Implement best practices for financial operations and cost governance … build and image deployments. Hands-on experience with classic hosting technologies (e.g. Kubernetes, AWS) Familiarity with telephony technologies such as SIP, session border controllers, and related components. Familiarity with observability tools such as Prometheus, Grafana, and Loki. Strong Experience in Microsoft technology stack Proficiency in tools such as GitLab, Docker, Terraform, CI/CD, and various deployment architectures. Strong understanding More ❯

Posted: Yesterday

Engineering Acceleration Engineer

Wantage, England, United Kingdom

Motorsport Network

and other people in the business who develop software Offer best-practice recommendations on IDEs and developer tooling, build systems, package management and CI/CD systems, monitoring and observability Implement and maintain standard templates, automations and infrastructure that support the development process at Atlassian Williams Racing Adopt or create shared libraries/components that benefit multiple Software Engineering teams … testing in languages such as C#, Go, Java, C++, Python, Typescript Containerization, DevOps, and Cloud Platforms such as Azure or AWS K8s provisioning, configuration and operation Logging, monitoring, and observability tooling CI/CD best practices, Release Engineering Git best practices Cloud-native migration or adoption projects Building developer-facing platforms and tooling Strong desire to build impactful solutions for More ❯

Posted: 2 days ago

DataOps Engineer

London, England, United Kingdom
Hybrid / WFH Options

55 Redefined Ltd

using Docker and deploying them to Container Platforms (EKS, AKS and Kubernetes). Implementing and managing CI/CD pipelines for data applications. Implementing and managing comprehensive monitoring and observability solutions using tools like Grafana, Prometheus, and other non-native monitoring tools, ensuring data quality across the entire data flow. Working with Infrastructure as Code (IaC) tools (e.g., Terraform, Ansible … Familiarity with open-source data tools (e.g., Spark, Kafka, PostgreSQL). Competent understanding of IaC concepts (e.g., Terraform, Ansible). Understanding of data architecture principles. Experience with monitoring and observability tools like Grafana and Prometheus. Your Security Clearance To be successfully appointed to this role, it is a requirement to obtain Security Check (SC) clearance. To obtain SC clearance, the More ❯

Posted: Today

Senior Software Engineer - Cloud Infrastructure

United Kingdom
Hybrid / WFH Options

Primer

strategy, execution, tooling and best practices Collaborate with multiple product teams and respective owners to design infrastructure as we scale Building custom metrics and features to enhance Primer's observability Infrastructure as code (IaC) development Writing processes and documentation for system design, troubleshooting and maintenance What are we looking for? Strong experience with a cloud provider (AWS preferred but we … Kubernetes clusters Knowledge of security best practices and the ability to implement security controls at the infrastructure level Experience with monitoring and logging tools like DataDog or Grafana's observability stack (Prometheus, Tempo, Loki, Grafana) Familiarity with the open standard OpenTelemetry Excellent written and verbal communication skills, we're a collaborative team! PLEASE NOTE: Our engineering teams work fully remotely More ❯

Employment Type: Permanent

Salary: GBP Annual

Posted: 2 days ago

Site Reliability Engineer (Senior)

London, England, United Kingdom

Curve

Automate and accelerate - reduce manual tasks and allow all of Curve's engineers to concentrate on building exciting new features. Build, measure, learn - help us put together the best observability tools to continually improve Curve's services and performance. Our current tech stack includes: Cloud providers: Amazon Web Services and Google Cloud Platform. Platforms: Kubernetes (EKS) with Istio service mesh. … Observability: Prometheus, Coralogix and Grafana. Databases: PostgreSQL, MongoDB and Hashicorp Vault. Infrastructure as Code: extensive use of Terraform and Atlantis. CI/CD: GitLab, Flux and Helm. Skills & Experience: Experience deploying production ready applications to a Kubernetes cluster. 2+ years of experience with a cloud provider (AWS an advantage). Knowledge of defining infrastructure as code, ideally with Terraform. Experienced More ❯

Posted: Today

Principal DevOps Engineer AWS

London, South East, England, United Kingdom

McGregor Boyall

/Core), JavaScript (Node.js), Ruby, C++ Developer tooling: Full stack CI/CD, GitLab, Jenkins, Sonatype Nexus Experience containerising application components (Dockerfiles, Kubernetes) Deep understanding of pipelines as code Observability concepts and tooling: Opensearch, Cribl, Grafana, Prometheus, CloudWatch If this is of interest and you have the required skills, please submit your CV for immediate consideration. McGregor Boyall More ❯

Employment Type: Full-Time

Salary: £100,000 - £120,000 per annum

Posted: 21 days ago

Principal Consulting Architect - Search

United Kingdom
Hybrid / WFH Options

Elasticsearch B.V

results that matter. By taking advantage of all structured and unstructured data - securing and protecting private information more effectively - Elastic's complete, cloud-based solutions for search, security, and observability help organizations deliver on the promise of AI. What Is The Role: You will have the opportunity to work with a tremendous services, engineering, product, and sales team and wear … consultant will be focused on excellence, taking the initiative for self-improvement and possess great communication skills. Our customers' use cases extend across all the Elastic Solutions: Enterprise Search, Observability and Security, and beyond, and the scale of data in their environments ranges from gigabytes to petabytes. This diverse mix of a customer base means the challenges they face that More ❯

Employment Type: Permanent

Salary: GBP Annual

Posted: 5 days ago

Principal Consulting Architect - Search

London, England, United Kingdom
Hybrid / WFH Options

Elasticsearch B.V

results that matter. By taking advantage of all structured and unstructured data — securing and protecting private information more effectively — Elastic’s complete, cloud-based solutions for search, security, and observability help organizations deliver on the promise of AI. What Is The Role: You will have the opportunity to work with a tremendous services, engineering, product, and sales team and wear … consultant will be focused on excellence, taking the initiative for self-improvement and possess great communication skills. Our customers’ use cases extend across all the Elastic Solutions: Enterprise Search, Observability and Security, and beyond, and the scale of data in their environments ranges from gigabytes to petabytes. This diverse mix of a customer base means the challenges they face that More ❯

Posted: Today

Platform Specialist - Databases

London, England, United Kingdom

Squarepoint Capital

or related experience. Nice to have: Experience with other database platforms ClickHouse, FoundationDb, MSSQL, MongoDb, Redis, Neo4j. Experience with configuration management tools i.e. Chef, Terraform, Ansible. Experience with an observability & monitoring stack such as Prometheus exporters, LogStash, Elasticsearch, Prometheus, Thanos, Grafana, and AlertManager. Experience with CI/CD pipelines. Experience working with various cloud providers (AWS and GCP). Experience More ❯

Posted: 4 days ago

Solace Messaging Administrator

London Area, United Kingdom

H&P Executive Search

on Solace PubSub+, ensuring high availability, optimal performance, and reliability across production and non-production environments. You will be working on incident response, capacity planning, WAN optimization, and system observability, so you should have experience with tools such as Prometheus and Grafana. Key Responsibilities: Administer and maintain Solace PubSub+ appliances and software brokers Provide production support for messaging-related incidents More ❯

Posted: 3 days ago

Solace Messaging Administrator

City of London, London, United Kingdom

H&P Executive Search

Posted: 3 days ago

Loan IQ DevOps Engineer

Manchester Area, United Kingdom
Hybrid / WFH Options

Revolent Group

related processes like data migrations and environment setup. ✅ Preferred (Nice to Have): Banking/Financial Services knowledge, especially around wholesale lending and Loan IQ. Experience with monitoring and observability tools such as APPD, ELK Stack, or Grafana. Understanding of DevSecOps principles, including vulnerability scanning, secrets management, and compliance automation. Further experience with CI/CD integration and pipeline automation More ❯

Posted: Today

Senior IaC Software Engineer

Edinburgh, Scotland, United Kingdom
Hybrid / WFH Options

JR United Kingdom

native Infrastructure-as-Code (IaC) solutions from the ground up? Our client is seeking a talented and motivated Senior Software Engineer to lead the development of our next-generation observability platform. THIS IS NOT A DEVOPS ROLE. Responsibilities Collaborate within a dynamic software engineering team to architect and build a new cloud-native IaC platform. Develop software using technologies such More ❯

Posted: Today