Remote Observability Job Vacancies

26 to 50 of 1,080 Remote Observability Jobs

Software Engineer

England, United Kingdom
Hybrid / WFH Options
Circadia Health
Lead technical discovery with prospects and customers, translating clinical and operational requirements into secure, scalable infrastructure designs. Build and maintain Kubernetes clusters, Terraform IaC, CI/CD pipelines, and observability tooling (Prometheus, Grafana). Optimise real-time data pipelines using Apache Kafka, Snowflake, and Postgres, ensuring low-latency, high-reliability ingestion from IoT sensors and EHR integrations. Collaborate with our … DSPT, GDPR. Experience orchestrating GPU/AI workloads, MLOps, or large-language-model serving. Knowledge of edge/IoT deployments and over-the-air update strategies. Exposure to observability stacks (OpenTelemetry, Loki) and security tooling (Falco, Aqua, Wiz). What We Offer: Base salary £100,000 – £170,000 plus meaningful equity. Gym membership. Comprehensive health, dental & vision coverage (UK

Senior Software Engineer (Viator)

London, England, United Kingdom
Hybrid / WFH Options
Tripadvisor
offices. What will you do: As part of the SRE team you will be participating in designing and implementing parts of our engineering platform that enables scaling, metrics and observability, ensures and improves reliability. Identify gaps in our engineering platform that improves availability, latency, performance, efficiency, change management, monitoring, emergency response. Guide and mentor other people on the team and … partitioning, etc.) and architectural level (denormalisation, CQRS-ES, Federation, etc.). Experience building and working with and monitoring microservice architectures in large distributed cloud environments (ideally AWS). Experience with Observability tooling – having proficiency using tools like Elasticsearch, Kibana, APM, Sentry, Grafana, Prometheus, OverOps, or similar. The ability to guide and mentor other members within the team and improve the way

Site Reliability Engineer

London Area, United Kingdom
Hybrid / WFH Options
Explore Group
financial institutions. What You'll Do Maintain and improve our AWS-based infrastructure using Infrastructure-as-Code (Terraform) Support and scale Kubernetes clusters hosting critical microservices Design and enhance observability, alerting, and incident response processes Collaborate closely with engineers to ensure systems are reliable, secure, and performant Lead root cause analysis for production incidents and help prevent recurrence Build tooling

Site Reliability Engineer

City of London, London, United Kingdom
Hybrid / WFH Options
Explore Group
financial institutions. What You'll Do Maintain and improve our AWS-based infrastructure using Infrastructure-as-Code (Terraform) Support and scale Kubernetes clusters hosting critical microservices Design and enhance observability, alerting, and incident response processes Collaborate closely with engineers to ensure systems are reliable, secure, and performant Lead root cause analysis for production incidents and help prevent recurrence Build tooling

Lead DevOps Engineer

Leeds, West Yorkshire, Yorkshire, United Kingdom
Hybrid / WFH Options
Fruition Group
pipelines Drive platform modernisation Manage a small team of engineers Align DevOps capabilities with the wider business Champion DevEx, reliability, and security Embed operational excellence and incident response Promote observability and performance optimisation Lead DevOps Engineer Requirements Proven line management experience Cloud-native expertise (any cloud provider is fine: GCP, AWS or Azure) Knowledge of GitLab CI/CD, Terraform
Employment Type: Permanent, Work From Home

Senior Site Reliability Engineer - FinTech / Global Payments - London HQ / Remote First

London, England, United Kingdom
Hybrid / WFH Options
ZipRecruiter
resilient, observable, and maintainable features. Promote DevOps culture by leading knowledge-sharing sessions and supporting issue resolution. Skills Strong grounding in SRE principles and operational best practices. Proficient with observability tools (Prometheus, Grafana, OTEL, CloudWatch) and telemetry pipelines. Solid programming skills in Python and/or Go; Java experience a plus. Hands-on AWS expertise (EKS, SQS, RDS, EC2, Lambda

Lead DevOps Engineers - SC Security Clearance

England, United Kingdom
Hybrid / WFH Options
InterQuest Solutions
Go Significant experience with AWS cloud infrastructure Deep understanding of IaC tools: Terraform, Packer, CloudFormation Proven leadership in multidisciplinary delivery teams Skills in Databases: MongoDB/Atlas; Messaging: Kafka; Observability: Prometheus, Grafana, Splunk Experience working in a DevOps environment with a focus on CI/CD pipelines Experience designing, implementing, securing, and supporting Unix/Linux platforms (preferably RHEL/
Employment Type: Permanent
Salary: GBP Annual

Senior Azure Site Reliability Architect

London, England, United Kingdom
Hybrid / WFH Options
Nordcloud group
languages such as C#, Python, Perl, Java, C++. Experience with CI/CD tools like Azure DevOps, GitHub Actions, GitLab, Jenkins, TeamCity. Scripting skills in PowerShell, Bash. Familiarity with observability and monitoring tools such as Prometheus, Grafana, Splunk. Experience with containerization tools like Docker, Kubernetes, OpenShift, EC2 containers. Analytical and creative problem-solving skills. We encourage you to apply, even

Senior Azure Site Reliability Architect

London, England, United Kingdom
Hybrid / WFH Options
Nordcloud
Patterns for Development Programming languages, such as C#, Python, Perl, Java, C++ CI/CD tools such as Azure DevOps, GitHub Actions, GitLab, Jenkins, TeamCity Scripting languages such as PowerShell, Bash Observability/Monitoring: Prometheus, Grafana, Splunk Containerisation tools such as Docker, K8S, OpenShift, EC, containers Analytical and creative approach to problem solving We encourage you to apply, even if you don

Senior DevOps Engineer/SRE - Full-time

London, England, United Kingdom
Hybrid / WFH Options
Parity Technologies
Excellence: Contribute to Parity’s blockchain node operations, improving the reliability of the Polkadot network by managing test and benchmark networks in the cloud and on-prem. Enhance our observability initiatives by operating mainnet nodes for the Polkadot and Kusama Relaychain and System parachains, gathering crucial operational data for monitoring and incident response. Infrastructure Solutions: Conceptualize and build innovative infrastructure

Senior Azure Site Reliability Architect

London, England, United Kingdom
Hybrid / WFH Options
Nordcloud, an IBM Company
Patterns for Development Programming languages, such as C#, Python, Perl, Java, C++ CI/CD tools such as Azure DevOps, GitHub Actions, GitLab, Jenkins, TeamCity Scripting languages such as PowerShell, Bash Observability/Monitoring: Prometheus, Grafana, Splunk Containerisation tools such as Docker, K8S, OpenShift, EC, containers Analytical and creative approach to problem solving We encourage you to apply, even if you don

Cloud Native DevOps Engineer (AWS)

London, England, United Kingdom
Hybrid / WFH Options
Capgemini
using tools such as Terraform or CloudFormation. • Implement CI/CD pipelines, enabling continuous integration and continuous deployment for mission-critical applications. • Monitor system performance, availability, and security, implementing observability best practices. • Work in an Agile environment, engaging with stakeholders to understand requirements and deliver iterative improvements. You can bring your whole self to work. At Capgemini building an inclusive

Network Security Engineer - London

London, United Kingdom
Hybrid / WFH Options
Analyticsengineering
IT workflows. Your responsibilities will also include developing CI/CD pipelines tailored for IT infrastructure, enhancing deployment efficiency, and integrating robust network security measures. You will establish comprehensive observability and proactive issue resolution strategies. We are seeking individuals passionate about network automation, security, and scalable IT solutions that enhance both campus and cloud network operations. You should possess extensive
Employment Type: Permanent
Salary: GBP Annual

Senior AWS Engineer

London
Hybrid / WFH Options
BAE Systems
or DevOps Expertise in microservices and API design Docker, and container runtime platforms such as Kubernetes, EKS, ECS etc. Strong understanding of operational concepts on AWS, particularly monitoring and observability, FinOps Utilising CI/CD tools, such as Bamboo, Jenkins, TeamCity, Bitbucket, in order to streamline delivery of new features and fixes Continual testing of code using Automated Testing Frameworks
Employment Type: Permanent

Senior AWS Engineer

Manchester, North West
Hybrid / WFH Options
BAE Systems
or DevOps Expertise in microservices and API design Docker, and container runtime platforms such as Kubernetes, EKS, ECS etc. Strong understanding of operational concepts on AWS, particularly monitoring and observability, FinOps Utilising CI/CD tools, such as Bamboo, Jenkins, TeamCity, Bitbucket, in order to streamline delivery of new features and fixes Continual testing of code using Automated Testing Frameworks
Employment Type: Permanent

Senior Site Reliability Engineer

London, United Kingdom
Hybrid / WFH Options
Stott and May
Tuesdays, Thursdays WFH) Pay: negotiable, inside IR35 We're looking for an experienced DevOps Engineer to join our team on a contract basis, with a focus on AWS infrastructure, observability tooling, and CI/CD automation. This is a hands-on role supporting high-availability systems, rapid deployments, and production incident response. Key Responsibilities - Manage and monitor AWS infrastructure for … performance and security - Respond to production incidents, perform root cause analysis, and implement fixes - Maintain observability tools (Prometheus, Grafana, Splunk) and write PromQL queries - Improve and operate CI/CD pipelines using GitHub Actions and Kubernetes - Automate infrastructure tasks with Python, Bash, Go or SQL - Work with Git-based workflows for infrastructure as code - Troubleshoot Kubernetes workloads and containerised services
Employment Type: Permanent
Salary: GBP Annual

Senior Site Reliability Engineer

London, England, United Kingdom
Hybrid / WFH Options
Stott and May
Tuesdays, Thursdays WFH) Pay: negotiable, inside IR35 We're looking for an experienced DevOps Engineer to join our team on a contract basis, with a focus on AWS infrastructure, observability tooling, and CI/CD automation. This is a hands-on role supporting high-availability systems, rapid deployments, and production incident response. Key Responsibilities Manage and monitor AWS infrastructure for … performance and security Respond to production incidents, perform root cause analysis, and implement fixes Maintain observability tools (Prometheus, Grafana, Splunk) and write PromQL queries Improve and operate CI/CD pipelines using GitHub Actions and Kubernetes Automate infrastructure tasks with Python, Bash, Go or SQL Work with Git-based workflows for infrastructure as code Troubleshoot Kubernetes workloads and containerised services

SRE Lead - FX Trading (Investment Bank)

London Area, United Kingdom
Hybrid / WFH Options
Vertus Partners
and scalability of a real-time trading environment used by both internal and external clients. While production support remains an important aspect, this position is heavily weighted toward improving observability, driving proactive engineering practices, and developing tooling to eliminate repetitive manual tasks. You'll collaborate closely with developers, traders, and global colleagues to make meaningful changes to how the environment … is monitored, managed, and scaled. Key Responsibilities: Lead the development of automation and monitoring solutions to improve system resilience and eliminate recurring manual work Own and evolve observability practices using tools like Prometheus, Grafana, Splunk, Geneos, Corvil, etc. Engage directly with senior traders and engineers to troubleshoot complex trading system issues and improve end-to-end workflows Drive post-incident

SRE Lead - FX Trading (Investment Bank)

City of London, London, United Kingdom
Hybrid / WFH Options
Vertus Partners
and scalability of a real-time trading environment used by both internal and external clients. While production support remains an important aspect, this position is heavily weighted toward improving observability, driving proactive engineering practices, and developing tooling to eliminate repetitive manual tasks. You'll collaborate closely with developers, traders, and global colleagues to make meaningful changes to how the environment … is monitored, managed, and scaled. Key Responsibilities: Lead the development of automation and monitoring solutions to improve system resilience and eliminate recurring manual work Own and evolve observability practices using tools like Prometheus, Grafana, Splunk, Geneos, Corvil, etc. Engage directly with senior traders and engineers to troubleshoot complex trading system issues and improve end-to-end workflows Drive post-incident

Infrastructure Engineer

London, England, United Kingdom
Hybrid / WFH Options
Keyrock
Kubernetes clusters for containerized applications, ensuring high availability and security. Automation & CI/CD: Implement and manage CI/CD pipelines for efficient deployment, testing, and monitoring of applications. Observability & Monitoring: Develop comprehensive monitoring solutions using Prometheus, Grafana, ELK stack, or similar tools to improve system reliability. Security & Compliance: Apply best practices for cloud security, IAM policies, and compliance frameworks … self-managed clusters). Proficiency in scripting and automation using Python, Bash, or Go. Experience with Infrastructure as Code (Terraform, CloudFormation, or Ansible). Familiarity with monitoring, logging, and observability tools (Prometheus, Grafana, Datadog, ELK, etc.). Strong understanding of networking concepts (VPC, Load Balancers, DNS, Firewalls). Experience with DevOps methodologies, CI/CD pipelines, and GitOps practices. Experience

Build and Release Engineer

London, England, United Kingdom
Hybrid / WFH Options
Scopely
collaboration and stakeholder management Development of automation tools and processes targeting reproducibility of procedures and development efficiency Monitoring, auditing and reporting of the Build systems and processes, by incorporating observability and alerts through all CI/CD lifecycle and infrastructure Participate in code reviews, development processes related to CI/CD pipelines and automation tools to improve the effectiveness of engineering team members What We … processes, including CI/CD best practices, specifically for Unity 3D games Professional experience and high proficiency in programming languages and scripting for automation (e.g. Python, Bash) Experience with observability tools (ELK, Grafana, Prometheus, Datadog) to monitor and alert on CI/CD stability Experience with version control systems, such as Git, and build management tools such as Jenkins, GitLab, Maven or Gradle

Lead/Principal Python Engineer for Generative AI Backend Development

London, England, United Kingdom
Hybrid / WFH Options
Trimble
orchestrate LLM-based agents. Working with RAG frameworks: Use techniques such as chunking, hybrid search, query translation, similarity search, vector DBs, evaluation metrics, and ANN algorithms. Monitoring performance: Using observability services such as Datadog and Databricks for LLM Observability and analytics. Keep track of the latest research: Given that this is a fast-evolving field, it’s important to keep track

DevOps Engineer

Portsmouth, England, United Kingdom
Hybrid / WFH Options
Trust In SODA
through the entire development life cycle. Infrastructure-as-code Bash Delivery methods and techniques, including agile scrum experience. Desirable Skills: RedHat OpenShift HashiCorp (such as Terraform, Packer, Vault) Ansible Observability (such as Prometheus, Grafana, Splunk) Containerised services (such as Postgres, Redis, Kafka, Keycloak, ELK) Experience of doing all the above at OS or S level YAML based pipelines. Immutable infrastructure

Senior Site Reliability Engineer – FinTech / Global Payments – London HQ / Remote First

London, England, United Kingdom
Hybrid / WFH Options
Future Talent Group
resilient, observable, and maintainable features. Promote DevOps culture by leading knowledge-sharing sessions and supporting issue resolution. Skills Strong grounding in SRE principles and operational best practices. Proficient with observability tools (Prometheus, Grafana, OTEL, CloudWatch) and telemetry pipelines. Solid programming skills in Python and/or Go; Java experience a plus. Strong Linux and networking fundamentals (TCP, DNS, TLS, HTTP

DV Cleared DevOps Engineer

Bristol, Gloucestershire, United Kingdom
Hybrid / WFH Options
Curo Resourcing Ltd
domain adjacent technologies/services, such as: Docker, OpenShift, Kubernetes etc. Infrastructure as Code and CI/CD paradigms and systems such as: Ansible, Terraform, Jenkins, Bamboo, Concourse etc. Observability - SRE Big Data solutions (ecosystems) and technologies such as: Apache Spark and the Hadoop Ecosystem Excellent knowledge of YAML or similar languages The following Technical Skills & Experience would be desirable
Employment Type: Permanent
Salary: GBP Annual
Observability
10th Percentile: £57,500
25th Percentile: £65,000
Median: £80,000
75th Percentile: £97,500
90th Percentile: £120,000