Lead technical discovery with prospects and customers, translating clinical and operational requirements into secure, scalable infrastructure designs. Build and maintain Kubernetes clusters, Terraform IaC, CI/CD pipelines, and observability tooling (Prometheus, Grafana). Optimise real-time data pipelines using Apache Kafka, Snowflake, and Postgres, ensuring low-latency, high-reliability ingestion from IoT sensors and EHR integrations. Collaborate with our … DSPT, GDPR. Experience orchestrating GPU/AI workloads, MLOps, or large-language-model serving. Knowledge of edge/IoT deployments and over-the-air update strategies. Exposure to observability stacks (OpenTelemetry, Loki) and security tooling (Falco, Aqua, Wiz). What We Offer: Base salary £100,000 – £170,000 plus meaningful equity. Gym membership. Comprehensive health, dental & vision coverage (UK More ❯
London, England, United Kingdom Hybrid / WFH Options
Tripadvisor
offices. What will you do: As part of the SRE team you will participate in designing and implementing parts of our engineering platform that enable scaling, metrics and observability, and that ensure and improve reliability. Identify gaps in our engineering platform to improve availability, latency, performance, efficiency, change management, monitoring, and emergency response. Guide and mentor other people on the team and … partitioning, etc.) and architectural level (denormalisation, CQRS-ES, Federation, etc.). Experience building, working with, and monitoring microservice architectures in large distributed cloud environments (ideally AWS). Experience with observability tooling: proficiency with tools like Elasticsearch, Kibana, APM, Sentry, Grafana, Prometheus, OverOps, or similar. The ability to guide and mentor other members within the team and improve the way More ❯
City of London, London, United Kingdom Hybrid / WFH Options
Explore Group
financial institutions. What You'll Do Maintain and improve our AWS-based infrastructure using Infrastructure-as-Code (Terraform) Support and scale Kubernetes clusters hosting critical microservices Design and enhance observability, alerting, and incident response processes Collaborate closely with engineers to ensure systems are reliable, secure, and performant Lead root cause analysis for production incidents and help prevent recurrence Build tooling More ❯
Leeds, West Yorkshire, Yorkshire, United Kingdom Hybrid / WFH Options
Fruition Group
pipelines Drive platform modernisation Manage a small team of engineers Align DevOps capabilities with the wider business Champion DevEx, reliability, and security Embed operational excellence and incident response Promote observability and performance optimisation Lead DevOps Engineer Requirements Proven line management experience Cloud-native expertise (any cloud provider is fine: GCP, AWS or Azure) Knowledge of GitLab CI/CD, Terraform More ❯
London, England, United Kingdom Hybrid / WFH Options
ZipRecruiter
resilient, observable, and maintainable features. Promote DevOps culture by leading knowledge-sharing sessions and supporting issue resolution. Skills: Strong grounding in SRE principles and operational best practices. Proficient with observability tools (Prometheus, Grafana, OTEL, CloudWatch) and telemetry pipelines. Solid programming skills in Python and/or Go; Java experience a plus. Hands-on AWS expertise (EKS, SQS, RDS, EC2, Lambda More ❯
Go Significant experience with AWS cloud infrastructure Deep understanding of IaC tools: Terraform, Packer, CloudFormation Proven leadership in multidisciplinary delivery teams Skills in Databases: MongoDB/Atlas; Messaging: Kafka; Observability: Prometheus, Grafana, Splunk Experience working in a DevOps environment with a focus on CI/CD pipelines Experience designing, implementing, securing, and supporting Unix/Linux platforms (preferably RHEL/ More ❯
London, England, United Kingdom Hybrid / WFH Options
Nordcloud group
languages such as C#, Python, Perl, Java, C++. Experience with CI/CD tools like Azure DevOps, GitHub Actions, GitLab, Jenkins, TeamCity. Scripting skills in PowerShell, Bash. Familiarity with observability and monitoring tools such as Prometheus, Grafana, Splunk. Experience with containerization tools like Docker, Kubernetes, OpenShift, EC2 containers. Analytical and creative problem-solving skills. We encourage you to apply, even More ❯
London, England, United Kingdom Hybrid / WFH Options
Nordcloud
Patterns for Development. Programming languages, such as C#, Python, Perl, Java, C++. CI/CD tools such as Azure DevOps, GitHub Actions, GitLab, Jenkins, TeamCity. Scripting languages such as PowerShell, Bash. Observability/Monitoring: Prometheus, Grafana, Splunk. Containerisation tools such as Docker, K8S, OpenShift, EC, containers. Analytical and creative approach to problem solving. We encourage you to apply, even if you don More ❯
London, England, United Kingdom Hybrid / WFH Options
Parity Technologies
Excellence: Contribute to Parity’s blockchain node operations, improving the reliability of the Polkadot network by managing test and benchmark networks in the cloud and on-prem. Enhance our observability initiatives by operating mainnet nodes for the Polkadot and Kusama Relaychain and System parachains, gathering crucial operational data for monitoring and incident response. Infrastructure Solutions: Conceptualize and build innovative infrastructure More ❯
London, England, United Kingdom Hybrid / WFH Options
Nordcloud, an IBM Company
Patterns for Development. Programming languages, such as C#, Python, Perl, Java, C++. CI/CD tools such as Azure DevOps, GitHub Actions, GitLab, Jenkins, TeamCity. Scripting languages such as PowerShell, Bash. Observability/Monitoring: Prometheus, Grafana, Splunk. Containerisation tools such as Docker, K8S, OpenShift, EC, containers. Analytical and creative approach to problem solving. We encourage you to apply, even if you don More ❯
London, England, United Kingdom Hybrid / WFH Options
Capgemini
using tools such as Terraform or CloudFormation. • Implement CI/CD pipelines, enabling continuous integration and continuous deployment for mission-critical applications. • Monitor system performance, availability, and security, implementing observability best practices. • Work in an Agile environment, engaging with stakeholders to understand requirements and deliver iterative improvements. You can bring your whole self to work. At Capgemini building an inclusive More ❯
IT workflows. Your responsibilities will also include developing CI/CD pipelines tailored for IT infrastructure, enhancing deployment efficiency, and integrating robust network security measures. You will establish comprehensive observability and proactive issue resolution strategies. We are seeking individuals passionate about network automation, security, and scalable IT solutions that enhance both campus and cloud network operations. You should possess extensive More ❯
or DevOps. Expertise in microservices and API design. Docker and container runtime platforms such as Kubernetes, EKS, ECS, etc. Strong understanding of operational concepts on AWS, particularly monitoring and observability, and FinOps. Utilising CI/CD tools, such as Bamboo, Jenkins, TeamCity, Bitbucket, in order to streamline delivery of new features and fixes. Continual testing of code using Automated Testing Frameworks More ❯
London, England, United Kingdom Hybrid / WFH Options
Stott and May
Tuesdays, Thursdays WFH) Pay: negotiable, inside IR35 We're looking for an experienced DevOps Engineer to join our team on a contract basis, with a focus on AWS infrastructure, observability tooling, and CI/CD automation. This is a hands-on role supporting high-availability systems, rapid deployments, and production incident response. Key Responsibilities Manage and monitor AWS infrastructure for … performance and security Respond to production incidents, perform root cause analysis, and implement fixes Maintain observability tools (Prometheus, Grafana, Splunk) and write PromQL queries Improve and operate CI/CD pipelines using GitHub Actions and Kubernetes Automate infrastructure tasks with Python, Bash, Go or SQL Work with Git-based workflows for infrastructure as code Troubleshoot Kubernetes workloads and containerised services More ❯
City of London, London, United Kingdom Hybrid / WFH Options
Vertus Partners
and scalability of a real-time trading environment used by both internal and external clients. While production support remains an important aspect, this position is heavily weighted toward improving observability, driving proactive engineering practices, and developing tooling to eliminate repetitive manual tasks. You'll collaborate closely with developers, traders, and global colleagues to make meaningful changes to how the environment … is monitored, managed, and scaled. Key Responsibilities: Lead the development of automation and monitoring solutions to improve system resilience and eliminate recurring manual work Own and evolve observability practices using tools like Prometheus, Grafana, Splunk, Geneos, Corvil, etc. Engage directly with senior traders and engineers to troubleshoot complex trading system issues and improve end-to-end workflows Drive post-incident More ❯
London, England, United Kingdom Hybrid / WFH Options
Keyrock
Kubernetes clusters for containerized applications, ensuring high availability and security. Automation & CI/CD: Implement and manage CI/CD pipelines for efficient deployment, testing, and monitoring of applications. Observability & Monitoring: Develop comprehensive monitoring solutions using Prometheus, Grafana, ELK stack, or similar tools to improve system reliability. Security & Compliance: Apply best practices for cloud security, IAM policies, and compliance frameworks … self-managed clusters). Proficiency in scripting and automation using Python, Bash, or Go. Experience with Infrastructure as Code (Terraform, CloudFormation, or Ansible). Familiarity with monitoring, logging, and observability tools (Prometheus, Grafana, Datadog, ELK, etc.). Strong understanding of networking concepts (VPC, Load Balancers, DNS, Firewalls). Experience with DevOps methodologies, CI/CD pipelines, and GitOps practices. Experience More ❯
London, England, United Kingdom Hybrid / WFH Options
Scopely
collaboration and stakeholder management. Development of automation tools and processes targeting reproducibility of procedures and development efficiency. Monitoring, auditing and reporting of the build systems and processes, by incorporating observability and alerts throughout the CI/CD lifecycle and infrastructure. Participate in code reviews and development processes related to CI/CD pipelines and automation tools to improve the effectiveness of engineering team members. What We … processes, including CI/CD best practices, specifically for Unity 3D games. Professional experience and high proficiency in programming languages and scripting for automation (e.g. Python, Bash). Experience with observability tools (ELK, Grafana, Prometheus, Datadog) to monitor and alert on CI/CD stability. Experience with version control systems, such as Git, and build management tools such as Jenkins, GitLab, Maven or Gradle More ❯
London, England, United Kingdom Hybrid / WFH Options
Trimble
orchestrate LLM-based agents. Working with RAG frameworks: Use techniques such as chunking, hybrid search, query translation, similarity search, vector DBs, evaluation metrics, and ANN algorithms. Monitoring performance: Use observability services such as Datadog and Databricks for LLM observability and analytics. Keep track of the latest research: Given that this is a fast-evolving field, it’s important to keep track More ❯
Portsmouth, England, United Kingdom Hybrid / WFH Options
Trust In SODA
through the entire development life cycle. Infrastructure-as-code. Bash. Delivery methods and techniques, including agile scrum experience. Desirable Skills: Red Hat OpenShift; HashiCorp (such as Terraform, Packer, Vault); Ansible; Observability (such as Prometheus, Grafana, Splunk); Containerised services (such as Postgres, Redis, Kafka, Keycloak, ELK); Experience of doing all the above at OS or S level; YAML-based pipelines; Immutable infrastructure More ❯
London, England, United Kingdom Hybrid / WFH Options
Future Talent Group
resilient, observable, and maintainable features. Promote DevOps culture by leading knowledge-sharing sessions and supporting issue resolution. Skills: Strong grounding in SRE principles and operational best practices. Proficient with observability tools (Prometheus, Grafana, OTEL, CloudWatch) and telemetry pipelines. Solid programming skills in Python and/or Go; Java experience a plus. Strong Linux and networking fundamentals (TCP, DNS, TLS, HTTP More ❯
Bristol, Gloucestershire, United Kingdom Hybrid / WFH Options
Curo Resourcing Ltd
domain adjacent technologies/services, such as: Docker, OpenShift, Kubernetes etc. Infrastructure as Code and CI/CD paradigms and systems such as: Ansible, Terraform, Jenkins, Bamboo, Concourse etc. Observability - SRE Big Data solutions (ecosystems) and technologies such as: Apache Spark and the Hadoop Ecosystem Excellent knowledge of YAML or similar languages The following Technical Skills & Experience would be desirable More ❯