London, England, United Kingdom Hybrid / WFH Options
Tes
microservices design patterns and deployment strategies in a cloud-native environment. Security Best Practices: Strong understanding of security frameworks and compliance standards for cloud infrastructure and DevOps processes. Monitoring & Observability: Understanding of monitoring, logging, and alerting tools (e.g., Prometheus, Grafana, ELK) to ensure system performance and issue tracking. Skills CI/CD Tools: Hands-on experience with Jenkins, GitLab CI More ❯
London, England, United Kingdom Hybrid / WFH Options
Quaisr Limited
such as Kubernetes, Docker Swarm, or HashiCorp Nomad. Excellent problem-solving, communication, and collaboration skills. Nice to have: Experience managing distributed systems, microservices, and event-driven architectures. Knowledge of observability tools such as Prometheus, Grafana, ELK Stack, or Datadog. Experience with security best practices, monitoring, and incident response. Familiarity with DevSecOps and compliance frameworks (ISO 27001, SOC 2, GDPR). More ❯
such as Kubernetes, Docker Swarm, or HashiCorp Nomad. Excellent problem-solving, communication, and collaboration skills. Nice to have: Experience managing distributed systems, microservices, and event-driven architectures. Knowledge of observability tools such as Prometheus, Grafana, ELK Stack, or Datadog. Experience with security best practices, monitoring, and incident response. Familiarity with DevSecOps and compliance frameworks (ISO 27001, SOC 2, GDPR). More ❯
Liverpool, Lancashire, United Kingdom Hybrid / WFH Options
The Acorn Group
with GitOps tools (e.g., ArgoCD, Flux). CI/CD - Skilled in building and managing pipelines using Azure DevOps, GitHub Actions, etc. Monitoring - Experience with Prometheus, Grafana, and other observability tools. Application Stack - Familiarity with .NET, Node.js, React, and web server technologies like Nginx. Relevant certifications or the ability to demonstrate equivalent experience, such as: Terraform Associate About Acorn Insurance More ❯
Python (or other language), Bash/Shell, YAML including any Development frameworks Extensive experience and in-depth knowledge of the Linux operating system for effective troubleshooting activities Experience with Observability tools like Grafana, Prometheus, ELK, OCI Observability We highly value ownership and initiative with capabilities to drive projects independently Dealing with changes on a daily basis in a very dynamic More ❯
one or more public cloud providers such as Azure, AWS or GCP Proficiency using Infrastructure as Code (IaC) tools such as Terraform (preferred), Ansible, or CloudFormation. Experience with monitoring, observability and logging tools such as DataDog, Prometheus, Grafana, or similar. Proven track record of maintaining highly-available and performant production environments. Ability to identify and implement effective mitigation strategies and More ❯
London, England, United Kingdom Hybrid / WFH Options
Global Screening Services
impact! About The Role This is an exciting opportunity to join our growing Operations team managing Kubernetes clusters in Production and, through a DevOps culture, empower development teams with observability insights they can use to innovate faster. We are looking for a Site Reliability Engineer, or production experienced DevOps Engineer, who has working experience building observability for cloud native SaaS … products and driving operational excellence. You will be responsible for delivering our monitoring infrastructure, shaping observability, and responding to incidents as well as ensuring the platform is performant and reliable. You will be a key member of the team, liaising with product teams, embedding SRE principles and building the observability platform for the next stage of growth at GSS. You … new features are maintainable, have well defined SLIs, achievable SLOs, are properly monitored, and evaluated for failure scenarios Enabling development teams through DevOps culture and the effective use of observability tools. Promote best practice, present KT sessions, help troubleshoot and resolve business affecting issues Building on our existing monitoring tools to deliver a comprehensive, optimised observability platform for logging, metrics More ❯
Southampton, Hampshire, United Kingdom Hybrid / WFH Options
Spectrum IT Recruitment
level goals You'll Stand Out If You Have: Practical experience managing large-scale Kubernetes clusters; certifications in Kubernetes are a strong bonus Hands-on familiarity with the Grafana Observability Suite, including tools like Loki, Mimir, and Tempo Background in administering or developing with popular monitoring and automation tools such as Splunk, Datadog, PagerDuty, or Rundeck Experience using configuration management … principles and hands-on experience with tools such as Jenkins, GitLab CI/CD, or CircleCI Strong understanding of containerisation (e.g., Docker, Kubernetes) and microservices architecture Skilled in using observability and monitoring tools such as Prometheus, Grafana, ELK stack, or AWS CloudWatch Excellent analytical and troubleshooting abilities, especially within complex distributed systems Proven experience handling incident management and conducting blameless More ❯
Hampshire, England, United Kingdom Hybrid / WFH Options
Spectrum IT Recruitment
level goals You'll Stand Out If You Have: Practical experience managing large-scale Kubernetes clusters; certifications in Kubernetes are a strong bonus Hands-on familiarity with the Grafana Observability Suite, including tools like Loki, Mimir, and Tempo Background in administering or developing with popular monitoring and automation tools such as Splunk, Datadog, PagerDuty, or Rundeck Experience using configuration management … principles and hands-on experience with tools such as Jenkins, GitLab CI/CD, or CircleCI Strong understanding of containerisation (e.g., Docker, Kubernetes) and microservices architecture Skilled in using observability and monitoring tools such as Prometheus, Grafana, ELK stack, or AWS CloudWatch Excellent analytical and troubleshooting abilities, especially within complex distributed systems Proven experience handling incident management and conducting blameless More ❯
Eastbourne, England, United Kingdom Hybrid / WFH Options
AxisOps
and architecture through to production and operations. Our strength lies in software delivery, supported by deep expertise in platform engineering, built on an understanding of private cloud-native infrastructure, observability, and DevSecOps. Our culture We value sharp thinking, clear communication, and teams that look out for each other. At AxisOps, our core values are: Ingenuity – solving hard problems with elegant … runtimes is welcome but not required) Maintain and evolve microservice architecture built in Python and PHP, with deployment via GitLab CI/CD and runtime orchestration via Andromeda Deliver observability using Prometheus, Grafana, and the ELK stack, supporting metrics, logs, and alerting workflows Support and maintain internal ML infrastructure and pipelines , helping ensure that our AI and data workloads run … maintain standardised developer desktop environments , supporting our engineering team’s daily tooling and dev workflow Contribute to our IoT platform , including reliable edge infrastructure, secure messaging, and data flow observability Support and maintain our private datacentre , including rack-level hardware, networking, and server fleet resilience Continuously improve security posture , covering patching, firewall maintenance, secrets handling, and backup strategy Write markdown More ❯
Hedge End, England, United Kingdom Hybrid / WFH Options
Spectrum IT Recruitment
level goals You'll Stand Out If You Have: Practical experience managing large-scale Kubernetes clusters; certifications in Kubernetes are a strong bonus Hands-on familiarity with the Grafana Observability Suite, including tools like Loki, Mimir, and Tempo Background in administering or developing with popular monitoring and automation tools such as Splunk, Datadog, PagerDuty, or Rundeck Experience using configuration management … principles and hands-on experience with tools such as Jenkins, GitLab CI/CD, or CircleCI Strong understanding of containerisation (e.g., Docker, Kubernetes) and microservices architecture Skilled in using observability and monitoring tools such as Prometheus, Grafana, ELK stack, or AWS CloudWatch Excellent analytical and troubleshooting abilities, especially within complex distributed systems Proven experience handling incident management and conducting blameless More ❯
London, England, United Kingdom Hybrid / WFH Options
ZipRecruiter
level goals You'll Stand Out If You Have: Practical experience managing large-scale Kubernetes clusters; certifications in Kubernetes are a strong bonus Hands-on familiarity with the Grafana Observability Suite, including tools like Loki, Mimir, and Tempo Background in administering or developing with popular monitoring and automation tools such as Splunk, Datadog, PagerDuty, or Rundeck Experience using configuration management … principles and hands-on experience with tools such as Jenkins, GitLab CI/CD, or CircleCI Strong understanding of containerisation (e.g., Docker, Kubernetes) and microservices architecture Skilled in using observability and monitoring tools such as Prometheus, Grafana, ELK stack, or AWS CloudWatch Excellent analytical and troubleshooting abilities, especially within complex distributed systems Proven experience handling incident management and conducting blameless More ❯
London, England, United Kingdom Hybrid / WFH Options
Canada Life
infrastructure to the cloud and understanding the challenges involved Familiarity with cloud security best practices, identity and access management (IAM), and encryption techniques Microsoft Azure certifications are a plus Observability Designing, implementing and day-to-day use of logging and monitoring tools to capture data for alerting and issue identification and resolution using DataDog, App Insights or similar tools. Designing … applications and infrastructure for observability, security, and reliability. Networking & Security Monitor and enhance network performance, ensuring high levels of security and scalability across all cloud environments. Enforce security best practices in AKS, including network policies, RBAC (Role-Based Access Control), and integration with Azure Active Directory Core Services Software development experience, ideally in .NET stack. SQL skills to manage and More ❯
working in Agile teams using tools like Git , Jira , and Confluence Eligible for SC and NPPV3 clearance Desirable: Container orchestration with Kubernetes HashiCorp tools: Vault , Consul , Packer Monitoring and observability with Grafana , Prometheus , or similar Familiarity with cloud networking, VPCs, NAT Gateways, security groups, etc. Personal Attributes: Proactive and self-driven with a passion for technology Strong problem-solving mindset More ❯
City of London, London, United Kingdom Hybrid / WFH Options
Amber Labs
working in Agile teams using tools like Git , Jira , and Confluence Eligible for SC and NPPV3 clearance Desirable: Container orchestration with Kubernetes HashiCorp tools: Vault , Consul , Packer Monitoring and observability with Grafana , Prometheus , or similar Familiarity with cloud networking, VPCs, NAT Gateways, security groups, etc. Personal Attributes: Proactive and self-driven with a passion for technology Strong problem-solving mindset More ❯
London, England, United Kingdom Hybrid / WFH Options
BlackRock, Inc
access to the best tools available. We combine problem-solving skills with software and systems engineering to take a proactive approach in building fault-tolerant and secure systems, improving observability and zealously automating away toil. In this role you will: Use your site reliability expertise to design, operate and support Preqin's infrastructure, middleware and internal services. Improving their performance More ❯
operating system for effective troubleshooting activities Awareness of any cloud infrastructure principles (like AWS, GCP or OCI), understanding basic principles of secure software delivery is a plus Familiar with Observability tools like Grafana or Prometheus, understanding the importance of giving the correct visibility to our platforms and environments We highly value ownership and initiative with capabilities to drive projects independently More ❯
London, England, United Kingdom Hybrid / WFH Options
ZipRecruiter
premises infrastructure to the cloud and understanding the challenges involved Familiarity with cloud security best practices, and access management (IAM), and encryption techniques Microsoft Azure certifications are a plus Observability Designing, implementing and day-to-day use of logging and monitoring tools to capture data for alerting and issue identification and resolution using DataDog, App Insights or similar tools. Designing … applications and infrastructure for observability, security, and reliability. Networking & Security Monitor and enhance network performance, ensuring high levels of security and scalability across all cloud environments. Enforce security best practices in AKS, including network policies, RBAC (Role-Based Access Control), and integration with Azure Active Directory Core Services Azure core services such as Azure Storage, including Blob, Azure VMs, Azure More ❯
WAF, CloudFront, API GW, AWS Organizations, S3, ECS, EKS, Route 53, ELBs, OpenShift, Kubernetes, Docker Languages: TypeScript, Python Security & Scanning: AWS Guardrails, Checkov, Prisma Cloud, OSV Scanner, SonarQube, Renovate Observability & Logging: CloudWatch, OpenSearch Operating System Management: RedHat Satellite, AMI lifecycle management, Ubuntu Landscape Testing Tools: Pytest, Jest, Cypress APIs/Microservices: RESTful APIs, API Gateway, containerised services Version Control: GitLab … to as Senior DevOps Engineer. On a typical day you will Design, build, and maintain scalable, high-quality software and platform systems Implement and manage CI/CD pipelines, observability, security automation, automated testing, and engineering standards Lead feature development from concept to production with focus on quality and performance Troubleshoot issues, ensuring resilience, reliability, and minimal user disruption Contribute More ❯
to L3 networking Programming languages, such as C#, Python, Perl, Java, C++ CICD tools such as Azure DevOps, GitHub Actions, Gitlab, Jenkins, TeamCity Scripting languages such as PowerShell, bash Observability/Monitoring: Prometheus, Grafana, Splunk Containerisation tools such as Docker, K8S, OpenShift, EC, containers Hosting technologies such as IIS, nginx, Apache, App Service, LightSail Analytical and creative approach to problem More ❯
London, England, United Kingdom Hybrid / WFH Options
Northrop Grumman
Platforms Engineer to join our Data Platforms Infrastructure team. This role is pivotal in supporting and maintaining our Elastic Cloud Enterprise (ECE) environment, designing and deploying Elastic solutions for Observability and Search, and ensuring the highest standards of security, privacy, and compliance. The ideal candidate will have a strong background in Elasticsearch, a keen understanding of cloud architecture, and a … upgrades to ensure optimal performance and security. - Troubleshoot and resolve issues related to Elasticsearch clusters, indices, and ECE infrastructure. Design and Deployment: - Design scalable and efficient Elasticsearch solutions for Observability and Search use cases. - Implement best practices for data indexing, storage, and retrieval. - Work with stakeholders to understand requirements and translate them into technical solutions. Security, Privacy, and Compliance: - Implement … that enhance the Elasticsearch platform. Your Experience : - Extensive experience working with Elasticsearch, including ECE. - Strong understanding of Elasticsearch architecture, components, and APIs. - Experience with designing and deploying Elasticsearch for Observability and Search use cases. - Knowledge of security best practices and compliance requirements (e.g., GDPR, HIPAA). - Proficiency in scripting and automation (e.g., Python, Bash, Ansible). - Familiarity with cloud platforms More ❯
London, England, United Kingdom Hybrid / WFH Options
Magentus Group
to implement robust solutions that improve system performance, security, and developer productivity. You will be responsible for maintaining and evolving platform services, adopting best practices in infrastructure as code, observability, and DevOps methodologies. Key Responsibilities of the role: Platform Development & Automation Design, develop, and maintain cloud-native infrastructure and platform services. Automate provisioning, scaling, and monitoring of infrastructure and application … reliability. Implement Infrastructure as Code (IaC) using tools such as CDK, Terraform or CloudFormation. Reliability & Security Ensure platform reliability, scalability, and security through best practices and proactive monitoring. Implement observability solutions including logging, metrics, and distributed tracing. Support incident response and post-mortem analysis, driving continuous improvements. Collaborate with security teams to ensure compliance with security and regulatory requirements. Collaboration … tools (GitHub Actions, GitLab CI, or similar). Experience with scripting or programming languages (Python, Go, Bash, etc.). Understanding of networking, security principles, and best practices. Knowledge of observability tools such as Datadog, Prometheus, Grafana, etc. Desired Attributes Strong problem-solving skills with a proactive approach to improving systems and processes. Excellent communication and collaboration skills, able to work More ❯
Manchester, England, United Kingdom Hybrid / WFH Options
Magentus Group
to implement robust solutions that improve system performance, security, and developer productivity. You will be responsible for maintaining and evolving platform services, adopting best practices in infrastructure as code, observability, and DevOps methodologies. Key Responsibilities of the role: Platform Development & Automation Design, develop, and maintain cloud-native infrastructure and platform services. Automate provisioning, scaling, and monitoring of infrastructure and application … reliability. Implement Infrastructure as Code (IaC) using tools such as CDK, Terraform or CloudFormation. Reliability & Security Ensure platform reliability, scalability, and security through best practices and proactive monitoring. Implement observability solutions including logging, metrics, and distributed tracing. Support incident response and post-mortem analysis, driving continuous improvements. Collaborate with security teams to ensure compliance with security and regulatory requirements. Collaboration … tools (GitHub Actions, GitLab CI, or similar). Experience with scripting or programming languages (Python, Go, Bash, etc.). Understanding of networking, security principles, and best practices. Knowledge of observability tools such as Datadog, Prometheus, Grafana, etc. Desired Attributes Strong problem-solving skills with a proactive approach to improving systems and processes. Excellent communication and collaboration skills, able to work More ❯
London, England, United Kingdom Hybrid / WFH Options
Causaly Inc
Terraform and infrastructure-as-code best practices. • Manage and orchestrate containerized services with Kubernetes. • Maintain and improve CI/CD pipelines to support fast, secure delivery. • Implement and maintain observability tooling for monitoring, logging, and alerting. • Collaborate with engineering teams—including AI/ML, backend, and data—to support development and production workflows. • Help drive improvements in system performance, scalability … a plus. • Proficiency with Terraform and infrastructure-as-code principles. • Experience managing Kubernetes clusters and containerized applications. • Familiarity with CI/CD pipelines and modern DevOps tooling. • Experience with observability tools (e.g. Datadog, Prometheus, Grafana). • Strong scripting skills (e.g. Bash, Python). • Ability to contribute to architectural planning and system design. • Excellent problem-solving and analytical skills, especially in More ❯