Grays, England, United Kingdom Hybrid / WFH Options
TES
microservices design patterns and deployment strategies in a cloud-native environment. Security Best Practices: Strong understanding of security frameworks and compliance standards for cloud infrastructure and DevOps processes. Monitoring & Observability: Understanding of monitoring, logging, and alerting tools (e.g., Prometheus, Grafana, ELK) to ensure system performance and issue tracking. Skills CI/CD Tools: Hands-on experience with Jenkins, GitLab CI
Liverpool, Lancashire, United Kingdom Hybrid / WFH Options
The Acorn Group
with GitOps tools (e.g., ArgoCD, Flux). CI/CD - Skilled in building and managing pipelines using Azure DevOps, GitHub Actions, etc. Monitoring - Experience with Prometheus, Grafana, and other observability tools. Application Stack - Familiarity with .NET, Node.js, React, and web server technologies like Nginx. Relevant certifications or the ability to demonstrate equivalent experience, such as: Terraform Associate About Acorn Insurance
is a plus) A strong interest in automation, infrastructure best practices, and continuous learning Good communication skills and a collaborative mindset Experience with Terraform, Ansible, or Helm Familiarity with observability tools such as ELK Stack, CloudWatch, or New Relic Understanding of security considerations in cloud and CI/CD environments
Bristol, England, United Kingdom Hybrid / WFH Options
Canada Life Assurance Europe plc
infrastructure to the cloud and understanding the challenges involved Familiarity with cloud security best practices, identity and access management (IAM), and encryption techniques Microsoft Azure certifications are a plus Observability Designing, implementing and day-to-day use of logging and monitoring tools to capture data for alerting and issue identification and resolution using DataDog, App Insights or similar tools. Designing … applications and infrastructure for observability, security, and reliability. Networking & Security Monitor and enhance network performance, ensuring high levels of security and scalability across all cloud environments. Enforce security best practices in AKS, including network policies, RBAC (Role-Based Access Control), and integration with Azure Active Directory Core Services Azure core services such as Azure Storage, including Blob, Azure VMs, Azure
Southampton, Hampshire, South East, United Kingdom Hybrid / WFH Options
Spectrum It Recruitment Limited
level goals You'll Stand Out If You Have: Practical experience managing large-scale Kubernetes clusters; certifications in Kubernetes are a strong bonus Hands-on familiarity with the Grafana Observability Suite, including tools like Loki, Mimir, and Tempo Background in administering or developing with popular monitoring and automation tools such as Splunk, Datadog, PagerDuty, or Rundeck Experience using configuration management … principles and hands-on experience with tools such as Jenkins, GitLab CI/CD, or CircleCI Strong understanding of containerisation (e.g., Docker, Kubernetes) and microservices architecture Skilled in using observability and monitoring tools such as Prometheus, Grafana, ELK stack, or AWS CloudWatch Excellent analytical and troubleshooting abilities, especially within complex distributed systems Proven experience handling incident management and conducting blameless
Hampshire, England, United Kingdom Hybrid / WFH Options
Spectrum IT Recruitment
level goals You'll Stand Out If You Have: Practical experience managing large-scale Kubernetes clusters; certifications in Kubernetes are a strong bonus Hands-on familiarity with the Grafana Observability Suite, including tools like Loki, Mimir, and Tempo Background in administering or developing with popular monitoring and automation tools such as Splunk, Datadog, PagerDuty, or Rundeck Experience using configuration management … principles and hands-on experience with tools such as Jenkins, GitLab CI/CD, or CircleCI Strong understanding of containerisation (e.g., Docker, Kubernetes) and microservices architecture Skilled in using observability and monitoring tools such as Prometheus, Grafana, ELK stack, or AWS CloudWatch Excellent analytical and troubleshooting abilities, especially within complex distributed systems Proven experience handling incident management and conducting blameless
along with the Onyx portfolio management team, to deliver industry-leading DevOps and Infrastructure products that provide Infrastructure-as-code abstractions and operating principles, leading cloud computing capability, automation, observability, operability, and developer experience. You will drive the product roadmap, guide product development initiatives, and ensure the successful launch and adoption of DevOps and Infrastructure products. Together, you will facilitate … the following characteristics, it would be a plus: Strong understanding of modern infrastructure and site reliability engineering practice, including Infrastructure-as-code tools (e.g. Terraform, Ansible) and metrics and observability tools (e.g. Prometheus, Grafana). Strong understanding of modern DevOps practice, including DevOps stacks (e.g. Jenkins, GitLab, CircleCI). Cloud experience (e.g. AWS, Google Cloud, Azure, Kubernetes). Familiar with
Manchester, Lancashire, United Kingdom Hybrid / WFH Options
Arm Limited
infrastructure "Nice To Have" Skills and Experience: Experience in a GitOps solution such as ArgoCD, Flux or Fleet Implementation of the Security Development Lifecycle (SDL) in infrastructure Monitoring and observability using Prometheus and Grafana, ELK stack or equivalent Use of Kubernetes management systems such as Rancher Familiarity with open source project development cycles and contribution processes, particularly around CI/
South East London, England, United Kingdom Hybrid / WFH Options
Amber Labs
working in Agile teams using tools like Git, Jira, and Confluence Eligible for SC and NPPV3 clearance Desirable: Container orchestration with Kubernetes HashiCorp tools: Vault, Consul, Packer Monitoring and observability with Grafana, Prometheus, or similar Familiarity with cloud networking, VPCs, NAT Gateways, security groups, etc. Personal Attributes: Proactive and self-driven with a passion for technology Strong problem-solving mindset
and high availability CI/CD Pipeline Development: Develop and maintain robust CI/CD pipelines for continuous integration and deployment of ML models and related infrastructure Monitoring and Observability: Build and maintain comprehensive monitoring and alerting systems for our ML infrastructure and models, leveraging tools like DataDog to ensure system health and performance Collaboration and Mentorship: Collaborate effectively with
etc. Infrastructure as Code and CI/CD paradigms and systems such as: Ansible, Terraform, Jenkins, Bamboo, Concourse etc. Monitoring utilising products such as: Prometheus, Grafana, ELK, Filebeat etc. Observability - SRE Big Data solutions (ecosystems) and technologies such as: Apache Spark and the Hadoop Ecosystem Edge technologies e.g. NGINX, HAProxy etc. Excellent knowledge of YAML or similar languages The following
Manual Tester (DV Security Clearance) Position Description CGI was recognised in the Sunday Times Best Places to Work List 2025 and has been named one of the 'World's Best Employers' by Forbes magazine. We offer a competitive salary, excellent
Leeds, England, United Kingdom Hybrid / WFH Options
ZipRecruiter
platform modernisation Mentor and lead a small team of engineers Align DevOps capabilities with the wider business Champion DevEx, reliability, and security Embed operational excellence and incident response Promote observability and performance optimisation Lead DevOps Engineer Requirements Proven technical and some leader/mentoring experience Cloud-native expertise (any cloud provider is fine: GCP, AWS or Azure) Knowledge of GitLab CI
Bradford, England, United Kingdom Hybrid / WFH Options
JR United Kingdom
platform modernisation Mentor and lead a small team of engineers Align DevOps capabilities with the wider business Champion DevEx, reliability, and security Embed operational excellence and incident response Promote observability and performance optimisation Proven technical and some leader/mentoring experience Cloud-native expertise (any cloud provider is fine: GCP, AWS or Azure) Knowledge of GitLab CI/CD, Terraform
DevOps & Automation Create and manage automation pipelines for deployments. Implement Infrastructure as Code (IaC) using tools such as Terraform or Ansible. Monitor and enhance system performance using logging and observability tools. Develop automation solutions for provisioning, scaling, and maintenance. Support containerization efforts with Docker/Kubernetes where applicable. Networking & System Administration Configure and maintain network infrastructure, including firewalls, VLANs, and
Southampton, England, United Kingdom Hybrid / WFH Options
JR United Kingdom
/CD pipelines, ensuring rapid and reliable code delivery. Support microservices architecture, focusing on latency-sensitive and high-availability services. Monitor system performance, conduct root cause analysis, and implement observability best practices (metrics, logging, tracing). Harden infrastructure and deployments with infrastructure as code (Terraform/CDK/CloudFormation). Lead incident response, system reliability efforts, and infrastructure scalability initiatives.
delivery practices through tooling and coaching. Provide architectural input on how platform choices impact software delivery and operability. Join wider Application Development squads to accelerate delivery of key projects. Observability, Site Reliability Operate production workloads with an SRE mindset: measure reliability, define SLOs, and reduce toil through automation. Lead initiatives to reduce operational toil and enhance system resilience through automation.
and scaling. Implement Containerisation and Orchestration - Containerise applications with Docker and deploy using Kubernetes, ECS, or similar. Manage Helm charts or Kustomize templates and enforce container security standards. Drive Observability and Operational Readiness - Implement monitoring, logging, and alerting with tools like Prometheus, Grafana, ELK, or Datadog. Create dashboards and promote the adoption of SLOs and error budgets. Embed Security and
multiple stakeholders including development teams to implement and maintain reliable and scalable systems while adhering to industry best practices and security standards. Responsibilities and Impact: Design, implement, and maintain observability solutions to track system health and performance. Analyze observability data to identify and troubleshoot potential issues proactively. Develop and implement alerts and notifications for critical events. Collaborate with development teams … in Computer Science, Information Technology, or a related field. 5+ years of experience as a Site Reliability Engineer or equivalent in a similar role. Proficient in application and infrastructure observability, Splunk OpenTelemetry preferred Experienced in production environments running in AWS Comfortable with Infrastructure as Code, Terraform is preferred Comfortable with CI/CD pipelines such as GitHub Actions, Azure DevOps
Manchester, England, United Kingdom Hybrid / WFH Options
Magentus Group
to implement robust solutions that improve system performance, security, and developer productivity. You will be responsible for maintaining and evolving platform services, adopting best practices in infrastructure as code, observability, and DevOps methodologies. Key Responsibilities of the role: Platform Development & Automation Design, develop, and maintain cloud-native infrastructure and platform services. Automate provisioning, scaling, and monitoring of infrastructure and application … reliability. Implement Infrastructure as Code (IaC) using tools such as CDK, Terraform or CloudFormation. Reliability & Security Ensure platform reliability, scalability, and security through best practices and proactive monitoring. Implement observability solutions including logging, metrics, and distributed tracing. Support incident response and post-mortem analysis, driving continuous improvements. Collaborate with security teams to ensure compliance with security and regulatory requirements. Collaboration … tools (GitHub Actions, GitLab CI, or similar). Experience with scripting or programming languages (Python, Go, Bash, etc.). Understanding of networking, security principles, and best practices. Knowledge of observability tools such as Datadog, Prometheus, Grafana, etc. Desired Attributes Strong problem-solving skills with a proactive approach to improving systems and processes. Excellent communication and collaboration skills, able to work
South East London, England, United Kingdom Hybrid / WFH Options
Explore Group
financial institutions. What You'll Do Maintain and improve our AWS-based infrastructure using Infrastructure-as-Code (Terraform) Support and scale Kubernetes clusters hosting critical microservices Design and enhance observability, alerting, and incident response processes Collaborate closely with engineers to ensure systems are reliable, secure, and performant Lead root cause analysis for production incidents and help prevent recurrence Build tooling
Hertford, Hertfordshire, South East, United Kingdom
Halian Technology Limited
adoption, managing CI/CD pipelines, Docker containers, and security-first deployment pipelines. Implement high-availability systems and disaster recovery for business continuity across time zones and territories. Maintain system observability and monitoring to proactively identify issues and optimize system health. Ensure compliance with security standards and data privacy regulations across regions. Manage third-party vendors, licenses, and infrastructure budgets. Required
of new software and tools into the platform. Support scalable, resilient cloud environments with modern DevOps practices. Promote GitOps deployment strategies and mentor peers in DevOps best practice. Enhance observability using tools like Prometheus and Grafana. This role is ideal for someone looking to take the next step in a DevOps career while working with a modern tech stack in
High Wycombe, England, United Kingdom Hybrid / WFH Options
JR United Kingdom
delivery. Work closely with the engineering team to support microservices architecture, with focus on latency-sensitive and high-availability services. Monitor system performance, conduct root cause analysis, and implement observability best practices (metrics, logging, tracing). Harden infrastructure and deployments with infrastructure as code (Terraform/CDK/CloudFormation). Lead incident response, system reliability efforts, and infrastructure scalability initiatives.