with GitOps tools (e.g., ArgoCD, Flux). CI/CD - Skilled in building and managing pipelines using Azure DevOps, GitHub Actions, etc. Monitoring - Experience with Prometheus, Grafana, and other observability tools. Application Stack - Familiarity with .NET, Node.js, React, and web server technologies like Nginx. Relevant certifications or the ability to demonstrate equivalent experience, such as: Terraform Associate About Acorn Insurance More ❯
Liverpool, Lancashire, United Kingdom Hybrid / WFH Options
The Acorn Group
with GitOps tools (e.g., ArgoCD, Flux). CI/CD - Skilled in building and managing pipelines using Azure DevOps, GitHub Actions, etc. Monitoring - Experience with Prometheus, Grafana, and other observability tools. Application Stack - Familiarity with .NET, Node.js, React, and web server technologies like Nginx. Relevant certifications or the ability to demonstrate equivalent experience, such as: Terraform Associate About Acorn Insurance More ❯
Southampton, Hampshire, South East, United Kingdom Hybrid / WFH Options
Spectrum It Recruitment Limited
level goals You'll Stand Out If You Have: Practical experience managing large-scale Kubernetes clusters; certifications in Kubernetes are a strong bonus Hands-on familiarity with the Grafana Observability Suite, including tools like Loki, Mimir, and Tempo Background in administering or developing with popular monitoring and automation tools such as Splunk, Datadog, PagerDuty, or Rundeck Experience using configuration management … principles and hands-on experience with tools such as Jenkins, GitLab CI/CD, or CircleCI Strong understanding of containerisation (e.g., Docker, Kubernetes) and microservices architecture Skilled in using observability and monitoring tools such as Prometheus, Grafana, ELK stack, or AWS CloudWatch Excellent analytical and troubleshooting abilities, especially within complex distributed systems Proven experience handling incident management and conducting blameless More ❯
along with the Onyx portfolio management team, to deliver industry-leading DevOps and Infrastructure products that provide Infrastructure-as-code abstractions and operating principles, leading cloud computing capability, automation, observability, operability, and developer experience. You will drive the product roadmap, guide product development initiatives, and ensure the successful launch and adoption of DevOps and Infrastructure products. Together, you will facilitate … the following characteristics, it would be a plus: Strong understanding of modern infrastructure and site reliability engineering practice, including Infrastructure-as-code tools (e.g. Terraform, Ansible) and metrics and observability tools (e.g. Prometheus, Grafana). Strong understanding of modern DevOps practice, including DevOps stacks (e.g. Jenkins, GitLab, CircleCI). Cloud experience (e.g. AWS, Google Cloud, Azure, Kubernetes). Familiar with More ❯
Manchester, Lancashire, United Kingdom Hybrid / WFH Options
Arm Limited
infrastructure "Nice To Have" Skills and Experience: Experience in a GitOps solution such as ArgoCD, Flux or Fleet Implementation of the Security Development Lifecycle (SDL) in infrastructure Monitoring and observability using Prometheus and Grafana, ELK stack or equivalent Use of Kubernetes management systems such as Rancher Familiarity with open source project development cycles and contribution processes, particularly around CI/ More ❯
working in Agile teams using tools like Git, Jira, and Confluence Eligible for SC and NPPV3 clearance Desirable: Container orchestration with Kubernetes HashiCorp tools: Vault, Consul, Packer Monitoring and observability with Grafana, Prometheus, or similar Familiarity with cloud networking, VPCs, NAT Gateways, security groups, etc. Personal Attributes: Proactive and self-driven with a passion for technology Strong problem-solving mindset More ❯
South East London, England, United Kingdom Hybrid / WFH Options
Amber Labs
working in Agile teams using tools like Git, Jira, and Confluence Eligible for SC and NPPV3 clearance Desirable: Container orchestration with Kubernetes HashiCorp tools: Vault, Consul, Packer Monitoring and observability with Grafana, Prometheus, or similar Familiarity with cloud networking, VPCs, NAT Gateways, security groups, etc. Personal Attributes: Proactive and self-driven with a passion for technology Strong problem-solving mindset More ❯
and high availability CI/CD Pipeline Development: Develop and maintain robust CI/CD pipelines for continuous integration and deployment of ML models and related infrastructure Monitoring and Observability: Build and maintain comprehensive monitoring and alerting systems for our ML infrastructure and models, leveraging tools like Datadog to ensure system health and performance Collaboration and Mentorship: Collaborate effectively with More ❯
etc. Infrastructure as Code and CI/CD paradigms and systems such as: Ansible, Terraform, Jenkins, Bamboo, Concourse etc. Monitoring utilising products such as: Prometheus, Grafana, ELK, Filebeat etc. Observability - SRE Big Data solutions (ecosystems) and technologies such as: Apache Spark and the Hadoop Ecosystem Edge technologies e.g. NGINX, HAProxy etc. Excellent knowledge of YAML or similar languages The following More ❯
Manual Tester (DV Security Clearance) Position Description CGI was recognised in the Sunday Times Best Places to Work List 2025 and has been named one of the 'World's Best Employers' by Forbes magazine. We offer a competitive salary, excellent More ❯
and scaling. Implement Containerisation and Orchestration - Containerise applications with Docker and deploy using Kubernetes, ECS, or similar. Manage Helm charts or Kustomize templates and enforce container security standards. Drive Observability and Operational Readiness - Implement monitoring, logging, and alerting with tools like Prometheus, Grafana, ELK, or Datadog. Create dashboards and promote the adoption of SLOs and error budgets. Embed Security and More ❯
meet business needs and objectives. Develop a baseline monitoring and tooling concept for cloud to address the need for compliance infrastructure reporting within agile deliveries as part of our Observability strategy. Develop concepts and tools for chargeback and showback (Financial Instrumentation) in a multicloud context. Implement and mature a cloud forecasting and capacity management solution for the enterprise. Collaborate with More ❯
multiple stakeholders including development teams to implement and maintain reliable and scalable systems while adhering to industry best practices and security standards. Responsibilities and Impact: Design, implement, and maintain observability solutions to track system health and performance. Analyze observability data to identify and troubleshoot potential issues proactively. Develop and implement alerts and notifications for critical events. Collaborate with development teams … in Computer Science, Information Technology, or a related field. 5+ years of experience as a Site Reliability Engineer or equivalent in a similar role. Proficient in application and infrastructure observability, Splunk OpenTelemetry preferred Experienced in production environments running in AWS Comfortable with Infrastructure as Code, Terraform is preferred Comfortable with CI/CD pipelines such as GitHub Actions, Azure DevOps More ❯
South East London, England, United Kingdom Hybrid / WFH Options
Explore Group
financial institutions. What You'll Do Maintain and improve our AWS-based infrastructure using Infrastructure-as-Code (Terraform) Support and scale Kubernetes clusters hosting critical microservices Design and enhance observability, alerting, and incident response processes Collaborate closely with engineers to ensure systems are reliable, secure, and performant Lead root cause analysis for production incidents and help prevent recurrence Build tooling More ❯
Hertford, Hertfordshire, South East, United Kingdom
Halian Technology Limited
adoption, managing CI/CD pipelines, Docker containers, and security-first deployment pipelines. Implement high-availability systems and disaster recovery for business continuity across time zones and territories. Maintain system observability and monitoring to proactively identify issues and optimize system health. Ensure compliance with security standards and data privacy regulations across regions. Manage third-party vendors, licenses, and infrastructure budgets. Required More ❯
of new software and tools into the platform. Support scalable, resilient cloud environments with modern DevOps practices. Promote GitOps deployment strategies and mentor peers in DevOps best practice. Enhance observability using tools like Prometheus and Grafana. This role is ideal for someone looking to take the next step in a DevOps career while working with a modern tech stack in More ❯
remote teams and distributed delivery models Additional skills that are a plus: Programming languages such as Scala, Rust, Go, Angular, React, Kotlin Database management with PostgreSQL Experience with ElasticSearch, observability tools like Grafana and Prometheus What this role can offer Opportunity to deepen understanding of AI and Data Science applications Mentorship and support from colleagues to apply your talents Career More ❯
or DevOps Expertise in microservices and API design Docker, and container runtime platforms such as Kubernetes, EKS, ECS etc Strong understanding of operational concepts on AWS, particularly monitoring and observability, FinOps Utilising CI/CD tools, such as Bamboo, Jenkins, TeamCity, Bitbucket, in order to streamline delivery of new features and fixes Continual testing of code using Automated Testing Frameworks More ❯
Portsmouth, England, United Kingdom Hybrid / WFH Options
Trust In SODA
through the entire development life cycle. Infrastructure-as-code Bash Delivery methods and techniques, including agile scrum experience. Desirable Skills: Red Hat OpenShift HashiCorp (such as Terraform, Packer, Vault) Ansible Observability (such as Prometheus, Grafana, Splunk) Containerised services (such as Postgres, Redis, Kafka, Keycloak, ELK) Experience of doing all the above at OS or S level YAML-based pipelines. Immutable infrastructure More ❯
Bristol, Gloucestershire, United Kingdom Hybrid / WFH Options
Curo Resourcing Ltd
domain adjacent technologies/services, such as: Docker, OpenShift, Kubernetes etc. Infrastructure as Code and CI/CD paradigms and systems such as: Ansible, Terraform, Jenkins, Bamboo, Concourse etc. Observability - SRE Big Data solutions (ecosystems) and technologies such as: Apache Spark and the Hadoop Ecosystem Excellent knowledge of YAML or similar languages The following Technical Skills & Experience would be desirable More ❯
Dundee, Angus, United Kingdom Hybrid / WFH Options
Ivanti
user experience. This department plays a pivotal role in shaping the company's growth trajectory through continuous innovation and customer-centric solutions. What You Will Be Doing Assist in Observability Implementation: Support the development and maintenance of monitoring, logging, and tracing solutions. Monitor & Manage Observability Tools: Help deploy and manage observability platforms such as Azure Application Insights (AppInsights), New Relic … Resolution) and reduce false positives. Ensure Cloud & Infrastructure Visibility: Contribute to scalable monitoring solutions for AWS and Azure environments. Collaborate with DevOps & SRE Teams: Work with teams to integrate observability best practices into CI/CD pipelines. Documentation & Knowledge Sharing: Contribute to runbooks, dashboards, and best practice guides to support observability initiatives. To Be Successful in The Role, You Will … Have Required Qualifications: 3-5 years of experience in observability, monitoring, or DevOps-related roles. Basic experience with monitoring tools such as Azure AppInsights, New Relic, Prometheus, and Grafana. Understanding of OpenTelemetry, New Relic, AppInsights APM for telemetry data collection. Familiarity with AWS and Azure cloud environments. Exposure to Kubernetes and container monitoring. Basic scripting knowledge (Python, Go, Bash, or More ❯
South East London, England, United Kingdom Hybrid / WFH Options
Cognitive Group | Part of the Focus Cloud Group
and service incidents with root cause analysis and preventive measures. Handle change requests, track recurring issues, and work on long-term fixes to improve system stability. Implement and maintain observability solutions using Prometheus, Grafana, and Splunk. Write PromQL queries for custom monitoring dashboards, alerting, and diagnostics. Manage and optimize CI/CD pipelines for automated testing, deployment, and rollback strategies. … AWS services at the DevOps Engineer level Incident, change & problem management experience. This role is heavily operation-oriented, including on-call requirements Strong background in setup & operation of enterprise observability tooling, specifically Prometheus, Grafana and Splunk, including usage of PromQL Proficient in one or more languages of Python, Go, Bash, SQL Familiar with GitHub/GitOps/container orchestration/ More ❯
customers consume our products. Additionally, you'll: People manage a team, developing skillsets and capabilities to support strategic outcomes Develop technical skills through continuous learning and development Support strategic observability, maintaining a strong awareness of service, creating operational views of data, and supporting the development of targets for the team to deliver against Provide operational support for product and service … would experience of Python, Terraform, Ansible, and PowerShell. Ideally, you'll also have experience in data centre networking, including software-defined networking. Furthermore, you'll need: Experience of using observability tools and techniques with the ability to use data, information, and user sentiment to continuously improve solutions In depth public cloud vendor knowledge covering GCP, AWS, and Azure, Extensive experience More ❯
that affect millions of users Design and implement Infrastructure as Code solutions that set industry standards Build resilient CI/CD pipelines using Bitbucket and Spacelift orchestration Develop sophisticated observability strategies with Grafana, CloudWatch, and advanced monitoring tools Leadership & Growth Opportunities Mentor emerging DevOps talent and shape team culture Influence architectural decisions across cross-functional teams Drive strategic initiatives that … TypeScript capabilities (this is code-heavy DevOps) Cloud Platforms: Recent AWS experience with enterprise-scale deployments CI/CD Mastery: Advanced experience with Jenkins, Bitbucket Pipelines, and orchestration tools Observability: Hands-on expertise with Grafana, Splunk, CloudWatch for proactive monitoring Leadership & Delivery: Proven track record architecting scalable, secure infrastructure solutions Experience implementing advanced security measures across DevOps workflows Large-scale More ❯