Kubernetes fundamentals. Building, testing, and documenting batch pipelines and lightweight streaming solutions. Preferred to have: Experience with Terraform, Ansible, or similar IaC tools. Exposure to observability stacks (Grafana, Prometheus, Loki, Fluentd, Azure Monitor). Familiarity with Azure cloud services. Relevant certifications (e.g., Azure AZ-900, Terraform Associate, CKA/CKAD) are a plus but not mandatory. Application Process To More ❯
disaster recovery initiatives. Working knowledge of cloud-native storage solutions such as Longhorn. Strong Linux administration skills, particularly with RHEL environments. Experience implementing comprehensive observability solutions using Prometheus, Grafana, Loki, and related tools. Ability to establish and enforce security policies through tools like Open Policy Agent. Knowledge of identity management solutions such as Keycloak. Experience managing artifact repositories including More ❯
Reigate, Surrey, England, United Kingdom Hybrid / WFH Options
esure Group
resilience. Qualifications What we’d love you to bring: Experience of AWS (particularly EC2, EKS, Lambda, S3, IAM, etc) Monitoring/alerting tools (for example we use Grafana, Prometheus, Loki, CloudWatch and Dynatrace) Knowledge of monitoring best practices for a variety of different platforms and technologies Docker and Kubernetes Git/Gitlab Jenkins/CI/CD/ArgoCD More ❯
and deployment pipelines. Familiarity with regulated workflows: ISO27001, SOC2, GDPR aren't just abbreviations, and don't fill you with dread. Observability skills: Well familiar with OpenTelemetry, Prometheus, Loki and Grafana. CI/CD pipeline skills: You know what it takes to build templates and guardrails to allow the most junior developers to confidently push code, safely knowing More ❯
Reigate, Surrey, England, United Kingdom Hybrid / WFH Options
esure Group
Qualifications What we’d love you to bring: Deep experience of AWS (particularly EC2, EKS, Lambda, S3, IAM, etc) Monitoring/alerting tools (for example we use Grafana, Prometheus, Loki, CloudWatch and Dynatrace) SME on monitoring best practices for a variety of different platforms and technologies Docker and Kubernetes Git/Gitlab Jenkins/CI/CD/ArgoCD More ❯
best practices across cloud and network environments. Troubleshoot deployment and performance issues across multiple environments. Set up and maintain observability tools for logging, monitoring, and alerting (e.g., Prometheus, Grafana, Loki). Contribute to internal tooling to streamline development, testing, and operations workflows. Stay current with DevOps trends and recommend improvements to tools and processes. Required Qualifications: Bachelor's degree … to multi-cloud or hybrid cloud architectures. Tech Stack: Cloud: AWS, OCI ZTN: Cloudflare Application: Kong (API Gateway), Java Spring Boot, Python, Go, TypeScript Monitoring: Prometheus Stack (Prometheus, Grafana, Loki) Compute: ECS, EC2, Lambda Frontend: S3, CloudFront Data: Glue, S3, PostgreSQL CI/CD: GitHub Actions IaC: Terraform, AWS SAM Why Join Us? At Intelmatix, you'll work on More ❯
provisioning, configuration management, and deployment. Responsibilities: Architect and deploy containerized applications using Red Hat OpenShift, ensuring optimal performance and scalability. Implement and optimize monitoring and logging tools (e.g., Prometheus, Loki, Grafana) for real-time tracking of application health. Transition OpenShift cluster operations from connected to partially disconnected environments, ensuring seamless deployment of critical security updates. Automate cluster provisioning and More ❯
Job-Specific Essential Duties and Responsibilities - Architect and deploy containerized applications using Red Hat OpenShift, ensuring optimal performance and scalability. - Implement and optimize monitoring and logging tools (e.g., Prometheus, Loki, Grafana) for real-time tracking of application health. - Transition OpenShift cluster operations from connected to partially disconnected environments, ensuring seamless deployment of critical security updates. - Automate cluster provisioning and More ❯
CD processes and pipelines to help accelerate software delivery. Experience working with on-prem and disconnected environments. Hands-on experience with Kubernetes, Ansible, Vault, Jenkins, GitLab and the Grafana/Loki stack. Experience working in Agile environments with Business Analysts, Scrum Masters, and sprint cycles, leveraging tools like Jira or Rally. Ability to communicate effectively with stakeholders at different levels (both More ❯
Edinburgh & Lothians, Scotland, United Kingdom Hybrid / WFH Options
Bright Purple Resourcing
on cluster configuration. You'll be working across: multi-node RKE2 clusters - set-up, networking, RBAC, disaster recovery and failover; installation and configuration of open source components, including Prometheus, Grafana, Loki, Alloy, PostgreSQL, Rook Ceph, ActiveMQ Artemis, and Keycloak; Kubernetes-native deployment tooling - Helm and Kustomize (creation and use required) plus exposure to FluxCD pipelines; service integration and lifecycle management More ❯
Cheltenham, Gloucestershire, South West, United Kingdom
Oscar Associates (UK) Limited
DevOps Engineer - eDV Cleared - Up to £100,000 Oscar Technology are working with a leading consultancy focused on delivering highly secure IT Infrastructure and Networks for government and defence organisations across the UK. Despite their successes to date, they have More ❯
EC2, RDS/Aurora, S3). Develop and maintain Infrastructure as Code using Terraform and configuration management with Ansible. Enhance monitoring, logging, and alerting using the Grafana stack (Prometheus, Loki, Tempo). Participate in incident management, root cause analysis, and post-incident reviews. Implement automation to reduce manual operational tasks and improve recovery time. Contribute to the definition and tracking … relevant to production workloads (EKS, EC2, RDS/Aurora, S3, IAM). Infrastructure as Code with Terraform and configuration management with Ansible. Strong experience with observability tools (Grafana, Prometheus, Loki, Tempo). Understanding of SRE concepts (SLIs, SLOs, error budgets, capacity planning). Comfortable working in incident and problem management processes. Strong GitOps mindset for managing platform and configuration More ❯
Engineer, you'll shape a highly available, secure, and automated infrastructure, spanning from global data centers to containerized Kubernetes environments. You'll work with modern tools like Prometheus, Grafana Loki, Ansible, and NGINX, directly impacting system performance, stability, and security. What makes this role stand out is the mix of deep technical work, freedom to drive automation, and responsibility … for a mission-critical platform. Activities Design, implement, and maintain monitoring systems using Prometheus and logging solutions with Grafana Loki to ensure system health, performance, and rapid issue detection. Deploy, configure, and optimize Linux applications like HAProxy, NGINX, and other critical services to ensure high availability and scalability. Drive Infrastructure as Code (IaC) automation using Ansible, enabling scalable, repeatable … experience, with a proven track record as a Linux DevOps Engineer in production environments. Strong hands-on experience with centralized monitoring solutions like Prometheus and logging platforms like Grafana Loki. In-depth knowledge of Linux-based applications such as HAProxy, NGINX, and experience with high-availability configurations and performance optimization. Practical experience with Infrastructure as Code (IaC) using More ❯