Reigate, Surrey, England, United Kingdom Hybrid / WFH Options
esure Group
resilience. Qualifications. What we’d love you to bring: experience of AWS (particularly EC2, EKS, Lambda, S3, IAM, etc.); monitoring/alerting tools (for example, we use Grafana, Prometheus, Loki, CloudWatch and Dynatrace); knowledge of monitoring best practices for a variety of platforms and technologies; Docker and Kubernetes; Git/GitLab; Jenkins/CI/CD/ArgoCD.
Reigate, Surrey, England, United Kingdom Hybrid / WFH Options
esure Group
Qualifications. What we’d love you to bring: deep experience of AWS (particularly EC2, EKS, Lambda, S3, IAM, etc.); monitoring/alerting tools (for example, we use Grafana, Prometheus, Loki, CloudWatch and Dynatrace); subject-matter expertise (SME) in monitoring best practices for a variety of platforms and technologies; Docker and Kubernetes; Git/GitLab; Jenkins/CI/CD/ArgoCD.
Edinburgh & Lothians, Scotland, United Kingdom Hybrid / WFH Options
Bright Purple Resourcing
on cluster configuration. You'll be working across: multi-node RKE2 cluster set-up, networking, RBAC, disaster recovery and failover; installation and configuration of open-source components, including Prometheus, Grafana, Loki, Alloy, PostgreSQL, Rook Ceph, ActiveMQ Artemis and Keycloak; Kubernetes-native deployment tooling (Helm and Kustomize, creation and use required), plus exposure to FluxCD pipelines; service integration and lifecycle management.
Cheltenham, Gloucestershire, South West, United Kingdom
Oscar Associates (UK) Limited
DevOps Engineer - eDV Cleared - Up to £100,000. Oscar Technology are working with a leading consultancy focused on delivering highly secure IT infrastructure and networks for government and defence organisations across the UK. Despite their successes to date, they have …
EC2, RDS/Aurora, S3). Develop and maintain Infrastructure as Code using Terraform and configuration management with Ansible. Enhance monitoring, logging, and alerting using the Grafana stack (Prometheus, Loki, Tempo). Participate in incident management, root cause analysis, and post-incident reviews. Implement automation to reduce manual operational tasks and improve recovery time. Contribute to the definition and tracking … relevant to production workloads (EKS, EC2, RDS/Aurora, S3, IAM). Infrastructure as Code with Terraform and configuration management with Ansible. Strong experience with observability tools (Grafana, Prometheus, Loki, Tempo). Understanding of SRE concepts (SLIs, SLOs, error budgets, capacity planning). Comfortable working in incident and problem management processes. Strong GitOps mindset for managing platform and configuration …