Cambridge, England, United Kingdom Hybrid/Remote Options
RegGenome
learn. Hands-on experience with Kubernetes and Terraform/Terragrunt/OpenTofu. Strong cloud infrastructure knowledge in either AWS or GCP. Nice to Have: Monitoring stack tools: Prometheus, Thanos, Loki, Alertmanager, Grafana. CI/CD experience with FluxCD (or ArgoCD). Database performance optimization and management experience. Qualities We Value: Solution-oriented mindset with a knack for solving tough …
ll be doing: Building and maintaining a Kubernetes-hosted AI platform (AKS); deploying and managing LLMOps tools such as LiteLLM, Langflow, and Langfuse; implementing observability with Prometheus, Grafana, and Loki; managing infrastructure through Terraform, ArgoCD, and GitHub Actions; supporting internal AI applications including RAG, document processing, and internal AI assistants. What you’ll need: 2–4 years in Platform …
IaC skills with Terraform and CI/CD pipelines 🐳 Kubernetes operations expertise on AWS (EKS) 🔒 Solid grounding in Linux, networking, and cloud security 📊 Familiarity with observability stacks (Prometheus, Grafana, Loki) If you’re ready to shape the infrastructure behind cutting-edge AI used by global enterprises, we’d love to hear from you.
understanding of Kubernetes operations on AWS, including scaling, deployment automation, and monitoring. Solid background in Linux administration, networking, and cloud security. Hands-on experience with observability stacks (Prometheus, Grafana, Loki). Knowledge of database reliability. Strong scripting skills. A collaborative approach with a passion for improving systems through automation and consistency. The role: Pay …
firewalls, and system updates. Set up, configure, and maintain bare-metal or lightweight Kubernetes environments (e.g., kubeadm, K3s, MicroK8s). Monitor performance and reliability using observability tools (Prometheus, Grafana, Loki, ELK, etc.). Troubleshoot deployment, networking, and container runtime issues. Collaborate with development teams to ensure smooth delivery of applications and services. Maintain good documentation and follow DevOps best …
and maintaining Azure Kubernetes (AKS) environments; managing Infrastructure as Code with Terraform and improving GitOps workflows (ArgoCD/GitHub Actions); building observability and monitoring stacks using Prometheus, Grafana, and Loki; supporting AI workloads (LLMs, RAG, and document processing applications) running on Kubernetes; automating platform operations with Python, Go, and shell scripting; implementing security guardrails, PII compliance tooling, and best … experience in DevOps or Platform Engineering; strong background in Azure and Kubernetes; hands-on experience with Terraform, CI/CD, and container orchestration; familiarity with observability tools (Prometheus, Grafana, Loki); scripting or programming skills in Python or Go; interest in AI infrastructure, LLMOps, or large language model deployment