RabbitMQ. Understanding of in-memory data structure stores/cache systems such as Redis. Hands-on knowledge of monitoring and analytics systems such as the Grafana/Prometheus/Loki stack or ELK. A strong understanding of security best practices. Good understanding of database technologies, mainly to support the DBA team, such as MySQL/MariaDB, ProxySQL, MySQL/…
Cambridge, England, United Kingdom Hybrid / WFH Options
RegGenome
learn. Hands-on experience with Kubernetes and Terraform/Terragrunt/OpenTofu. Strong cloud infrastructure knowledge in either AWS or GCP. Nice to have: monitoring stack tools (Prometheus, Thanos, Loki, Alertmanager, Grafana); CI/CD experience with FluxCD (or ArgoCD); database performance optimization and management experience. Qualities we value: a solution-oriented mindset with a knack for solving tough …
LL BRING: Proven experience in observability, SRE, or platform engineering roles within complex, distributed environments. Strong hands-on expertise with telemetry tools such as OpenTelemetry, Prometheus, Grafana, Splunk, Elastic, Loki, Jaeger, or similar. Proficiency in at least one programming language (e.g., Python, Go, Java) and in infrastructure-as-code tools (e.g., Terraform, Helm). Deep understanding of cloud-native …
ll be doing: building and maintaining a Kubernetes-hosted AI platform (AKS); deploying and managing LLMOps tools such as LiteLLM, Langflow, and Langfuse; implementing observability with Prometheus, Grafana, and Loki; managing infrastructure through Terraform, ArgoCD, and GitHub Actions; supporting internal AI applications, including RAG, document processing, and internal AI assistants. What you’ll need: 2–4 years in Platform …
City of London, London, United Kingdom; London, South East England, United Kingdom; London (City of London), South East England, United Kingdom; Slough, South East England, United Kingdom Hybrid / WFH Options
Motive Group
leading research and enterprise teams. They’re looking for a Senior or Staff SRE with deep experience in observability at massive scale: someone who has tuned Prometheus/Mimir, Loki, or Tempo clusters beyond 100M series or 10TB/day of logs, and who thrives in highly technical, fast-moving environments. You’ll be working on: designing and scaling observability …
and maintaining Azure Kubernetes Service (AKS) environments; managing infrastructure as code with Terraform and improving GitOps workflows (ArgoCD/GitHub Actions); building observability and monitoring stacks using Prometheus, Grafana, and Loki; supporting AI workloads (LLMs, RAG, and document-processing applications) running on Kubernetes; automating platform operations with Python, Go, and shell scripting; implementing security guardrails, PII compliance tooling, and best … experience in DevOps or Platform Engineering; a strong background in Azure and Kubernetes; hands-on experience with Terraform, CI/CD, and container orchestration; familiarity with observability tools (Prometheus, Grafana, Loki); scripting or programming skills in Python or Go; interest in AI infrastructure, LLMOps, or large language model deployment …