Southampton, Hampshire, South East, United Kingdom Hybrid / WFH Options
Spectrum It Recruitment Limited
Out If You Have: Practical experience managing large-scale Kubernetes clusters; Kubernetes certifications are a strong bonus. Hands-on familiarity with the Grafana Observability Suite, including tools such as Loki, Mimir, and Tempo. A background in administering or developing with popular monitoring and automation tools such as Splunk, Datadog, PagerDuty, or Rundeck. Experience using configuration management platforms such as Ansible, Puppet
a strong background and experience in the following: Observability and SRE Practices: In-depth understanding of observability and Site Reliability Engineering practices, and familiarity with tools in the LGTM stack (Loki, Grafana, Tempo, Mimir) or equivalent observability platforms. Containerisation: Strong experience building and managing containerised applications and effectively leveraging container orchestration platforms such as Kubernetes. Cloud Expertise: Demonstrable ability to architect
London, England, United Kingdom Hybrid / WFH Options
Circadia Health
Experience orchestrating GPU/AI workloads, MLOps, or large-language-model serving. Knowledge of edge/IoT deployments and over-the-air update strategies. Exposure to observability stacks (OpenTelemetry, Loki) and security tooling (Falco, Aqua, Wiz). What We Offer: Base salary £100,000 – £170,000 plus meaningful equity. Gym membership. Comprehensive health, dental & vision coverage (UK & global travel
We are seeking a skilled and proactive Kubernetes Platform Engineer to provide configuration and maintenance support for the Kubernetes clusters in our high-availability environments across the EMEA region. Working closely with development, operations, infrastructure and security teams, you will help
London, England, United Kingdom Hybrid / WFH Options
Bright Purple
and build a new cloud-native IaC platform. Develop software using technologies such as Docker Compose, Terraform, Kubernetes (K8s), Python, and Go. Provision and orchestrate open-source services such as Loki, Redis, Grafana, Authentik, and Netbird. Design and implement CI/CD pipelines to streamline deployment. Initially focus on AWS environments, with the goal of creating a solution
code tools (e.g., Terraform, Helm, Bash, Python).
• Solid understanding of microservices, zero-trust security, mTLS, RBAC, and network policies.
• Experience with CI/CD tools, logging (e.g., Fluentd, Loki), and monitoring (e.g., Prometheus, Grafana).
About us: Ascendion is a leading global provider of AI-first software engineering services, delivering transformative solutions across North America, APAC, and Europe.
Demonstrated expertise in containerizing applications and orchestrating them in Kubernetes environments. Experience with at least one monitoring/observability stack (Datadog, ELK, Splunk, Loki, Grafana). Strong knowledge of Unix or Linux. Strong communication skills to collaborate with various stakeholders. Able to work independently in a fast-paced environment. Detail-oriented, organized, demonstrating
Testing
• Azure Application Insights
• Azure Kubernetes Service
• Platform tuning experience
Beneficial skills
• Bicep
• Cloudflare
• ARM Templates
• Familiarity with Octopus Deploy
• Knowledge of C# .NET
• Prometheus/Grafana dashboards
• Seq, Loki or other application logging software
• VMs
Company benefits
• Full private health insurance through our healthcare partner, Vitality Health
• Group Life Insurance and Income Protection
• BUPA Dental Insurance
City of London, London, United Kingdom Hybrid / WFH Options
Stealth AI Startup
CI/CD: building pipelines in GitHub Actions, GitLab CI or CircleCI with automated tests and security gates. An observability and SRE mindset, using tools such as Prometheus, Grafana, Loki or ELK and OpenTelemetry. A security-first but pragmatic approach, covering secrets management, image provenance and zero-trust networking. Proficiency in at least one systems language (Go, Python or
logging, and metrics that track requests across their entire lifecycle, from the API Gateway through the runtime engine and sandboxed environments to external APIs, visualized in tools such as Grafana, Loki, or Prometheus. Building intuitive self-service tools for internal developers, such as CLI tools, GitHub Actions, and Backstage plugins, that let them quickly provision new microservices or AI
London, UK
We believe that we are better together, and at Tripadvisor we welcome you for who you are. Our workplace is for everyone, as is our people-powered platform. At Tripadvisor, we want you to bring your unique perspective and experiences
on-premises and cloud-native). Strong Linux (preferably RHEL) systems administration skills. Proven experience with HPC workloads or scientific computing clusters. Hands-on experience with observability tools: Prometheus, Grafana, Loki. Infrastructure as Code (IaC) using Terraform and Ansible. CI/CD experience with GitOps tools (e.g., ArgoCD, Flux). Prior experience leading engineering teams in distributed environments
minimising resolution times and turnaround of code fixes.
Job Duties
• Prioritise and provide advanced troubleshooting of incidents escalated via ServiceDesk across a range of technologies: internal software, MySQL, Instana, Loki, RabbitMQ, Linux & Windows OS, Splunk, Prometheus, Grafana.
• Develop clear and concise internal troubleshooting documentation to streamline incident resolution, ensuring each guide includes step-by-step instructions, common error scenarios …/Service or recent relevant qualification.
• Previous experience with and/or understanding of Windows & Linux OS.
• Experience with one or more of the following monitoring tools: Instana, Splunk, Loki, Prometheus, Grafana.
• Experience with database technologies such as MySQL, MongoDB or Redis and the relevant query language.
• Previous experience with and/or understanding of cloud-based infrastructure (ideally AWS