Southampton, Hampshire, South East, United Kingdom Hybrid / WFH Options
Spectrum IT Recruitment Limited
Out If You Have:
• Practical experience managing large-scale Kubernetes clusters; certifications in Kubernetes are a strong bonus
• Hands-on familiarity with the Grafana Observability Suite, including tools like Loki, Mimir, and Tempo
• Background in administering or developing with popular monitoring and automation tools such as Splunk, Datadog, PagerDuty, or Rundeck
• Experience using configuration management platforms like Ansible, Puppet
on-premises and cloud-native)
• Strong Linux (preferably RHEL) systems administration skills
• Proven experience with HPC workloads or scientific computing clusters
• Hands-on experience with observability tools: Prometheus, Grafana, Loki
• Infrastructure as Code (IaC) using Terraform, Ansible
• CI/CD experience with GitOps tools (e.g., ArgoCD, Flux)
• Prior experience leading engineering teams in distributed environments
minimising resolution times and turnaround of code fixes.
Job Duties
• Prioritise and provide advanced troubleshooting of incidents escalated via ServiceDesk across a range of technologies: internal software, MySQL, Instana, Loki, RabbitMQ, Linux & Windows OS, Splunk, Prometheus, Grafana.
• Develop clear and concise internal troubleshooting documentation to streamline incident resolution, ensuring each guide includes step-by-step instructions, common error scenarios …/Service or recent relevant qualification.
• Previous experience and/or understanding of Windows & Linux OS.
• Experience with one or more of the following monitoring tools: Instana, Splunk, Loki, Prometheus, Grafana.
• Experience with database technologies such as MySQL, MongoDB or Redis and the relevant query language.
• Previous experience and/or understanding of cloud-based infrastructure (ideally AWS