Platform Engineer
- Hiring Organisation
- CATCHES
- Location
- Bradford, UK
- Employment Type
- Full-time
clusters. Responsibilities: Automate the provisioning and lifecycle of high-performance GPU clusters using Terraform and Ansible. Maintain the stability and performance of large-scale Linux environments supporting AI/ML training workloads. Collaborate with vendors and internal teams to troubleshoot hardware and networking bottlenecks (latency, throughput). Implement monitoring solutions … Grafana) to visualise GPU health and cluster efficiency. Assist in optimising the stack for containerised workloads (Kubernetes/Docker). Requirements: Strong background in Linux Systems Administration. Experience managing Bare Metal servers (on-premise or packet/equinix metal). Proficiency in Infrastructure as Code (IaC) tools. Nice to have ...