Lead HPC & AI Infrastructure Engineer
Dorset, England, United Kingdom
Hybrid / WFH Options
Hays Specialist Recruitment Limited
                                
detailed technical documentation: hardware specs, data centre layouts, cabling, power and cooling
Installing and tuning Linux-based operating systems and configuring SLURM job schedulers
Optimising high-speed networking technologies (InfiniBand, RoCE)
Automating deployments and maintenance using Ansible, Terraform, Bash, and Python
Troubleshooting complex distributed systems and mentoring junior engineers
This is a rare opportunity to lead infrastructure projects that directly …
… and scaling large HPC clusters (hundreds to thousands of nodes)
Strong SLURM configuration skills - partitions, priorities, resource management
Advanced Linux administration and performance tuning
Expertise in high-performance networking (InfiniBand, RoCE, RDMA)
Experience with distributed file systems (Lustre, Ceph, WEKA, VAST)
Proficiency in automation and scripting (Ansible, Terraform, Bash, Python)
A solid understanding of monitoring, resilience, and security compliance
Excellent …
                                
Employment Type: Full-Time
Salary: £130,000 per annum