Lead HPC & AI Infrastructure Engineer
Dorset, England, United Kingdom
Hybrid / WFH Options
Hays Specialist Recruitment Limited
end-to-end infrastructure solutions across compute, storage, and networking
Producing detailed technical documentation: hardware specs, data centre layouts, cabling, power and cooling
Installing and tuning Linux-based operating systems and configuring SLURM job schedulers
Optimising high-speed networking technologies (InfiniBand, RoCE)
Automating deployments and maintenance using Ansible, Terraform, Bash, and Python
Troubleshooting complex distributed systems and … engineers

This is a rare opportunity to lead infrastructure projects that directly support cutting-edge AI research and development. If you thrive in technically challenging environments and enjoy building systems that scale, this role is for you.

What you'll need to succeed
Proven experience designing and scaling large HPC clusters (hundreds to thousands of nodes)
Strong SLURM configuration … skills: partitions, priorities, resource management
Advanced Linux administration and performance tuning
Expertise in high-performance networking (InfiniBand, RoCE, RDMA)
Experience with distributed file systems (Lustre, Ceph, WEKA, VAST)
Proficiency in automation and scripting (Ansible, Terraform, Bash, Python)
A solid understanding of monitoring, resilience, and security compliance
Excellent documentation skills and a passion for mentoring and knowledge sharing

Desirable
Employment Type: Full-Time
Salary: £130,000 per annum
Posted: