Lead HPC & AI Infrastructure Engineer
Dorset, England, United Kingdom
Hybrid / WFH Options
Hays Specialist Recruitment Limited
… internal engineering teams, OEMs, and external suppliers to build robust, scalable systems.

Key responsibilities include:
Designing end-to-end infrastructure solutions across compute, storage, and networking
Producing detailed technical documentation: hardware specs, data centre layouts, cabling, power and cooling
Installing and tuning Linux-based operating systems and configuring SLURM job schedulers
Optimising high-speed networking technologies (InfiniBand, RoCE)
Automating … that scale, this role is for you.

What you'll need to succeed:
Proven experience designing and scaling large HPC clusters (hundreds to thousands of nodes)
Strong SLURM configuration skills: partitions, priorities, resource management
Advanced Linux administration and performance tuning
Expertise in high-performance networking (InfiniBand, RoCE, RDMA)
Experience with distributed file systems (Lustre, Ceph, WEKA, VAST)
Proficiency in … automation and scripting (Ansible, Terraform, Bash, Python)
A solid understanding of monitoring, resilience, and security compliance
Excellent documentation skills and a passion for mentoring and knowledge sharing

Desirable experience:
Containerisation in HPC (Singularity, Docker, Apptainer)
Familiarity with AI/ML workflows, GPU-aware MPI, NVLink
Experience in cloud, academic, or research environments
Vendor hardware validation and data centre …
Employment Type: Full-Time
Salary: £130,000 per annum