Hampshire, England, United Kingdom Hybrid / WFH Options
Hays Specialist Recruitment Limited
It's a great opportunity for someone who thrives in project-led infrastructure work and wants to help shape cutting-edge HPC solutions. What you'll need to succeed: Slurm: Proven experience managing and tuning HPC job schedulers. InfiniBand and RoCE: Deep knowledge of high-speed networking technologies. Ansible: Proficiency in using Ansible for automation and configuration management. Networking …
Edinburgh, Midlothian, United Kingdom Hybrid / WFH Options
Lenovo
transformer architectures, and related concepts. Experience with data processing tools and techniques (e.g., Pandas, NumPy). Experience working with Linux systems and/or HPC cluster job scheduling (e.g., Slurm, PBS). Excellent communication, collaboration, and problem-solving skills. Bonus Points: Ph.D. in Computer Science, Machine Learning, or a related field. Experience with distributed training frameworks (e.g., DeepSpeed, Megatron …
Dorset, England, United Kingdom Hybrid / WFH Options
Hays Specialist Recruitment Limited
infrastructure solutions across compute, storage, and networking; producing detailed technical documentation (hardware specs, data centre layouts, cabling, power and cooling); installing and tuning Linux-based operating systems and configuring Slurm job schedulers; optimising high-speed networking technologies (InfiniBand, RoCE); automating deployments and maintenance using Ansible, Terraform, Bash, and Python; troubleshooting complex distributed systems and mentoring junior engineers. This is … building systems that scale, this role is for you. What you'll need to succeed: proven experience designing and scaling large HPC clusters (hundreds to thousands of nodes); strong Slurm configuration skills (partitions, priorities, resource management); advanced Linux administration and performance tuning; expertise in high-performance networking (InfiniBand, RoCE, RDMA); experience with distributed file systems (Lustre, Ceph, WEKA, VAST …
a related field. 2. Proven industry experience in building, deploying, and maintaining Linux servers (Red Hat/Rocky Linux). 3. A working knowledge of and practical experience with batch queuing systems (Slurm) and cloud computing, particularly AWS. Key Words: Linux Systems Administrator/Scientific Computing/Red Hat/Rocky Linux/Slurm/AWS/Oracle DBA/IT Security