Cambridge, Cambridgeshire, United Kingdom Hybrid / WFH Options
Proactive Appointments
environments, such as HPC, HTC or BC. Drive innovative computational solutions and exploit emerging technologies. Experience of administration of large-scale cluster and server computing and related software (e.g. Slurm, LSF, Grid Engine). Hands-on experience working in a DevOps team and using agile methodologies. Operating and consuming virtualised private cloud resources (e.g. OpenStack). Understanding of Linux system administration …
Hampshire, England, United Kingdom Hybrid / WFH Options
Hays Specialist Recruitment Limited
It's a great opportunity for someone who thrives in project-led infrastructure work and wants to help shape cutting-edge HPC solutions. What you'll need to succeed: Slurm: Proven experience managing and tuning HPC job schedulers. InfiniBand and RoCE: Deep knowledge of high-speed networking technologies. Ansible: Proficiency in using Ansible for automation and configuration management. Networking …
be responsible for creating detailed technical designs, including hardware specifications, data centre layouts, cabling, and power/cooling requirements. You'll install and tune Linux-based operating systems, configure Slurm job schedulers, and optimise high-speed networking technologies such as InfiniBand and RoCE. The role also involves scripting and automation (Ansible, Terraform), troubleshooting complex distributed systems, and mentoring junior … to succeed. To be successful in this role, you'll bring: HPC Cluster Expertise: Proven experience designing, deploying, and scaling large HPC environments (hundreds to thousands of nodes). Slurm Scheduler Configuration: Deep understanding of Slurm partitions, priorities, and resource management. Networking: Strong knowledge of high-performance networking (InfiniBand, RoCE, RDMA) and troubleshooting interconnectivity issues. Linux Systems: Advanced …
Edinburgh, Midlothian, United Kingdom Hybrid / WFH Options
Lenovo
transformer architectures, and related concepts. Experience with data processing tools and techniques (e.g., Pandas, NumPy). Experience working with Linux systems and/or HPC cluster job scheduling (e.g., Slurm, PBS). Excellent communication, collaboration, and problem-solving skills. Bonus Points: Ph.D. in Computer Science, Machine Learning, or a related field. Experience with distributed training frameworks (e.g., DeepSpeed, Megatron …
Dorset, England, United Kingdom Hybrid / WFH Options
Hays Specialist Recruitment Limited
infrastructure solutions across compute, storage, and networking. Producing detailed technical documentation: hardware specs, data centre layouts, cabling, power and cooling. Installing and tuning Linux-based operating systems and configuring Slurm job schedulers. Optimising high-speed networking technologies (InfiniBand, RoCE). Automating deployments and maintenance using Ansible, Terraform, Bash, and Python. Troubleshooting complex distributed systems and mentoring junior engineers. This is … building systems that scale, this role is for you. What you'll need to succeed: Proven experience designing and scaling large HPC clusters (hundreds to thousands of nodes). Strong Slurm configuration skills: partitions, priorities, resource management. Advanced Linux administration and performance tuning. Expertise in high-performance networking (InfiniBand, RoCE, RDMA). Experience with distributed file systems (Lustre, Ceph, WEKA, VAST …
involves designing, managing, and optimising HPC clusters. The successful candidate will work flexibly and collaborate with security-cleared teams. Responsibilities: Manage and maintain HPC clusters, monitoring performance (e.g., Ganglia, Slurm) and troubleshooting hardware/software issues for 24/7 uptime. Optimise job scheduling (e.g., Slurm, Grid Engine, IBM) and tune MPI-based applications for genomic and health …