on-call rotations to support high-priority incidents and escalations. About You Skills & Experience Proven experience supporting HPC and/or AI workloads in production environments. Strong expertise with the Slurm workload manager, including tuning and troubleshooting. Proficiency with system-level debugging, including kernel modules and network interfaces. Experience with GPU compute platforms (NVIDIA and/or AMD … settings. Comfort operating in fast-paced, ambiguous, high-growth environments. Nice to have Experience with OpenStack and troubleshooting infrastructure in cloud environments. Kubernetes expertise, particularly in HPC or AI workload contexts. Familiarity with distributed file systems and advanced storage configurations. Understanding of GPU virtualization and multi-tenant HPC architecture. Exposure to machine learning frameworks and AI optimization workflows. Scripting …
and microbial genomics. Clear communicator, curious learner, and team-oriented problem solver. Desirable Knowledge, Skills and Experience: Experience with cloud platforms (e.g. OCI, AWS, GCP) or HPC environments (e.g. Slurm). Familiarity with both long- and short-read technologies (e.g. ONT, Illumina). Basic knowledge of metagenomics, antimicrobial resistance analytics or public health monitoring. Interest in data visualisation or machine …
Chemistry and Biology Can communicate with ML engineers Demonstrates competence and rigor in software development. Has experience working with scientific computing/lab environments (e.g. has used or administered SLURM) Conversant with cloud computing; able to provide requirements to DevOps engineers ABOUT IAMBIC THERAPEUTICS Iambic is a clinical-stage life-science and technology company developing novel medicines using its …
Engineer to work for a global FTSE 100 pharmaceutical company. They are looking for someone with the following skill set: Must-have skills: Azure Fundamentals. Posit Workbench, Connect and Package Manager Administration. Expert in Linux Administration. Deep understanding and implementation experience with Slurm. Expert knowledge in managing Kubernetes. Good-to-have skills: Work experience in building an HPC platform. Expert in …
to work for a global FTSE 100 pharmaceutical company. They are looking for someone with the following skill set: Must-have skills: Azure Fundamentals. Posit Workbench, Connect and Package Manager Administration. Expert in Linux Administration. Deep understanding and implementation experience with Slurm. Expert knowledge in managing Kubernetes. Good-to-have skills: Work experience in building an HPC platform. Expert in …
capacity partners, balancing deep engineering discussions with high-level business context. Qualifications: Technical depth in GPU-cloud infrastructure: Experience with large-scale GPU clusters using Kubernetes and/or SLURM over InfiniBand; deep understanding of the NVIDIA driver stack, NCCL performance tuning, and benchmarking. Strong customer or partner-facing experience: Able to bridge technical and business conversations, explain complex …
the future of healthcare today. This company is on the hunt for HPC Engineers to power their 25-petabyte system. Sound good? Well, there's more! Imagine working with Slurm clusters and GPFS storage, all while being an integral part of groundbreaking translational research. You will work in a dynamic team of five, where your hands-on expertise will support …
Hampshire, England, United Kingdom Hybrid / WFH Options
Hays Specialist Recruitment Limited
It's a great opportunity for someone who thrives in project-led infrastructure work and wants to help shape cutting-edge HPC solutions. What you'll need to succeed Slurm: Proven experience managing and tuning HPC job schedulers. InfiniBand and RoCE: Deep knowledge of high-speed networking technologies. Ansible: Proficiency in using Ansible for automation and configuration management. Networking …
if you have: Extremely strong software engineering skills. Proficiency in Python and related ML frameworks such as JAX, PyTorch and XLA/MLIR. Experience with distributed training infrastructures (Kubernetes, Slurm) and associated frameworks (Ray). Experience using large-scale distributed training strategies. Hands-on experience training large models at scale. Hands-on experience with the post-training phase …
research engineer, you will play a pivotal role in managing and optimising a large-scale infrastructure. Your expertise in Linux systems, along with experience in High-Performance Computing (HPC), Slurm workload management, and advanced storage solutions, will be essential to ensuring smooth and efficient operations. You'll be working alongside some of the brightest minds in research, directly …
and apply today! Responsibilities: Design scalable and secure infrastructure across Azure, on-prem, and possibly other cloud platforms. Architect and guide the setup/configuration of HPC clusters (e.g., SLURM) to support large-scale statistical workloads. Design and support environments for Python, R, and SAS that meet compliance, reproducibility, and performance standards. Implement security, access control, and compliance practices … in life sciences). Skills/Must have: Hands-on experience with Azure and hybrid cloud environments, including understanding of infrastructure architecture and deployment. Proficient in HPC systems like SLURM, including installation, configuration, and optimization for performance-heavy workloads. Strong Python knowledge with experience in installation, configuration, and scripting in Linux environments. Experience working with R and Python environments …
a Lead HPC Engineer, you'll be at the forefront of designing, optimising, and managing advanced computational infrastructure. You'll have a solid grasp of all things HPC, Linux, Slurm, and storage systems (bonus points if you're familiar with GPFS). Your expertise will ensure the systems are reliable, scalable, and high-performing, ready to support researchers in … about emerging technologies will be key to keeping our infrastructure at the forefront of innovation. We're looking for someone with deep expertise in HPC environments, including: Linux systems, workload management, parallel storage, and high-speed networking. You'll also bring strong leadership skills, inspiring and managing teams, while rolling up your sleeves to tackle technical challenges. Clear communication …