the future of healthcare today. This company is on the hunt for HPC Engineers to power their 25 Petabyte system. Sound good? Well, there's more! Imagine working with Slurm clusters and GPFS storage, all while being an integral part of groundbreaking translational research. You will work in a dynamic team of five, where your hands-on expertise will support …
if you have: Extremely strong software engineering skills. Proficiency in Python and related ML frameworks such as JAX, PyTorch and XLA/MLIR. Experience with distributed training infrastructures (Kubernetes, Slurm) and associated frameworks (Ray). Experience using large-scale distributed training strategies. Hands-on experience training large models at scale and having contributed to the tooling and/ …
if you have: Extremely strong software engineering skills. Proficiency in Python and related ML frameworks such as JAX, PyTorch and XLA/MLIR. Experience with distributed training infrastructures (Kubernetes, Slurm) and associated frameworks (Ray). Experience using large-scale distributed training strategies. Hands-on experience training large models at scale. Hands-on experience with the post-training phase …
on experience benchmarking and optimizing performance of models on accelerated computing (GPU, TPU, AI ASICs) clusters with high-speed networking. - Experience scaling model training and inference using technologies like Slurm, ParallelCluster, Amazon SageMaker. - Experience in developing and deploying large-scale machine learning or deep learning models and/or systems into production, including batch and real-time data processing. …
research engineer, you will play a pivotal role in managing and optimising a large-scale infrastructure. Your expertise in Linux systems, along with experience in High-Performance Computing (HPC), Slurm workload management, and advanced storage solutions, will be essential to ensuring smooth and efficient operations. You'll be working alongside some of the brightest minds in research, directly …
a Lead HPC Engineer, you'll be at the forefront of designing, optimising, and managing advanced computational infrastructure. You'll have a solid grasp of all things HPC, Linux, Slurm, and storage systems (bonus points if you're familiar with GPFS). Your expertise will ensure the systems are reliable, scalable, and high-performing, ready to support researchers in … about emerging technologies will be key to keeping our infrastructure at the forefront of innovation. We're looking for someone with deep expertise in HPC environments, including Linux systems, workload management, parallel storage, and high-speed networking. You'll also bring strong leadership skills, inspiring and managing teams, while rolling up your sleeves to tackle technical challenges. Clear communication …
mission to transform the AI landscape. Responsibilities: Design, implement, and maintain our ML-focused cloud infrastructure on GCP using Infrastructure as Code (Terraform). Build and manage HPC clusters with Slurm for distributed ML workloads, focusing on GPU/TPU utilization and job scheduling. Develop and maintain ML pipeline automation tools and ML-specific CI/CD workflows in Python … security and data protection. Requirements: 3+ years of experience in ML infrastructure or ML platform engineering. Strong proficiency in Python for ML pipeline automation and tooling. Extensive experience with Slurm cluster management for large-scale ML workloads. Proven track record with Terraform and Infrastructure as Code for ML environments. Solid understanding of GCP's ML-specific services (Vertex AI …
Innovate whilst bringing in new ideas to the business. Skills/Experience: Systems Engineering experience in a high-availability & low-latency environment. Knowledge of HPC cluster schedulers, such as Slurm, Grid Engine, MOAB, PBS. Strong experience with scripting/automation is highly preferred (Python, Ansible, Chef, Puppet). Exposure to CPU chipsets is a plus. Experience working with FPGAs is …
City of London, London, United Kingdom Hybrid / WFH Options
Hunter Bond
identifying low-level bottlenecks. • Strong scripting/development ability in 1 or more languages, e.g. Python/Rust • Knowledge of network fundamentals • Full understanding of network protocols • Experience with Slurm. Benefits: • Cutting-edge tech: 5/10 years ahead of the competition • Rewarding, satisfying work: yearly pay reviews, role progression and autonomy • Greenfield projects • Unrivalled bonus, exceptional benefits package …
London (City of London), South East England, United Kingdom
Hunter Bond
Innovate whilst bringing in new ideas to the business. Skills/Experience: Systems Engineering experience in a high-availability & low-latency environment. Knowledge of HPC cluster schedulers, such as Slurm, Grid Engine, MOAB, PBS. Strong experience with scripting/automation is highly preferred (Python, Ansible, Chef, Puppet). Exposure to CPU chipsets is a plus. Experience working with FPGAs is …
South East London, England, United Kingdom Hybrid / WFH Options
Hunter Bond
identifying low-level bottlenecks. • Strong scripting/development ability in 1 or more languages, e.g. Python/Rust • Knowledge of network fundamentals • Full understanding of network protocols • Experience with Slurm. Benefits: • Cutting-edge tech: 5/10 years ahead of the competition • Rewarding, satisfying work: yearly pay reviews, role progression and autonomy • Greenfield projects • Unrivalled bonus, exceptional benefits package …
London, South East England, United Kingdom Hybrid / WFH Options
Hunter Bond
identifying low-level bottlenecks. • Strong scripting/development ability in 1 or more languages, e.g. Python/Rust • Knowledge of network fundamentals • Full understanding of network protocols • Experience with Slurm. Benefits: • Cutting-edge tech: 5/10 years ahead of the competition • Rewarding, satisfying work: yearly pay reviews, role progression and autonomy • Greenfield projects • Unrivalled bonus, exceptional benefits package …
London (City of London), South East England, United Kingdom Hybrid / WFH Options
Hunter Bond
identifying low-level bottlenecks. • Strong scripting/development ability in 1 or more languages, e.g. Python/Rust • Knowledge of network fundamentals • Full understanding of network protocols • Experience with Slurm. Benefits: • Cutting-edge tech: 5/10 years ahead of the competition • Rewarding, satisfying work: yearly pay reviews, role progression and autonomy • Greenfield projects • Unrivalled bonus, exceptional benefits package …
Slough, South East England, United Kingdom Hybrid / WFH Options
Hunter Bond
identifying low-level bottlenecks. • Strong scripting/development ability in 1 or more languages, e.g. Python/Rust • Knowledge of network fundamentals • Full understanding of network protocols • Experience with Slurm. Benefits: • Cutting-edge tech: 5/10 years ahead of the competition • Rewarding, satisfying work: yearly pay reviews, role progression and autonomy • Greenfield projects • Unrivalled bonus, exceptional benefits package …
practical experience in performance tuning and resource fencing. Linux tuning with experience around high-throughput or high-performance computing. Bash, Shell or Python. Salt, Chef or Ansible. HPC architecture. Slurm or Grid Engine or MOAB or PBS. Containers and container orchestration. You will be joining a progressive and exciting company committed to excellence. They offer an excellent working environment with …