4 of 4 Slurm Workload Manager Jobs in London

Senior Engineering Lead, Chem-Bio

Hiring Organisation
Jobleads-UK
Location
Greater London, England, United Kingdom
record of leading technical work in a team. Strong infrastructure and platform skills — experience with cloud environments (AWS), container orchestration (Kubernetes), and job scheduling (Slurm or similar). Demonstrated experience leading or managing engineers — whether through formal line management, tech-leading a team, or running hiring pipelines. ...

Research Engineer, Pre-Training

Hiring Organisation
Jobleads-UK
Location
Greater London, England, United Kingdom
Background in numerical computing, HPC, or distributed systems, including familiarity with GPUs/TPUs, high-performance networking (NVLink/InfiniBand), Kubernetes/Slurm, and OS internals. Expertise in Python and deep experience with modern deep learning frameworks (PyTorch and/or JAX). Advanced degree (MS or PhD) in Computer ...

Research Engineer, Machine Learning – Paris/London/Zurich/Warsaw

Hiring Organisation
Jobleads-UK
Location
Greater London, England, United Kingdom
+ years working on large‐scale ML codebases. Hands‐on with PyTorch, JAX or TensorFlow; comfortable with distributed training (DeepSpeed/FSDP/SLURM/K8s). Experience in deep learning, NLP or LLMs; bonus for CUDA or data‐pipeline chops. Strong software‐design instincts: testing, code review ...

Senior Staff+ Software Engineer (Kubernetes Platform)

Hiring Organisation
Jobleads-UK
Location
Greater London, England, United Kingdom
controllers, so it stays responsive as object counts and node counts grow by orders of magnitude. And we build the core cluster services every workload depends on, like service discovery, so they hold up under the same pressure. We make sure the control plane is fast, correct, and always … accelerator fleets, including custom scheduling plugins and policies for gang scheduling, topology awareness, and preemption. Scale the Kubernetes control plane (apiserver, etcd, controller-manager) to support clusters far beyond typical limits, and find the next bottleneck before it finds us. Design, build, and operate core cluster services such ...