Saffron Walden, Essex, South East, United Kingdom Hybrid / WFH Options
EMBL-EBI
or more modalities Experience developing or integrating image visualisation systems Experience with NoSQL databases, such as MongoDB Experience with batch scheduling systems such as SLURM Experience with containerisation (e.g. Docker) and container orchestration (e.g. Kubernetes) Infrastructure-as-code deployment tools such as Ansible or Terraform Experience working in an more »
systems, CI/CD, etc.) Attention to detail needed to manage and debug production services. Experience with research clusters and implementing tools such as the Slurm workload manager. Job Duties Own the lifecycle of our Linux-based servers and applications across our multiple business environments. Automate and troubleshoot a more »
magnitude of training runs Explore novel synthetic data generation techniques Engineer robust, high-performance inference Experience Technical: Have experience operating orchestration systems such as SLURM, Ray, or similar. Experience in creating and managing multi-instance clusters for data and model parallel training across GPUs/TPUs, preferably using PyTorch more »
of your team, ideally for a leading AI research laboratory, or a pioneering AI business Key Requirements: Python and PyTorch expertise Experience in SLURM, Ray, or similar Graphics Processing Units (GPUs) Experience in creating and managing HPC clusters for ML models Experience in efficiently serving large ML models more »
high-performance inference platforms Collaborate in defining and steering their evolving inference and training stack Experience Technical: Have experience operating orchestration systems such as SLURM, Ray, or similar. Experience in creating and managing multi-instance clusters for data and model parallel training across GPUs/TPUs, preferably using PyTorch more »
SLES Enterprise Mandatory technical skills: Linux administration, such as: SuSE or RedHat - any modern Linux distribution admin experience will be considered. Cluster management solutions, such as: Bright Cluster manager, PXE booting, OpenHPC, Warewulf or Rocks. Beneficial technical skills: Experience using or managing HPC clusters, such as: Beowulf, OpenStack or Hadoop. Experience managing batch scheduling systems, such as … PBS Pro, Slurm, SGE/UGE, Microsoft Scheduler. Experience with scientific or engineering applications, such as LSDyna, Altair Hyperworks, Abaqus. Scripting skills: primarily Bash, but any shell scripting along with Python and Perl. Beneficial 'Soft' skills: Good problem-solving skills. Strong stakeholder management skills. Strong communication skills. Please be aware that you will be joining the more »
with Kubernetes Familiar with inference servers such as multi-LoRa, LoRA Exchange, TitanML etc Experience creating/managing multi-node HPC clusters Experience with Workload Managers like SLURM, Kepler, Moab etc Experience working with some of the more recent LLMs (OpenAI, Mistral, Claude, LLaMA etc) What's in it more »