Saffron Walden, Essex, South East, United Kingdom Hybrid / WFH Options
EMBL-EBI
or more modalities Experience developing or integrating image visualisation systems Experience with NoSQL databases, such as MongoDB Experience with batch scheduling systems such as SLURM Experience with containerisation (e.g. Docker) and container orchestration (e.g. Kubernetes) Infrastructure-as-code deployment tools such as Ansible or Terraform Experience working in an …
Ethernet), processors (Intel/AMD/ARM/NVIDIA), parallel file systems, and data center infrastructure. Additional skills in MPI, parallel job scheduling (e.g., SLURM), and management & monitoring tools (e.g., Icinga, Prometheus, Grafana) are advantageous. Requirements: Eligible and willing to undergo UK Govt. security clearance. Proven experience as a …
with key stakeholders for enterprise customers. Technical Experience High Performance Computers – (Supporting Users) Configuration and management of HPC Infrastructure Linux MPI InfiniBand Job schedulers SLURM Contract Details: PAYE Contract - Competitive Rate 18-Month Contract Remote - UK Based Including Training and Upskilling It’s an amazing opportunity to be a …
magnitude of training runs Explore novel synthetic data generation techniques Engineer robust, high-performance inference Experience Technical: Have experience operating orchestration systems such as SLURM, Ray, or similar. Experience in creating and managing multi-instance clusters for data and model parallel training across GPUs/TPUs, preferably using PyTorch …
of your team, ideally for a leading AI research laboratory, or a pioneering AI business Key Requirements: Python and PyTorch expertise Experience in SLURM, Ray, or similar Graphics Processing Units (GPUs) Experience in creating and managing HPC clusters for ML models Experience in efficiently serving large ML models …
high-performance inference platforms Collaborate in defining and steering their evolving inference and training stack Experience Technical: Have experience operating orchestration systems such as SLURM, Ray, or similar. Experience in creating and managing multi-instance clusters for data and model parallel training across GPUs/TPUs, preferably using PyTorch …
with Kubernetes Familiar with inference servers such as multi-LoRA, LoRA Exchange, TitanML etc. Experience creating/managing multi-node HPC clusters Experience with Workload Managers like SLURM, Kepler, Moab etc. Experience working with some of the more recent LLMs (OpenAI, Mistral, Claude, LLaMA etc.) What's in it …