Software Engineer
🚀 GPU Infrastructure / Performance Engineer
📍 London (Onsite) | 🌍 Visa Sponsorship + Relocation
Join a frontier AI company backed by NVIDIA, building large-scale open-weight foundation models alongside researchers and engineers from DeepMind, OpenAI, Meta, Anthropic, and Google Brain.
⚡ What You’ll Do
- Optimise GPU performance and training efficiency across 1,000+ GPU clusters
- Improve utilisation, throughput, and reliability across distributed training infrastructure
- Build tooling for orchestration, monitoring, scheduling, and observability
- Work closely with research teams to accelerate large-scale model training
🔧 What They’re Looking For
- Deep GPU infrastructure / distributed systems experience
- Strong knowledge of CUDA, NCCL, PyTorch, DeepSpeed, JAX, Megatron-LM, vLLM, etc.
- Experience operating large-scale GPU clusters (1,000+ GPUs)
- Kubernetes, Slurm, or similar orchestration expertise
BONUS: Experience working on NVIDIA Blackwell chips (B200, B300, GB200, GB300)
💰 Package
- Salary open to candidate expectations
- Meaningful startup equity
- Full visa sponsorship + relocation support