AI Platform Engineer
Platform Engineer
Openings: 3
Location: London (Hybrid)
Employment Type: Full-time
The Opportunity
We are building a next-generation computational platform that powers large-scale machine learning, data science, and scientific discovery. Our teams work at the intersection of cloud infrastructure, high-performance computing, and data engineering, enabling researchers and ML practitioners to move faster from experimentation to real-world impact.
This role sits at the heart of the platform: designing, scaling, and operating systems that support GPU-accelerated workloads, batch pipelines, and data-intensive applications.
Who This Role Is For (Choose Your Strength)
We're open to different profiles and will shape the role around your strengths:
AI Platform / ML Infrastructure Engineers
- Kubernetes-based compute platforms
- GPU scheduling, batch & distributed workloads
- Supporting ML training, inference, and experimentation at scale
HPC / GPU Engineers
- Job schedulers, MPI, multi-node workloads
- Hybrid cloud and on-prem compute
- Performance, reliability, and cost optimisation
Data Engineers
- Large-scale data pipelines and data platforms
- Data reliability, orchestration, and observability
- Close collaboration with ML and research teams
What You'll Work On
- Designing and evolving Kubernetes-based compute platforms across hybrid and multi-cloud environments
- Building and operating GPU-enabled infrastructure for ML and scientific workloads
- Developing and maintaining core platform services, APIs, and internal tooling
- Improving CI/CD pipelines and Infrastructure-as-Code workflows
- Implementing monitoring, alerting, and reliability engineering practices
- Applying best practices for security, data protection, backup, and disaster recovery
- Partnering closely with ML engineers, data scientists, and researchers to unblock compute and data challenges
What We're Looking For
- Strong experience in one or more of:
  - Platform / infrastructure engineering
  - ML infrastructure or MLOps
  - HPC or GPU compute
  - Data engineering at scale
- Solid experience with Linux and cloud environments
- Hands-on work with Kubernetes or distributed systems
- Experience with Python (or similar) for automation or services
- Familiarity with CI/CD, Git-based workflows, and automation
- Strong problem-solving skills and a collaborative mindset
Bonus
- Terraform or other IaC tools
- Slurm, Kueue, Ray, Spark, or similar systems
- GPU tooling (CUDA, NVIDIA operators, schedulers)
- Experience supporting ML training or data science teams