Senior DevOps Engineer

Humanoid is the first AI and robotics company in the UK, creating the world’s most advanced, reliable, commercially scalable, and safe humanoid robots. Our first humanoid robot, HMND 01, is a next-gen labour automation unit that provides highly efficient services across various use cases, starting with industrial applications.

Our Mission

At Humanoid, we strive to create the world’s leading, commercially scalable, safe, and advanced humanoid robots that seamlessly integrate into daily life and amplify human capacity.

We are building large-scale compute infrastructure for training next-generation robotics models, including transformer-based systems such as vision-language-action (VLA) models.

This role focuses on designing and operating multi-GPU, cross-cloud platforms that enable efficient, reliable, and scalable model training. You’ll work at the intersection of DevOps, MLOps, and distributed systems, helping push the limits of real-world AI.

What You’ll Do:

  • Design, build, and operate scalable multi-GPU infrastructure across cloud environments (AWS, GCP, etc.)
  • Own the reliability, performance, and cost-efficiency of model training platforms
  • Develop and maintain infrastructure-as-code and automation for provisioning, orchestration, and lifecycle management
  • Build and evolve CI/CD pipelines for both infrastructure and ML training workflows
  • Optimize distributed training workloads (scheduling, resource utilization, observability)
  • Ensure high standards of reliability, scalability, security, and monitoring across systems
  • Collaborate with ML engineers and researchers to enable efficient experimentation and productionization
  • Troubleshoot complex issues across distributed systems, networking, and GPU workloads
  • Define and implement best practices in DevOps/MLOps for a fast-scaling environment
  • Document systems, architecture decisions, and operational processes

We’re Looking For:

  • 5+ years of experience in DevOps, MLOps, or infrastructure engineering (Senior/Staff level)
  • Strong experience with Kubernetes and containerized workloads at scale
  • Proven experience with Infrastructure-as-Code (Terraform, Helm, or similar)
  • Deep familiarity with at least one major cloud provider (AWS preferred)
  • Solid experience building CI/CD systems (e.g., GitHub Actions, GitLab CI, ArgoCD)
  • Proficiency in Python for automation and tooling
  • Strong understanding of distributed systems, networking, and system reliability
  • Ability to operate independently and drive large infrastructure initiatives

Nice to have:

  • Hands-on experience with multi-GPU and/or distributed compute environments
  • Experience with GPU scheduling/orchestration (e.g., the Volcano scheduler for Kubernetes, or Ray)
  • Experience supporting ML workloads or training pipelines (PyTorch, TensorFlow, etc.)
  • Experience with multi-cloud or hybrid cloud environments
  • Background in performance optimization for training workloads
  • Experience in robotics, simulation, or embodied AI systems

What we offer:

  • Competitive salary plus participation in our Stock Option Plan
  • Paid vacation, adjusted by location to comply with local labour laws
  • Travel opportunities to our Vancouver and Boston offices
  • Office perks: free breakfasts, lunches, snacks, and regular team events
  • Freedom to influence the product and own key initiatives
  • Collaboration with top-tier engineers, researchers, and product experts in AI and robotics
  • Startup culture prioritising speed, transparency, and minimal bureaucracy

Job Details

Company: Humanoid
Location: London Area, United Kingdom