Senior DevOps Engineer

Humanoid is the first AI and robotics company in the UK, creating the world’s most advanced, reliable, commercially scalable, and safe humanoid robots. Our first humanoid robot, HMND 01, is a next-gen labour automation unit that provides highly efficient services across various use cases, starting with industrial applications.

Our Mission

At Humanoid we strive to create the world’s leading, commercially scalable, safe, and advanced humanoid robots that seamlessly integrate into daily life and amplify human capacity.

We are building large-scale compute infrastructure for training next-generation robotics models, including transformer-based systems such as vision-language-action (VLA) models.

This role focuses on designing and operating multi-GPU, cross-cloud platforms that enable efficient, reliable, and scalable model training. You’ll work at the intersection of DevOps, MLOps, and distributed systems, helping push the limits of real-world AI.

What You’ll Do:

Design, build, and operate scalable multi-GPU infrastructure across cloud environments (AWS, GCP, etc.)
Own the reliability, performance, and cost-efficiency of model training platforms
Develop and maintain infrastructure-as-code and automation for provisioning, orchestration, and lifecycle management
Build and evolve CI/CD pipelines for both infrastructure and ML training workflows
Optimize distributed training workloads (scheduling, resource utilization, observability)
Ensure high standards of reliability, scalability, security, and monitoring across systems
Collaborate with ML engineers and researchers to enable efficient experimentation and productionization
Troubleshoot complex issues across distributed systems, networking, and GPU workloads
Define and implement best practices in DevOps/MLOps for a fast-scaling environment
Document systems, architecture decisions, and operational processes

We’re Looking For:

5+ years of experience in DevOps, MLOps, or infrastructure engineering (Senior/Staff level)
Strong experience with Kubernetes and containerized workloads at scale
Proven experience with Infrastructure-as-Code (Terraform, Helm, or similar)
Deep familiarity with at least one major cloud provider (AWS preferred)
Solid experience building CI/CD systems (e.g., GitHub Actions, GitLab CI, ArgoCD)
Proficiency in Python for automation and tooling
Strong understanding of distributed systems, networking, and system reliability
Ability to operate independently and drive large infrastructure initiatives

Nice to have:

Hands-on experience with multi-GPU and/or distributed compute environments
Experience with GPU scheduling/orchestration tools (e.g., Volcano, Ray)
Experience supporting ML workloads or training pipelines (PyTorch, TensorFlow, etc.)
Experience with multi-cloud or hybrid cloud environments
Background in performance optimization for training workloads
Experience in robotics, simulation, or embodied AI systems

What we offer:

Competitive salary plus participation in our Stock Option Plan
Paid vacation, adjusted by location to comply with local labour laws
Travel opportunities to our Vancouver and Boston offices
Office perks: free breakfasts, lunches, snacks, and regular team events
Freedom to influence the product and own key initiatives
Collaboration with top-tier engineers, researchers, and product experts in AI and robotics
Startup culture prioritising speed, transparency, and minimal bureaucracy
