Staff DevOps Engineer

Humanoid is the first AI and robotics company in the UK, creating the world’s most advanced, reliable, commercially scalable, and safe humanoid robots. Our first humanoid robot, HMND 01, is a next-gen labour automation unit that provides highly efficient services across various use cases, starting with industrial applications.

Our Mission

At Humanoid we strive to create the world’s leading, commercially scalable, safe, and advanced humanoid robots that seamlessly integrate into daily life and amplify human capacity.

We are building large-scale compute infrastructure to train next-generation robotics models, including transformer-based systems such as vision-language-action (VLA) models.

As a Staff Engineer, you will lead the design and evolution of our multi-GPU, cross-cloud platforms, driving architecture, reliability, and performance at scale. This role sits at the intersection of DevOps, MLOps, and distributed systems, enabling cutting-edge AI in real-world environments.

What You’ll Do:

Lead the design and evolution of scalable multi-GPU infrastructure across cloud environments (AWS, GCP, etc.)
Own architecture and long-term technical direction of model training platforms
Drive reliability, performance, and cost-efficiency at scale
Define and implement best practices for infrastructure, DevOps, and MLOps across the organization
Build and evolve infrastructure-as-code and automation for provisioning, orchestration, and lifecycle management
Architect and improve CI/CD systems for both infrastructure and ML training workflows
Optimize distributed training workloads (scheduling, resource utilization, observability)
Partner with ML engineers and researchers to enable efficient experimentation and productionization
Lead troubleshooting and resolution of complex system issues across distributed, GPU-heavy environments
Mentor engineers and raise the bar for engineering quality and operational excellence
Document architecture, systems, and key technical decisions

We’re Looking For:

7+ years of experience in DevOps, MLOps, or infrastructure engineering (Staff level)
Proven experience designing and operating multi-GPU / distributed compute infrastructure
Experience with GPU scheduling/orchestration (e.g., Kubernetes schedulers, Volcano, Ray, etc.)
Strong experience with Kubernetes and containerized workloads at scale
Deep expertise in Infrastructure-as-Code (Terraform, Helm, or similar)
Deep familiarity with at least one major cloud provider (AWS preferred)
Strong experience building and scaling CI/CD systems (e.g., GitHub Actions, GitLab CI, ArgoCD)
Proficiency in Python for automation and tooling
Strong understanding of distributed systems, networking, and system reliability
Demonstrated ability to lead large technical initiatives and influence system design
Experience supporting ML workloads or training pipelines (PyTorch, TensorFlow, etc.)

Nice to have:

Experience with multi-cloud or hybrid cloud environments
Background in performance optimization for large-scale training workloads
Experience in robotics, simulation, or embodied AI systems

What we offer:

Competitive salary plus participation in our Stock Option Plan
Paid vacation with adjustments based on your location to comply with local labour laws
Travel opportunities to our Vancouver and Boston offices
Office perks: free breakfasts, lunches, snacks, and regular team events
Freedom to influence the product and own key initiatives
Collaboration with top-tier engineers, researchers, and product experts in AI and robotics
Startup culture prioritising speed, transparency, and minimal bureaucracy
