1 of 1 Permanent CheckPoint Jobs in Bristol

ML Infrastructure Engineer

Hiring Organisation: Wave Recruitment
Location: Greater Bristol Area, United Kingdom
tolerance and cost guardrails, scaling from single-GPU to multi-node distributed training
Full provenance from dataset version and training configuration through to deployed checkpoint
On-robot edge inference: optimised model export (ONNX, TensorRT), latency profiling, deployed-policy monitoring
Staged rollout to the robot fleet with rollback capability

What … policy learning workloads
GPU orchestration: spot tolerance, cost control, multi-node scaling
Experiment tracking and model registry with full provenance
Mixed precision, FSDP, checkpoint management, cold-start reduction

Data Pipelines
Automated pipelines from raw robot demonstrations to training-ready datasets
Data versioning so every model traces back to its source ...