ML Engineer

About The Company

We’re an AI/ML tech start-up developing a novel foundation model with a singular vision: to achieve fully automated, unsupervised software delivery in embedded control systems. We’re based in West London, backed by venture capital, and scaling our team.

About The Role

Own and optimize the performance of our AI/ML foundation model, design GPU-accelerated components, reduce latency, and work with the founders on optimization goals. Requires CUDA, Python, and deep learning expertise.

  • Own the performance, scalability, and reliability of the company's foundation model in both training and inference.
  • Profile and optimize the end-to-end ML stack: data pipelines, training loops, inference serving, and deployment.
  • Design and implement GPU-accelerated components, including custom CUDA kernels where off-the-shelf libraries are insufficient.
  • Reduce latency and cost per inference token while maximizing throughput and hardware utilization.
  • Work closely with the founders to translate product requirements into concrete optimization goals and technical roadmaps.
  • Build internal tooling, benchmarks, and evaluation harnesses to help the team experiment, debug, and ship safely.
  • Contribute to model architecture and system design where it impacts performance and robustness.

Requirements

  • Significant hands-on experience optimizing deep learning models
  • Proven ability to profile and debug performance bottlenecks
  • Experience with distributed or large-scale training and inference
  • Familiarity with techniques such as mixed precision, quantization, distillation, pruning, caching, and batching
  • Experience with large models (e.g., transformers)
  • Practical CUDA development experience
  • Deep understanding of at least one major deep learning framework (ideally PyTorch)
  • Experience building and operating ML systems on cloud platforms (AWS, Azure, or GCP)
  • Comfort working with experiment tracking, monitoring, and evaluation pipelines

Required Skills

  • CUDA C/C++ programming
  • Python
  • deep learning model optimization
  • profiling and debugging performance bottlenecks
  • distributed or large-scale training and inference
  • mixed precision
  • quantization
  • distillation
  • pruning
  • caching
  • batching
  • transformers
  • CUDA development
  • deep learning frameworks (ideally PyTorch)
  • cloud platforms (AWS, Azure, GCP)
  • containerization
  • orchestration
  • experiment tracking
  • monitoring
  • evaluation pipelines
  • passion and determination
  • ownership mentality
  • problem-solving
  • delivery-oriented
  • openness to disagreement

Salary: £100,000 – £150,000

Equity: share options

Job Details

Company: TechTree
Location: London, UK