ML Engineer

About The Company

We’re an AI/ML tech start-up developing a novel foundation model with a singular vision: to achieve fully automated, unsupervised software delivery in embedded control systems. We’re based in West London, backed by venture capital, and scaling our team.

About The Role

Own and optimize the performance of our AI/ML foundation model, design GPU-accelerated components, reduce latency, and work with the founders on optimization goals. Requires CUDA, Python, and deep learning expertise.

  • Own the performance, scalability, and reliability of the company's foundation model in both training and inference.
  • Profile and optimize the end-to-end ML stack: data pipelines, training loops, inference serving, and deployment.
  • Design and implement GPU-accelerated components, including custom CUDA kernels where off-the-shelf libraries are insufficient.
  • Reduce latency and cost per inference token while maximizing throughput and hardware utilization.
  • Work closely with the founders to translate product requirements into concrete optimization goals and technical roadmaps.
  • Build internal tooling, benchmarks, and evaluation harnesses to help the team experiment, debug, and ship safely.
  • Contribute to model architecture and system design where it impacts performance and robustness.

Requirements

  • Significant hands-on experience optimizing deep learning models
  • Proven ability to profile and debug performance bottlenecks
  • Experience with distributed or large-scale training and inference
  • Familiarity with techniques such as mixed precision, quantization, distillation, pruning, caching, and batching
  • Experience with large models (e.g., transformers)
  • Practical CUDA development experience
  • Deep understanding of at least one major deep learning framework (ideally PyTorch)
  • Experience building and operating ML systems on cloud platforms (AWS, Azure, or GCP)
  • Comfort working with experiment tracking, monitoring, and evaluation pipelines

Required Skills

  • CUDA C/C++ programming
  • Python
  • deep learning model optimization
  • profiling and debugging performance bottlenecks
  • distributed or large-scale training and inference
  • mixed precision
  • quantization
  • distillation
  • pruning
  • caching
  • batching
  • transformers
  • CUDA development
  • deep learning frameworks (ideally PyTorch)
  • cloud platforms (AWS, Azure, GCP)
  • containerization
  • orchestration
  • experiment tracking
  • monitoring
  • evaluation pipelines
  • passion and determination
  • ownership mentality
  • problem-solving
  • delivery-oriented
  • openness to disagreement

Salary: £100,000 – £150,000

Equity: share options

Job Details

Company: TechTree
Location: London, UK