Machine Learning Engineer

ML Systems Engineer

Are you interested in helping us craft exceptional experiences for our clients that deliver genuine social impact? Are you ready to join a small and experienced team of innovators and make a significant contribution to a fast-growing technology company?

About the Company

Our client is a fast-paced startup with the goal of making the world a more inclusive place for the Deaf community. As a technology-for-good organisation, they are using AI to create sign language translations across video, transportation and website platforms.

The company is expanding rapidly across these sectors and is growing its technical team to match. As an ML Systems Engineer, you will be at the forefront of expansion into new markets, helping shape infrastructure strategy and ensuring systems remain scalable, secure and at the cutting edge of technology.

The Role

We are looking for an ML Systems Engineer to help design and optimise the systems that power real-time AI video generation.

The company’s models generate sign language video using generative AI pipelines deployed on GPU infrastructure across both cloud and on-prem environments. A key challenge is reducing generation latency and maximising GPU utilisation so the system can deliver real-time video streams.

You will work across the full ML inference stack, from model optimisation to deployment infrastructure, ensuring models run efficiently in production environments.

This role is ideal for engineers who enjoy performance optimisation, distributed systems, and building production ML infrastructure.

Example Responsibilities

ML Inference Optimisation

  • Profile and optimise deep learning models used for sign language video generation
  • Reduce inference latency using techniques such as quantisation, pruning, mixed precision, and kernel optimisation
  • Improve GPU utilisation and throughput across inference pipelines
  • Work closely with ML researchers to ensure models are production-ready

ML Infrastructure & Deployment

  • Build and maintain scalable model serving systems
  • Deploy and operate inference services on GPU clusters
  • Design autoscaling infrastructure to meet real-time SLAs
  • Contribute to model deployment pipelines, versioning, and rollback strategies

Performance Engineering

  • Develop benchmarking frameworks for tracking inference performance
  • Identify bottlenecks across the ML pipeline and eliminate latency hotspots
  • Implement performance monitoring and alerting for production systems
  • Evaluate new hardware accelerators and inference runtimes

Current Technology Stack

The infrastructure currently includes:

  • Python, Go and Rust production services
  • PyTorch-based generative models
  • GPU inference workloads
  • Kubernetes clusters (cloud and on-prem)
  • AWS infrastructure including SageMaker
  • Real-time streaming systems using protocols such as HLS, LL-HLS, RTMP and SRT

Essential Requirements

  • 3+ years of experience in ML systems engineering, ML infrastructure, or backend systems
  • Strong programming skills in Python (Rust is a plus)
  • Experience working with production ML models
  • Experience optimising ML inference performance
  • Familiarity with containerised systems such as Docker and Kubernetes
  • Strong debugging, profiling, and performance analysis skills
  • Interest in building latency-critical systems

Desirable Requirements

  • Experience with inference optimisation tools and formats such as TensorRT and ONNX
  • Experience with model serving systems such as Triton, TorchServe, or Ray Serve
  • Familiarity with GPU architecture and performance optimisation
  • Experience working with video, graphics, or real-time streaming systems
  • Experience deploying ML workloads at scale
  • Experience contributing to open-source ML infrastructure projects

Why Join This Company

  • Work on technology that directly improves accessibility for Deaf communities
  • Help build one of the first real-time AI sign language generation systems
  • Join a small, experienced engineering team solving challenging technical problems
  • Opportunity to take ownership of critical systems as an early engineering hire
  • Work across a modern ML infrastructure stack

Benefits

  • 24 days’ holiday plus bank holidays, and a company pension scheme
  • Competitive compensation and high-value equity packages
  • Opportunity to work on cutting-edge technologies and be involved in the early stages of a high-growth business
  • Free sign language classes

Hours

This is a full-time position with normal virtual office hours of 9am to 6pm, although flexibility is offered to suit reasonable personal circumstances. What matters most is strong collaboration, meeting agreed milestones and delivering high-quality work.

Please note you must have the right to work and live full-time in the UK when applying for this position.

Equality and Diversity

The company is committed to eliminating discrimination and encouraging diversity within its team. The aim is to build a workforce that is truly representative of all sections of society, where every employee feels respected and able to give their best.

The company has built a culture of encouragement and support so that employees can focus on what they want to achieve in their careers. Work-life policies and flexible working practices help employees feel more in control of their personal and professional lives.

Any qualified applicants who are native sign language users are guaranteed an interview.

Job Details

Company: Papillon Talent Strategy Ltd
Location: England, United Kingdom