Machine Learning Engineer

ML Systems Engineer

Are you interested in helping us craft exceptional experiences for our clients that deliver genuine social impact? Are you ready to join a small and experienced team of innovators and make a significant contribution to a fast-growing technology company?

About the Company

Our client is a fast-paced startup with the goal of making the world a more inclusive place for the Deaf community. As a technology-for-good organisation, they are using AI to create sign language translations across video, transportation and website platforms.

The company is expanding rapidly across these sectors, providing an exciting opportunity to grow its technical team. By joining as an ML Systems Engineer, you will be at the forefront of expansion into new markets, helping shape infrastructure strategy and ensuring systems remain scalable, secure and at the cutting edge of technology.

The Role

We are looking for an ML Systems Engineer to help design and optimise the systems that power real-time AI video generation.

The company’s models generate sign language video using generative AI pipelines deployed on GPU infrastructure across both cloud and on-prem environments. A key challenge is reducing generation latency and maximising GPU utilisation so the system can deliver real-time video streams.

You will work across the full ML inference stack, from model optimisation to deployment infrastructure, ensuring models run efficiently in production environments.

This role is ideal for engineers who enjoy performance optimisation, distributed systems, and building production ML infrastructure.

Example Responsibilities

ML Inference Optimisation

Profile and optimise deep learning models used for sign language video generation
Reduce inference latency using techniques such as quantisation, pruning, mixed precision, and kernel optimisation
Improve GPU utilisation and throughput across inference pipelines
Work closely with ML researchers to ensure models are production-ready

ML Infrastructure & Deployment

Build and maintain scalable model serving systems
Deploy and operate inference services on GPU clusters
Design autoscaling infrastructure to meet real-time SLAs
Contribute to model deployment pipelines, versioning, and rollback strategies

Performance Engineering

Develop benchmarking frameworks for tracking inference performance
Identify bottlenecks across the ML pipeline and eliminate latency hotspots
Implement performance monitoring and alerting for production systems
Evaluate new hardware accelerators and inference runtimes

Current Technology Stack

The infrastructure currently includes:

Python, Go and Rust production services
PyTorch-based generative models
GPU inference workloads
Kubernetes clusters (cloud and on-prem)
AWS infrastructure including SageMaker
Real-time streaming systems using protocols such as HLS, LL-HLS, RTMP and SRT

Essential Requirements

3+ years of experience in ML systems engineering, ML infrastructure, or backend systems
Strong programming skills in Python (Rust is a plus)
Experience working with production ML models
Experience optimising ML inference performance
Familiarity with containerised systems such as Docker and Kubernetes
Strong debugging, profiling, and performance analysis skills
Interest in building latency-critical systems

Desirable Requirements

Experience with inference optimisation tools and frameworks such as TensorRT or ONNX
Experience with model serving systems such as Triton, TorchServe, or Ray Serve
Familiarity with GPU architecture and performance optimisation
Experience working with video, graphics, or real-time streaming systems
Experience deploying ML workloads at scale
Experience contributing to open-source ML infrastructure projects

Why Join This Company

Work on technology that directly improves accessibility for Deaf communities
Help build one of the first real-time AI sign language generation systems
Join a small, experienced engineering team solving challenging technical problems
Opportunity to take ownership of critical systems as an early engineering hire
Work across a modern ML infrastructure stack

Benefits

24 days’ holiday plus bank holidays
Company pension scheme
Competitive compensation and high-value equity packages
Opportunity to work on cutting-edge technologies and be involved in the early stages of a high-growth business
Free sign language classes

Hours

This is a full-time position with normal virtual office hours of 9am to 6pm, although flexibility is offered to suit reasonable personal circumstances. What matters most is strong collaboration, meeting agreed milestones and delivering high-quality work.

Please note that you must have the right to live and work full-time in the UK when applying for this position.

Equality and Diversity

The company is committed to eliminating discrimination and encouraging diversity within its team. The aim is to build a workforce that is truly representative of all sections of society, where every employee feels respected and able to give their best.

The company has built a culture of encouragement and support so that employees can focus on what they want to achieve in their careers. Work-life policies and flexible working practices help employees feel more in control of their personal and professional lives.

Any qualified applicants who are native sign language users are guaranteed an interview.
