Senior ML Engineer

Company Description

Voice-Swap is building the future of AI voice technology for the creative industries, with ethics, artist partnership, and cutting-edge engineering at the core. We work directly with musicians, voice-over artists, and media partners to develop ethically licensed, production-grade AI voice models with uncompromising speaker likeness and perceptual quality.

We are now looking for a Senior Machine Learning Engineer (Speech AI) to help us push high-fidelity speech synthesis and voice conversion systems to production scale. As an early-stage, fast-moving company, we value people who take ownership, move quickly, and are comfortable operating with both autonomy and responsibility.

Learn more at https://www.voice-swap.ai.

Role Description

This is a full-time remote role for a Senior Machine Learning Engineer at Voice-Swap.

You will:

  • Implement neural speech synthesis models, prioritising speaker likeness and naturalness
  • Write inference API scripts for deploying models in the product
  • Write scripts for data preprocessing and model evaluation
  • Work directly with clients on text-to-speech and/or voice conversion model projects
  • Script and support professional voice-over data collection sessions
  • Reimplement and adapt architectures from scientific papers into production-ready systems
  • Contribute to improving training efficiency and deployment performance

This role requires someone comfortable moving between research papers, GPU training runs, and production APIs.

Qualifications

  • Solid understanding of the fundamental concepts of Machine Learning and Deep Learning (Transformers, CNNs, RNNs)
  • Strong grounding in mathematics, audio signal processing, speech processing, or NLP
  • Experience with ML frameworks (PyTorch or TensorFlow)
  • Experience training and deploying models on cloud services (AWS, GCP, etc.)
  • Experience reimplementing architectures from scientific papers
  • Comfortable with Git & GitHub workflows
  • Strong software engineering discipline and attention to reproducibility

Bonus Skills

  • Experience in speech synthesis (text-to-speech and/or voice conversion)
  • Training and inference optimisation (e.g., quantisation techniques)
  • MS or PhD in Computer Science or Machine Learning, or 3+ years of relevant experience
  • Publications in top-tier speech / NLP / signal processing conferences (Interspeech, ICASSP, ASRU, SLT, EUSIPCO, ACL, etc.)
  • Music production or audio engineering experience

Who Thrives Here

  • You enjoy working in a startup environment where priorities can evolve quickly
  • You are proactive and don’t wait to be told what to do
  • You are comfortable owning problems from research to production
  • You care about audio quality and technical excellence
  • You’re collaborative, reliable, and enjoyable to work with

Note: With your CV, please include brief details of the project you are proudest of (GitHub repo, arXiv paper link, or short description).

Job Details

Company: Voice-Swap
Location: United Kingdom (Hybrid / Remote Options)