Senior ML Engineer
Company Description
Voice-Swap is building the future of AI voice technology for the creative industries, with ethics, artist partnership, and cutting-edge engineering at the core. We work directly with musicians, voice-over artists, and media partners to develop ethically licensed, production-grade AI voice models with uncompromising speaker likeness and perceptual quality.
We are now looking for a Senior Machine Learning Engineer (Speech AI) to help us push high-fidelity speech synthesis and voice conversion systems to production scale. As an early-stage, fast-moving company, we value people who take ownership, move quickly, and are comfortable operating with both autonomy and responsibility.
Learn more at https://www.voice-swap.ai.
Role Description
This is a full-time remote role for a Senior Machine Learning Engineer at Voice-Swap.
You will:
- Implement neural speech synthesis models, prioritising speaker likeness and naturalness
- Write model inference API scripts for product deployment
- Write scripts for data preprocessing and model evaluation
- Work directly with clients on text-to-speech and/or voice conversion model projects
- Script and support professional voice-over data collection sessions
- Reimplement and adapt architectures from scientific papers into production-ready systems
- Contribute to improving training efficiency and deployment performance
This role requires someone comfortable moving between research papers, GPU training runs, and production APIs.
Qualifications
- Solid understanding of the fundamental concepts of Machine Learning and Deep Learning (Transformers, CNNs, RNNs)
- Strong grounding in mathematics, audio signal processing, speech processing, or NLP
- Experience with ML frameworks (PyTorch or TensorFlow)
- Experience training and deploying models on cloud services (AWS, GCP, etc.)
- Experience reimplementing architectures from scientific papers
- Comfortable with Git & GitHub workflows
- Strong software engineering discipline and attention to reproducibility
Bonus Skills
- Experience in speech synthesis (text-to-speech and/or voice conversion)
- Training and inference optimisation (e.g., quantisation techniques)
- MS or PhD in Computer Science or Machine Learning, or 3+ years of relevant experience
- Publications in top-tier speech / NLP / signal processing conferences (Interspeech, ICASSP, ASRU, SLT, EUSIPCO, ACL, etc.)
- Music production or audio engineering experience
Who Thrives Here
- You enjoy working in a startup environment where priorities can evolve quickly
- You are proactive and don’t wait to be told what to do
- You are comfortable owning problems from research to production
- You care about audio quality and technical excellence
- You’re collaborative, reliable, and enjoyable to work with
Note: With your CV, please include a brief description of the project you are proudest of (GitHub repo, arXiv paper link, or short description).