R&D Machine Learning Engineer (Speech and Voice)
London / Hybrid | 0–3 Years Experience
Are you fascinated by human speech, voice, and how machines understand, generate, and interact through sound?
We’re an early-stage AI startup building next-generation speech and voice technologies — from intelligent voice agents and conversational systems to adaptive audio-driven AI products. Our work sits at the intersection of machine learning research and real-world deployment, and we’re looking for curious, ambitious engineers to help push the boundaries.
What You’ll Be Doing
- Researching, training, and fine-tuning speech and audio models, including ASR, TTS, speaker recognition, and voice interaction systems
- Building and optimising speech-to-text, text-to-speech, and conversational AI pipelines, integrating LLMs where appropriate
- Designing and maintaining audio data pipelines for collection, preprocessing, augmentation, and evaluation
- Experimenting with multimodal models that combine speech, text, and contextual signals
- Exploring prompt engineering, RAG, and memory architectures for voice-driven AI systems
- Collaborating with engineers to deploy models into low-latency production environments
- Developing internal tools for model monitoring, evaluation, and continuous improvement
- Staying close to current research in speech, audio, and conversational AI, with time and support to explore and publish
What We’re Looking For
- 0–3 years of experience in Machine Learning, AI, Speech Processing, or Applied Research
- Strong Python skills and hands-on ML experience
- Experience with PyTorch, TensorFlow, or Hugging Face
- Solid understanding of deep learning fundamentals, particularly for sequence and audio models
- Familiarity with, or strong interest in:
  - Automatic Speech Recognition (ASR)
  - Text-to-Speech (TTS)
  - Speaker diarisation and speaker identification
  - Audio feature extraction (MFCCs, spectrograms, embeddings)
  - Transformers, sequence models, and multimodal architectures
- Curiosity, strong problem-solving skills, and comfort working in a fast-moving startup environment
What We Offer
- Hands-on mentorship from senior ML engineers, AI researchers, and founders
- Freedom to experiment with state-of-the-art speech and voice models
- A modern ML stack including Python, PyTorch, Hugging Face, OpenAI APIs, vector databases, and cloud infrastructure
- Flexible working with a hybrid model and regular in-person collaboration
- Accelerated career growth through ownership of real R&D and production systems
- A culture that values learning, technical depth, and impact over bureaucracy
Who Should Apply
- Graduates or junior engineers with a strong interest in speech, voice, or audio ML
- Researchers wanting to see their work deployed in real products
- Engineers excited by applied R&D, real-time systems, and human–AI interaction
Keywords: Machine Learning Engineer, Speech Recognition, Voice AI, Audio ML, Automatic Speech Recognition, ASR, Text-to-Speech, TTS, Speaker Recognition, Conversational AI, Multimodal AI, Deep Learning, Transformers, NLP, LLMs, Generative AI, Python, PyTorch, TensorFlow, Hugging Face, RAG, MLOps, Data Pipelines, Model Deployment, AI Research, Applied AI, AI Systems, AI R&D, Speech Technology, Voice Technology, AI Startup, Early-Stage Startup