R&D Machine Learning Engineer (Speech and Voice) (0–3 Years Experience)

Machine Learning Engineer (Speech, Voice & AI Systems)

London / Hybrid | 0–3 Years Experience

Are you fascinated by human speech and voice, and by how machines understand, generate, and interact through sound?

We’re an early-stage AI startup building next-generation speech and voice technologies, from intelligent voice agents and conversational systems to adaptive audio-driven AI products. Our work sits at the intersection of machine learning research and real-world deployment, and we’re looking for curious, ambitious engineers to help push the boundaries of voice AI.

What You’ll Be Doing

Researching, training, and fine-tuning speech and audio models, including ASR, TTS, speaker recognition, and voice interaction systems
Building and optimising speech-to-text, text-to-speech, and conversational AI pipelines, integrating LLMs where appropriate
Designing and maintaining audio data pipelines for collection, preprocessing, augmentation, and evaluation
Experimenting with multimodal models that combine speech, text, and contextual signals
Exploring prompt engineering, RAG, and memory architectures for voice-driven AI systems
Collaborating with engineers to deploy models into low-latency production environments
Developing internal tools for model monitoring, evaluation, and continuous improvement
Staying close to current research in speech, audio, and conversational AI, with time and support to explore and publish

What We’re Looking For

0–3 years of experience in Machine Learning, AI, Speech Processing, or Applied Research
Strong Python skills and hands-on ML experience
Experience with PyTorch, TensorFlow, or Hugging Face
Solid understanding of deep learning fundamentals, particularly for sequence and audio models
Familiarity with, or strong interest in:
    Automatic Speech Recognition (ASR)
    Text-to-Speech (TTS)
    Speaker diarisation and speaker identification
    Audio feature extraction (MFCCs, spectrograms, embeddings)
    Transformers, sequence models, and multimodal architectures
Curiosity, strong problem-solving skills, and comfort working in a fast-moving startup environment

What You’ll Get

Hands-on mentorship from senior ML engineers, AI researchers, and founders
Freedom to experiment with state-of-the-art speech and voice models
A modern ML stack including Python, PyTorch, Hugging Face, OpenAI APIs, vector databases, and cloud infrastructure
Flexible working with a hybrid model and regular in-person collaboration
Accelerated career growth through ownership of real R&D and production systems
A culture that values learning, technical depth, and impact over bureaucracy

Perfect For

Graduates or junior engineers with a strong interest in speech, voice, or audio ML
Researchers wanting to see their work deployed in real products
Engineers excited by applied R&D, real-time systems, and human–AI interaction

