Machine Learning Engineer - Inference
Role: Machine Learning Engineer (Speech)
Location: Manchester, UK (Hybrid)
About the job
Can you see yourself revolutionising the Agentic AI industry? We are a multi-award-winning AI and SaaS provider based in Manchester, dedicated to boosting productivity and efficiency across our global customer base spanning five continents. As a Machine Learning Engineer, you will be the architect responsible for maximising the responsiveness, scale, and performance of our ASR and TTS products within the Agentic AI pipeline.
This role offers a world-class research and production environment: you will work alongside like-minded scientists who develop state-of-the-art models, acting as the vital link between experimental research and production and enabling these bespoke systems to be used by our customers.
Your primary focus will be implementing highly efficient inference pipelines for real-time and offline speech recognition and synthesis, ensuring our Agentic vision translates into a seamless user experience.
Responsibilities
- Liaise with ASR and TTS technical leads, software engineers, and DevOps teams to deploy models efficiently on Hopper and Blackwell GPU architectures.
- Deliver steady progress across two-week sprints in our research-driven environment.
- Implement low-latency, real-time inference ML pipelines using tools such as Triton Inference Server, vLLM, or SGLang.
- Optimise model performance across cloud platforms using frameworks like TensorRT and ONNX Runtime.
- Build and maintain robust API services using Python-based web frameworks (e.g., FastAPI).
- Manage containerisation and orchestration workflows using Docker and Kubernetes.
- Ensure system reliability through observability and monitoring tools like Prometheus, Grafana, and OpenTelemetry.
- Write concise technical documentation and research papers.
Requirements
- MSc plus 2+ years of hands-on experience in Speech or Generative AI.
- Deep understanding of Generative AI, Neural Networks, and the latest LLM architectures.
- Expert-level proficiency in Python and PyTorch.
- Proven experience in performance optimisation and cloud platform deployment.
- Strong background in containerisation and orchestration (Docker, Kubernetes, etc.).
- Demonstrated ability to deploy and scale low-latency ML pipelines in production.
- Strong oral and written communication skills.