Machine Learning Engineer

Build Low-Latency Conversational AI Systems

We build real-time conversational AI systems on top of large language models (LLMs), speech AI, and agentic workflows. Our platform combines automatic speech recognition (ASR), LLMs, and text-to-speech (TTS) into production-grade systems deployed in enterprise environments worldwide, where latency, reliability, and scalability matter.

We are hiring a Machine Learning Engineer to build low-latency production systems for our LLM team. This role centres on writing scalable code that keeps real-time conversational AI reliable under heavy production workloads.

You’ll work closely with our LLM and speech teams to solve challenges around inference speed, concurrency, request handling, GPU performance, distributed systems, and real-time response streaming.

What you’ll do

Build and optimise low-latency LLM systems for real-time conversational AI
Write production-grade Python code focused on performance, scalability, and reliability
Design systems capable of handling large volumes of concurrent real-time requests
Solve engineering challenges around batching, request scheduling, queue management, streaming responses, and distributed workloads
Improve inference speed, GPU memory usage, and overall system responsiveness
Deploy and optimise open-source LLMs using tooling such as vLLM, TensorRT-LLM, Triton, SGLang, CUDA, or similar technologies
Build scalable orchestration layers and ML pipelines around LLM systems, including RAG and agentic workflows
Develop backend inference services and APIs for production AI systems
Productionise new model capabilities and features for real-world customer use cases

What we’re looking for

Strong experience writing production-grade software for machine learning systems
Strong Python engineering skills
Experience building low-latency or highly concurrent systems
Strong problem-solving ability and a genuine enjoyment of building systems from the ground up
Experience with distributed systems, parallel workloads, and performance optimisation
Experience working with inference tooling such as vLLM, TensorRT, Triton, CUDA, ONNX, or similar technologies
Experience building scalable backend services or ML systems used in production
Understanding of real-time systems and performance-focused engineering
Strong communication skills and ability to work closely with engineers and researchers

Why this role?

You’ll design and build low-latency conversational AI systems that serve large volumes of concurrent real-time requests. The role focuses on difficult engineering challenges around inference speed, reliability, concurrency, GPU performance, and scalable production AI systems.
