Machine Learning Systems Engineer
About the role
We are building real-time conversational AI systems for contact centres, powered by ASR, LLMs, and TTS.
As a Machine Learning Systems Engineer, you will sit within our LLM team and focus on the systems layer that makes production conversational AI work at scale. You'll design and improve the infrastructure, orchestration, and runtime systems behind low-latency conversational AI workflows.
The role centres on the technical challenges of delivering real-time AI conversations: coordinating complex AI systems under strict latency and reliability constraints.
What you’ll do
- Design and build systems that enable LLM workflows to maintain real-time responses even under peak load
- Improve latency, throughput, concurrency, and reliability across our production systems
- Build orchestration logic for model calls, services, queues, retries, fallbacks, and routing, balancing load management against low response times
- Help scale systems to support high volumes of concurrent real-time conversations
- Optimise memory usage and resource efficiency across LLM-powered services
- Deploy and support autoscaling for AI services running on AWS
- Build observability into AI workflows, including monitoring, logging, alerting, and performance tracking
- Work closely with data scientists, machine learning engineers, prototype engineers, and backend engineers
- Help turn LLM capabilities into stable, scalable production conversational AI systems
What we’re looking for
- Strong Python engineering skills
- Experience building production backend systems, distributed systems, or ML infrastructure
- Strong understanding of scalability, latency, reliability, and performance engineering
- Experience with cloud infrastructure, ideally AWS
- Experience working with APIs, queues, service orchestration, and production monitoring
- Understanding of how LLMs are used in production systems
- Ability to reason about concurrency, throughput, memory usage, and failure handling
- Strong debugging skills across complex production systems
Nice to have
- Experience with conversational AI, voice systems, ASR, TTS, or real-time streaming systems
- Experience with model serving or inference infrastructure
- Exposure to open-source LLMs or LLM orchestration frameworks
- Experience with Docker, Kubernetes, ECS, or similar container orchestration tools
- Experience with Redis, Kafka, Kinesis, SQS, or similar queueing/event systems
- Familiarity with monitoring tools such as CloudWatch, Prometheus, or Grafana
Why join?
You’ll help build the systems behind real-time AI conversations used in production contact centre environments. This is a high-impact engineering role focused on low latency, scalability, reliability, and making LLM-powered systems work under real-world load.