Research Engineer (Inference)

About

Serving a multimodal agent model in production is a different problem from serving a standard LLM. Context length, tool calls, and computer-use workloads create constraints that require co-designing the inference stack with the model team, rather than bolting on a serving framework after the fact.

This is a VC-backed challenger lab building state-of-the-art computer-use agents. The inference team owns the full stack, from the engine layer (vLLM, SGLang) through to the serving architecture (disaggregated inference, intelligent routing).

The team operates at the intersection of research and production - translating cutting-edge techniques directly into the systems behind live agent products.

What you'll do

Build and operate the inference stack serving multimodal agentic models in production
Improve latency, throughput, and cost across the serving stack
Research and implement inference techniques tailored to agent workloads
Co-design with the model team on training-time decisions that affect inference behaviour
Evaluate inference frameworks and hardware platforms and feed findings back into roadmap decisions
Stay current with advances in inference, model serving, and accelerator technology

What you'll need

Strong software engineering fundamentals and a solid production track record
Proficiency in Python and at least one systems language - Rust, C++, or Go
Hands-on experience with PyTorch or JAX in an industry setting
Experience with inference frameworks: vLLM, SGLang, TensorRT-LLM
Solid distributed systems fundamentals and experience operating production ML infrastructure
Working knowledge of modern ML including transformers and multimodal architectures

Optional Bonus

Research engagement: advanced degree with research output, top-tier publications (NeurIPS, ICML, MLSys, OSDI), or open-source contributions
GPU kernel work - CUDA, Triton, or similar
Experience with quantisation, speculative decoding, disaggregated inference, or KV-cache compression

Shortlisted candidates will be contacted within 48 hours.
