LLM Inference & Deployment Engineer (Air-Gapped Environments)
3-month+ contract
Inside IR35
Hybrid working (2-3 days per week in London)
You've deployed 70B-parameter models on GPU clusters with no internet access. You know the difference between a model that works in a notebook and one that runs reliably in production under compliance scrutiny. If that's your world, we want to talk.
This is a genuinely specialist role. The platform you'll be working on runs multiple large-scale LLMs concurrently: frontier models for text screening, code LLMs for analysis, and transformer encoders for classification, all in an air-gapped environment with a fixed compute budget and zero external API access.
You'll own the inference infrastructure end-to-end: GPU allocation strategy, quantisation decisions, batching, determinism controls, and offline deployment packaging. The system has to be fast, reliable, and auditable. That's a rare combination of skills, and this role is for someone who has genuinely done it before.
What we're looking for
- Production experience with vLLM, TensorRT-LLM, TGI, or equivalent at multi-GPU scale
- Model quantisation expertise: GPTQ, AWQ, GGUF, bitsandbytes
- Multi-node inference: tensor/pipeline/expert parallelism
- Air-gapped or classified environment deployment experience strongly preferred
- Offline dependency packaging: conda-pack, pip wheels, container images
If you are available and interested in this role, please send a current CV.