5 of 5 Permanent vLLM Jobs in the UK

Senior AI Engineer

Hiring Organisation
Aveni
Location
United Kingdom
Experience working with cloud environments (preferably AWS)
Nice to have:
· Experience with containerisation technologies such as Docker or Kubernetes
· Experience with frameworks such as vLLM or NeMo
· Knowledge of financial services NLP applications
· Experience designing evaluation methodologies for LLM outputs
· Experience building intelligent agents or multi-agent systems
Skills ...

Staff / Principal Machine Learning Engineer, Serving

Hiring Organisation
Inworld AI
Location
United Kingdom
... need all of this. But you need enough to make a case.
Inference Optimization: Deep understanding of modern serving frameworks and techniques like vLLM or TRT-LLM.
Model Acceleration: Hands-on experience with quantization, distillation, caching strategies, continuous batching, paged attention, and speculative decoding.
High-Performance Systems: Proficiency ...
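This listing's Model Acceleration line names paged attention and continuous batching. The core bookkeeping behind paged attention is easy to sketch: KV cache memory is carved into fixed-size blocks, and each sequence holds a block table mapping its logical token positions to physical blocks, so memory is allocated on demand rather than reserved for the maximum context length. Below is a toy allocator in plain Python; every name is illustrative, and none of it is vLLM's actual API.

```python
# Toy paged KV cache allocator: illustrates the bookkeeping behind paged
# attention (fixed-size blocks plus per-sequence block tables).
# Illustrative only; these are not vLLM's real data structures.

BLOCK_SIZE = 16  # tokens per physical block


class PagedKVCache:
    def __init__(self, num_blocks: int):
        self.free_blocks = list(range(num_blocks))
        self.block_tables: dict[int, list[int]] = {}  # seq_id -> physical block ids
        self.seq_lens: dict[int, int] = {}            # seq_id -> tokens stored

    def append_token(self, seq_id: int) -> None:
        """Reserve KV space for one new token of a sequence."""
        table = self.block_tables.setdefault(seq_id, [])
        length = self.seq_lens.get(seq_id, 0)
        if length % BLOCK_SIZE == 0:          # current block full (or none yet)
            if not self.free_blocks:
                raise MemoryError("KV cache exhausted; preempt a sequence")
            table.append(self.free_blocks.pop())
        self.seq_lens[seq_id] = length + 1

    def free_sequence(self, seq_id: int) -> None:
        """Return a finished sequence's blocks to the free pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.seq_lens.pop(seq_id, None)


cache = PagedKVCache(num_blocks=4)
for _ in range(20):                    # 20 tokens -> ceil(20/16) = 2 blocks
    cache.append_token(seq_id=0)
print(len(cache.block_tables[0]))      # 2
cache.free_sequence(0)
print(len(cache.free_blocks))          # 4
```

Continuous batching falls out of this design: because blocks are freed the moment a sequence finishes, a new request can be admitted into the running batch immediately instead of waiting for the whole batch to drain.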

AI Systems Research Engineer

Hiring Organisation
microTECH Global LTD
Location
Edinburgh, Scotland, United Kingdom
· Strong knowledge of distributed systems, operating systems, machine learning systems architecture, inference serving, and AI infrastructure.
· Hands-on experience with LLM serving frameworks (e.g., vLLM, Ray Serve, TensorRT-LLM, TGI) and distributed KV cache optimization.
· Proficiency in C/C++, with additional experience in Python for research prototyping.
· Solid grounding ...
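Roles like this lean heavily on KV cache reasoning, and the back-of-envelope sizing formula is worth having at hand: per token, the cache stores 2 tensors (K and V) x num_layers x num_kv_heads x head_dim values. A sketch using Llama-2-7B's published shape (32 layers, 32 KV heads, head dim 128) with fp16 assumed; the function name is ours, not any library's:

```python
# Back-of-envelope KV cache sizing. Config values below are Llama-2-7B's
# published shape; the formula itself is generic. fp16 assumed (2 bytes).

def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, dtype_bytes: int = 2) -> int:
    """Bytes of KV cache for one sequence: K and V per layer, per head."""
    return 2 * num_layers * num_kv_heads * head_dim * dtype_bytes * seq_len

per_token = kv_cache_bytes(num_layers=32, num_kv_heads=32, head_dim=128, seq_len=1)
print(per_token // 1024)      # 512 (KiB per token)

full_context = kv_cache_bytes(32, 32, 128, seq_len=4096)
print(full_context / 2**30)   # 2.0 (GiB for one 4096-token sequence)
```

Half a mebibyte per token is why the cache, not the weights, dominates memory at high concurrency, and why grouped-query attention (fewer KV heads) and the block-level allocation above matter for serving throughput.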

AI Systems Research Engineer - LLM Optimisation

Hiring Organisation
Project People
Location
City Of Edinburgh, Scotland, United Kingdom
... tuning of large-scale inference and data pipelines, focusing on KV cache management, heterogeneous memory scheduling, and high-throughput inference serving using frameworks like vLLM, Ray Serve, and modern PyTorch Distributed systems.
Scalable Model Serving Infrastructure: Develop and evaluate frameworks that enable efficient multi-tenant, low-latency, and fault-tolerant …
… systems, distributed computing, or large-scale AI infrastructure are also welcome
· At least 2 years of experience with LLM inference/serving framework optimization (vLLM/Ray Serve/TensorRT-LLM/PyTorch)
· Hands-on experience with distributed KV cache optimization
· Familiarity with GPUs and how they execute LLMs
· Strong ...

Member of Technical Staff

Hiring Organisation
Geometric
Location
City of London, London, United Kingdom
... implementation level: attention variants, KV cache strategies, quantisation schemes, and how they shape kernel design.
You've worked with production inference or training frameworks (vLLM, Megatron-LM, etc.).
You've built performance-critical infrastructure before: compilers, profilers, auto-tuners, or search systems.
You have real intuition for evolutionary methods, fitness …
… work of François Chollet, Kenneth Stanley, Jeff Clune, Jürgen Schmidhuber, David Ha, and Christian Szegedy.
Bonus:
· Open-source kernel contributions (FlashAttention, FlashInfer, vLLM, Unsloth, Liger-Kernels, ThunderKittens)
· Publications in ML/AI, kernel optimisation or evolutionary methods (NeurIPS, ICLR, CVPR, GECCO or equivalent)
· Other HW experience (AMD, MLX, edge ...
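This last listing pairs kernel auto-tuning with evolutionary methods, and the connection can be made concrete with a toy (1+1)-style loop: mutate a candidate configuration, keep it when fitness improves. The cost function below is a synthetic stand-in for a real auto-tuner's measured kernel runtime, and every name is illustrative:

```python
import random

# Toy evolutionary auto-tuner: (1+1)-style hill climb over a discrete
# tile-size parameter. cost() is a synthetic stand-in for benchmarking
# a kernel; a real auto-tuner would measure runtime instead.

TILE_SIZES = [8, 16, 32, 64, 128, 256]


def cost(tile: int) -> float:
    """Synthetic, unimodal cost curve with its minimum at tile=64."""
    return abs(tile - 64) + 1.0


def tune(generations: int = 64, seed: int = 0) -> int:
    rng = random.Random(seed)
    i = rng.randrange(len(TILE_SIZES))           # random initial candidate
    for _ in range(generations):
        # Mutation: step to a neighbouring tile size (clamped to the range).
        j = max(0, min(len(TILE_SIZES) - 1, i + rng.choice([-1, 1])))
        # Selection: keep the mutant only if it is at least as fit.
        if cost(TILE_SIZES[j]) <= cost(TILE_SIZES[i]):
            i = j
    return TILE_SIZES[i]


print(tune())  # 64
```

Because the synthetic cost curve is unimodal, the climb converges to the optimum; real kernel search spaces are rugged, which is exactly why these roles reach for populations, fitness shaping, and novelty-style methods rather than a single greedy walker.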