13 of 13 vLLM Jobs in the UK

Associate Director

Hiring Organisation
Experis UK
Location
England, United Kingdom
systems integrated into complex enterprise workflows • Deploying and optimising models on AWS Bedrock, Azure AI Foundry, and self-hosted GPU infrastructure (vLLM, SGLang, Ollama) • Building robust MLOps/LLMOps pipelines — CI/CD, model monitoring, evaluation frameworks, observability • Making the call on when AI is the right answer — and when ...

Associate Director & AI Delivery Lead

Hiring Organisation
Experis UK
Location
England, United Kingdom
optimise models using cloud services like AWS Bedrock and Azure AI Foundry, or self-host them on GPU/CPU hardware using tools like vLLM, SGLang, and Ollama. Implement frameworks and approaches to evaluate model performance against business objectives, both pre-deployment and on an ongoing basis as part ...

Senior Software Engineer

Hiring Organisation
Jobleads-UK
Location
Cambridge, England, United Kingdom
related field. Desirable: Exposure to machine learning frameworks such as PyTorch, JAX, Triton, TensorFlow. Experience with distributed workload management systems such as Kubernetes, vLLM, Keras or MLOps pipelines. Experience working with hardware simulators or emulators (e.g. QEMU). Experience developing for or working with FPGA-based systems. Experience with people ...

Senior Platform Engineer

Hiring Organisation
Lorien
Location
London, South East, England, United Kingdom
Employment Type
Contractor
Contract Rate
Salary negotiable
behave in production. Experience or exposure to areas such as: MLOps platforms (e.g. Kubeflow or similar frameworks); model serving and inference platforms (e.g. KServe, vLLM, or equivalent); supporting LLM-based workloads, including performance and scaling considerations; notebook environments such as JupyterHub. Awareness of emerging tooling around Responsible/Trustworthy ...

AI Malware Researcher

Hiring Organisation
RevEng.AI
Location
Greater London, England, United Kingdom
cloud malware analysis infrastructure. Analyst copilots for reverse engineering workflows. Tech Stack: Examples of technologies we use include: Python, Rust, C++; AssemblyLine; PyTorch, Transformers, vLLM; IDA Pro SDK, Ghidra APIs, Binary Ninja APIs; LLVM, angr, Capstone, Triton; Docker, Kubernetes; Vector databases and knowledge graphs; GPU inference infrastructure; Malware sandboxing frameworks ...

Lead Platform Engineer

Hiring Organisation
Lorien
Location
London, South East, England, United Kingdom
Employment Type
Contractor
Contract Rate
Salary negotiable
areas such as: Building or operating MLOps platforms using tools like Kubeflow or similar frameworks; running model serving and inference platforms (e.g. KServe, vLLM, or equivalent); supporting LLM-based workloads, including optimisation and serving considerations; providing notebook-based environments such as JupyterHub in secure platforms. Exposure to emerging tooling such ...

Staff / Principal Machine Learning Engineer, Serving

Hiring Organisation
Inworld AI
Location
United Kingdom
need all of this. But you need enough to make a case. Inference Optimization. Deep understanding of modern serving frameworks and techniques like vLLM or TRT-LLM. Model Acceleration. Hands-on experience with quantization, distillation, caching strategies, continuous batching, paged attention, and speculative decoding. High-Performance Systems. Proficiency ...

Principal Machine Learning Engineer

Hiring Organisation
Jobleads-UK
Location
United Kingdom
systems end‐to‐end. A bias toward shipping, learning fast, and improving systems through iteration. Ideal Experience Experience with LLM inference frameworks such as vLLM, TensorRT‐LLM, or FasterTransformer. Contributions to open‐source ML or systems libraries. Background in scientific computing, compilers, or GPU kernels. Experience with RLHF pipelines ...

AI Systems Research Engineer

Hiring Organisation
microTECH Global LTD
Location
Edinburgh, Scotland, United Kingdom
Strong knowledge of distributed systems, operating systems, machine learning systems architecture, Inference serving, and AI Infrastructure. · Hands-on experience with LLM serving frameworks (e.g., vLLM, Ray Serve, TensorRT-LLM, TGI) and distributed KV cache optimization. · Proficiency in C/C++, with additional experience in Python for research prototyping. · Solid grounding ...

Software Engineer

Hiring Organisation
Acceler8 Talent
Location
City of London, London, United Kingdom
model training 🔧 What They’re Looking For Deep GPU infrastructure/distributed systems experience Strong knowledge of CUDA, NCCL, PyTorch, DeepSpeed, JAX, Megatron-LM, vLLM, etc. Experience operating large-scale GPU clusters (1,000+ GPUs) Kubernetes, Slurm, or similar orchestration expertise BONUS: Experience working on NVIDIA Blackwell chips (B200, B300 ...

Systems Research Engineer - LLM Optimisation (vLLM / TensorRT-LLM)

Hiring Organisation
Project People
Location
City Of Edinburgh, Scotland, United Kingdom
Systems Research Engineer - LLM Optimisation (vLLM/TensorRT-LLM). Permanent. Edinburgh City Centre (on-site 5 days), walking distance from local transport links. Salary: Competitive and negotiable, generous benefits package. In an era where Large Language Models (LLMs) are rebuilding the foundational software stack, our client is at the forefront … tuning of large-scale inference and data pipelines, focusing on KV cache management, heterogeneous memory scheduling, and high-throughput inference serving using frameworks like vLLM, Ray Serve, and modern PyTorch Distributed systems. Scalable Model Serving Infrastructure: Develop and evaluate frameworks that enable efficient multi-tenant, low-latency, and fault-tolerant ...

Systems Research Engineer

Hiring Organisation
European Tech Recruit
Location
Edinburgh, Scotland, United Kingdom
depth profiling of large-scale inference pipelines, specifically focusing on KV cache management and heterogeneous memory scheduling. AI Serving: Optimising high-throughput frameworks (vLLM, Ray Serve, PyTorch Distributed) to ensure low-latency, multi-tenant performance. Research Leadership: Contributing to top-tier venues (OSDI, NSDI, EuroSys, MLSys) and driving those innovations … Stack: Strong proficiency in C/C++ for systems work, with Python for rapid prototyping. Expertise: Hands-on experience with LLM serving frameworks (vLLM, Ray Serve, TensorRT-LLM) and distributed algorithms. Mindset: A solid grounding in systems research methodology and performance profiling tools. The "Value Add" (Desired): A PhD focused ...

Member of Technical Staff

Hiring Organisation
Geometric
Location
London Area, United Kingdom
implementation level. Attention variants, KV cache strategies, quantisation schemes, and how they shape kernel design. You've worked with production inference or training frameworks (vLLM, Megatron-LM, etc.). You've built performance-critical infrastructure before - compilers, profilers, auto-tuners, or search systems. You have real intuition for evolutionary methods, fitness … work of François Chollet, Kenneth Stanley, Jeff Clune, Jurgen Schmidhuber, David Ha, and Christian Szegedy. Bonus: Open-source kernel contributions (FlashAttention, FlashInfer, vLLM, Unsloth, Liger-Kernels, ThunderKittens). Publications in ML/AI, kernel optimisation or evolutionary methods (NeurIPS, ICLR, CVPR, GECCO or equivalent). Other HW experience (AMD, MLX, edge ...