8 of 8 Permanent vLLM Jobs in the UK excluding London

Senior PyTorch Engineer

Hiring Organisation
Advanced Micro Devices
Location
East Anglia, United Kingdom
Employment Type
Permanent
test design to ensure high-quality, maintainable software solutions. AI Framework & Deep Learning: Strong understanding of AI frameworks such as PyTorch, Triton and vLLM, with applied knowledge across domains such as Natural Language Processing, Vision, Audio and Recommendation Systems. GPU Computing: Strong experience with GPU Programming models (CUDA, HIP). ...

AI Systems Research Engineer

Hiring Organisation
microTECH Global LTD
Location
Edinburgh, Scotland, United Kingdom
Strong knowledge of distributed systems, operating systems, machine learning systems architecture, Inference serving, and AI Infrastructure. · Hands-on experience with LLM serving frameworks (e.g., vLLM, Ray Serve, TensorRT-LLM, TGI) and distributed KV cache optimization. · Proficiency in C/C++, with additional experience in Python for research prototyping. · Solid grounding ...

AI Infrastructure Architect

Hiring Organisation
Microtech Global Ltd
Location
Dunfermline, Fife, UK
Employment Type
Full-time
operating systems, and runtime environments; Hands-on experience with Serverless architectures and cloud-native optimization technologies such as containers, Kubernetes, service orchestration, and autoscaling vLLM, SGLang, Ray Serve, etc.); understand common optimization concepts such as continuous batching, KV-Cache reuse, parallelism, and compression/quantization/distillation Proficient in using ...

AI Infrastructure Architect

Hiring Organisation
Microtech Global Ltd
Location
Edinburgh, Midlothian, Scotland, United Kingdom
Employment Type
Permanent
operating systems, and runtime environments; Hands-on experience with Serverless architectures and cloud-native optimization technologies such as containers, Kubernetes, service orchestration, and autoscaling vLLM, SGLang, Ray Serve, etc.); understand common optimization concepts such as continuous batching, KV-Cache reuse, parallelism, and compression/quantization/distillation Proficient in using ...

AI Infrastructure Architect

Hiring Organisation
Microtech Global Ltd
Location
Broughton, Scottish Borders, UK
Employment Type
Full-time
operating systems, and runtime environments; Hands-on experience with Serverless architectures and cloud-native optimization technologies such as containers, Kubernetes, service orchestration, and autoscaling vLLM, SGLang, Ray Serve, etc.); understand common optimization concepts such as continuous batching, KV-Cache reuse, parallelism, and compression/quantization/distillation Proficient in using ...

AI Infrastructure Architect

Hiring Organisation
Microtech Global Ltd
Location
Livingston, West Lothian, UK
Employment Type
Full-time
operating systems, and runtime environments; Hands-on experience with Serverless architectures and cloud-native optimization technologies such as containers, Kubernetes, service orchestration, and autoscaling vLLM, SGLang, Ray Serve, etc.); understand common optimization concepts such as continuous batching, KV-Cache reuse, parallelism, and compression/quantization/distillation Proficient in using ...

Infrastructure Architect

Hiring Organisation
microTECH Global LTD
Location
Edinburgh, Scotland, United Kingdom
operating systems, and runtime environments; Hands-on experience with Serverless architectures and cloud-native optimization technologies such as containers, Kubernetes, service orchestration, and autoscaling vLLM, SGLang, Ray Serve, etc.); understand common optimization concepts such as continuous batching, KV-Cache reuse, parallelism, and compression/quantization/distillation Proficient in using ...

Research (Systems) Engineer

Hiring Organisation
Microtech Global Ltd
Location
Edinburgh, Midlothian, Scotland, United Kingdom
Employment Type
Permanent
field. Strong knowledge of distributed systems, operating systems, machine learning systems architecture, Inference serving, and AI Infrastructure. Hands-on experience with LLM serving frameworks (e.g., vLLM, Ray Serve, TensorRT-LLM, TGI) and distributed KV cache optimization. Proficiency in C/C++, with additional experience in Python for research prototyping. Solid grounding in systems research methodology, distributed ...