3 of 3 vLLM Jobs in Edinburgh

Systems Research Engineer - LLM Optimisation (vLLM / TensorRT-LLM)

Hiring Organisation: Project People
Location: City Of Edinburgh, Scotland, United Kingdom

Systems Research Engineer - LLM Optimisation (vLLM/TensorRT-LLM). Permanent. Edinburgh City Centre (on-site 5 days), walking distance from local transport links. Salary: Competitive and negotiable, generous benefits package. In an era where Large Language Models (LLMs) are rebuilding the foundational software stack, our client is at the forefront … tuning of large-scale inference and data pipelines, focusing on KV cache management, heterogeneous memory scheduling, and high-throughput inference serving using frameworks like vLLM, Ray Serve, and modern PyTorch Distributed systems. Scalable Model Serving Infrastructure: Develop and evaluate frameworks that enable efficient multi-tenant, low-latency, and fault-tolerant ...

Systems Research Engineer - Distributed Systems / C++

Hiring Organisation: European Tech Recruit
Location: Edinburgh, Scotland, United Kingdom

Conduct in-depth profiling and performance tuning of inference pipelines, focusing on KV cache management. Develop low-latency, fault-tolerant AI serving frameworks using vLLM, Ray Serve, and PyTorch Distributed. Research and prototype novel techniques for cache sharing, data locality, and resource orchestration. Translate innovative designs into publishable contributions … distributed systems, or related field. Strong knowledge of Distributed Systems, OS internals, and Machine Learning systems architecture. Hands-on experience with LLM serving frameworks (vLLM, Ray Serve, TensorRT-LLM, or TGI). Proficiency in C/C++ for systems development and Python for research prototyping. Solid grounding in distributed algorithms ...

Systems Research Engineer

Hiring Organisation: European Tech Recruit
Location: Edinburgh, Scotland, United Kingdom

depth profiling of large-scale inference pipelines, specifically focusing on KV cache management and heterogeneous memory scheduling. AI Serving: Optimising high-throughput frameworks (vLLM, Ray Serve, PyTorch Distributed) to ensure low-latency, multi-tenant performance. Research Leadership: Contributing to top-tier venues (OSDI, NSDI, EuroSys, MLSys) and driving those innovations … Stack: Strong proficiency in C/C++ for systems work, with Python for rapid prototyping. Expertise: Hands-on experience with LLM serving frameworks (vLLM, Ray Serve, TensorRT-LLM) and distributed algorithms. Mindset: A solid grounding in systems research methodology and performance profiling tools. The "Value Add" (Desired): A PhD focused ...