3 vLLM Jobs in Edinburgh

Systems Research Engineer - LLM Optimisation (vLLM / TensorRT-LLM)

Hiring Organisation
Project People
Location
City Of Edinburgh, Scotland, United Kingdom
Systems Research Engineer - LLM Optimisation (vLLM/TensorRT-LLM). Permanent. Edinburgh City Centre (on-site 5 days a week), within walking distance of local transport links. Salary: competitive and negotiable, with a generous benefits package. In an era where Large Language Models (LLMs) are rebuilding the foundational software stack, our client is at the forefront … tuning of large-scale inference and data pipelines, focusing on KV cache management, heterogeneous memory scheduling, and high-throughput inference serving using frameworks such as vLLM, Ray Serve, and modern PyTorch Distributed systems. Scalable Model Serving Infrastructure: Develop and evaluate frameworks that enable efficient multi-tenant, low-latency, and fault-tolerant ...

Systems Research Engineer - Distributed Systems / C++

Hiring Organisation
European Tech Recruit
Location
Edinburgh, Scotland, United Kingdom
Conduct in-depth profiling and performance tuning of inference pipelines, focusing on KV cache management. Develop low-latency, fault-tolerant AI serving frameworks using vLLM, Ray Serve, and PyTorch Distributed. Research and prototype novel techniques for cache sharing, data locality, and resource orchestration. Translate innovative designs into publishable contributions … distributed systems, or related field. Strong knowledge of distributed systems, OS internals, and machine learning systems architecture. Hands-on experience with LLM serving frameworks (vLLM, Ray Serve, TensorRT-LLM, or TGI). Proficiency in C/C++ for systems development and Python for research prototyping. Solid grounding in distributed algorithms ...

Systems Research Engineer

Hiring Organisation
European Tech Recruit
Location
Edinburgh, Scotland, United Kingdom
depth profiling of large-scale inference pipelines, specifically focusing on KV cache management and heterogeneous memory scheduling. AI Serving: Optimising high-throughput frameworks (vLLM, Ray Serve, PyTorch Distributed) to ensure low-latency, multi-tenant performance. Research Leadership: Contributing to top-tier venues (OSDI, NSDI, EuroSys, MLSys) and driving those innovations … Stack: Strong proficiency in C/C++ for systems work, with Python for rapid prototyping. Expertise: Hands-on experience with LLM serving frameworks (vLLM, Ray Serve, TensorRT-LLM) and distributed algorithms. Mindset: A solid grounding in systems research methodology and performance profiling tools. The "Value Add" (Desired): A PhD focused ...