2 of 2 Permanent Low Latency Jobs in Edinburgh

AI Systems Research Engineer - LLM Optimisation

Hiring Organisation
Project People
Location
City of Edinburgh, Scotland, United Kingdom
... using frameworks like vLLM, Ray Serve, and modern PyTorch Distributed systems. Scalable Model Serving Infrastructure: Develop and evaluate frameworks that enable efficient multi-tenant, low-latency, and fault-tolerant AI serving across distributed environments. Research and prototype new techniques for cache sharing, data locality, and resource orchestration ...

Systems Research Engineer - AI Infrastructure / Distributed Systems

Hiring Organisation
European Tech Recruit
Location
Edinburgh, Scotland, United Kingdom
... inference pipelines. Improve key-value cache efficiency and memory scheduling. Identify bottlenecks and enhance system scalability using systematic performance analysis. AI Serving Infrastructure: Develop low-latency, multi-tenant, fault-tolerant model serving systems. Work on areas such as cache sharing, data locality, and cluster scheduling. Prototype and evaluate ...