23 of 23 vLLM Jobs in the UK

Enterprise Architect - AI

Hiring Organisation: Jobleads-UK
Location: Greater London, England, United Kingdom

TensorFlow at a working, hands-on level. Distributed training: Horovod, DeepSpeed, Megatron-LM, or equivalent multi-node training frameworks. Inference & serving: NVIDIA Triton, vLLM, TensorRT-LLM, or equivalent high-throughput serving platforms. MLOps/LLMOps: Kubeflow, MLflow, and at least one hyperscaler ML platform (SageMaker, Azure ML, or Vertex ...

Forward Deployed ML Engineer, Agents

Hiring Organisation: Jobleads-UK
Location: Greater London, England, United Kingdom

plus. Model fine-tuning: practical experience with LoRA/QLoRA, supervised fine-tuning, or RLHF workflows is a plus. Inference optimization: experience with vLLM, TensorRT-LLM, Triton, or model quantization techniques is desirable. Observability tooling: practical experience with LLM monitoring, tracing, and evaluation frameworks is a strong plus. Familiarity with ...

Lead Platform DevOps Engineer

Hiring Organisation: Guidant Global
Location: City of London, London, United Kingdom
Employment Type: Contract
Contract Rate: £600 - £800/day

tools and patterns such as:
- Building MLOps platforms using frameworks such as Kubeflow (or comparable approaches)
- Operating model serving and inference platforms (e.g. KServe, vLLM, or comparable solutions)
- Supporting LLM-based workloads, including optimisation and serving considerations
- Providing notebook-based development environments (e.g. JupyterHub) within secure platforms
- Exposure to emerging ...

Senior AI Engineer

Hiring Organisation: Jobleads-UK
Location: Greater London, England, United Kingdom

development practices (testing, code review, CI/CD). Knowledge of vector databases (Pinecone, Weaviate, Qdrant) and embedding models. Experience with model serving frameworks (vLLM, TensorRT, Ray). Experience with A/B testing and experimentation frameworks for AI features. Experience with model observability tools (LangSmith, W&B, MLflow). ...

Principal Machine Learning Engineer

Hiring Organisation: Jobleads-UK
Location: Greater London, England, United Kingdom

Hands-on experience with cloud-native ML infrastructure platforms. Knowledge of vector databases (Pinecone, Weaviate, Qdrant) and embedding models. Experience with model serving frameworks (vLLM, TensorRT, Ray). Experience with A/B testing and experimentation frameworks for AI features. Contributions to open-source ML projects or research publications. Experience with ...

Manager, Research Engineering (Foundational Research)

Hiring Organisation: Jobleads-UK
Location: Greater London, England, United Kingdom

preference. Technical Expertise: Deep proficiency in Python and modern software development practices. Hands‐on experience with Distributed Training infrastructure (Multi‐node GPU training, Kubernetes, vLLM). Familiarity with Deep Learning frameworks (PyTorch). Experience with MLOps tools and experiment tracking (e.g., ClearML, MLflow, Weights & Biases). Research Fluency: Ability ...

Software Engineer, GenAI Platform

Hiring Organisation: Jobleads-UK
Location: Greater London, England, United Kingdom

software development lifecycle, including designing, generating code, testing, monitoring and releasing software. Nice to haves: Experience with LLM inference engines and serving frameworks (e.g., vLLM, SGLang, TensorRT‐LLM) in production. Experience with distributed/multi‐node fine‐tuning and training pipelines (SFT, DPO/RLHF, LoRA), including data preparation ...

AI Engineer (Fluent in Mandarin & English)

Hiring Organisation: Jobleads-UK
Location: Greater London, England, United Kingdom

experience with LLM training cycles, parameter-efficient fine-tuning (PEFT), and sophisticated prompt engineering. Inference Stack: Experience with high-performance inference servers (e.g., vLLM, TGI, or Triton) and an understanding of how to optimize models for GPU deployment. Infrastructure: Comfortable working in Linux-based environments and proficient in managing containerized ...

Senior Software Engineer

Hiring Organisation: Jobleads-UK
Location: Cambridge, England, United Kingdom

related field. Desirable: Exposure to machine learning frameworks such as PyTorch, JAX, Triton, TensorFlow. Experience with distributed workload management systems such as Kubernetes, vLLM, Keras or MLOps pipelines. Experience working with hardware simulators or emulators (e.g. QEMU). Experience developing for or working with FPGA-based systems. Experience with people ...

Senior Research Scientist | Model Steering

Hiring Organisation: Jobleads-UK
Location: Greater London, England, United Kingdom

without degrading their reasoning capabilities. Experience with machine translation, multilingual NLP, or language quality estimation. Familiarity with inference and serving at scale (e.g. via vLLM, SGLang, TensorRT‐LLM, etc.) and long‐context modelling. Publications at top‐tier venues. What we offer: Diverse and internationally distributed team: joining our team means ...

Solution Architect - GPU & HPC

Hiring Organisation: Jobleads-UK
Location: Greater London, England, United Kingdom

like across different GPU generations and topologies. Exposure to MLOps tooling and AI platform layers: experiment tracking (MLflow, W&B), model serving frameworks (Triton, vLLM), and pipeline orchestration (Kubeflow, Airflow). Familiarity with InfiniBand and high-performance networking as it relates to distributed training performance, sufficient to engage credibly with ...

ML Infrastructure Engineer

Hiring Organisation: Jobleads-UK
Location: Greater London, England, United Kingdom

across the stack, for example py‐spy, PyTorch profiler, Nsight, perf, tracing, metrics, logs, or custom instrumentation. Have experience with inference stacks such as vLLM, SGLang, TensorRT‐LLM, Dynamo, or custom serving infrastructure. Can reason from system metrics back to model behavior: when latency, queueing, sampling, data order, rollout throughput ...

Applied AI Engineer

Hiring Organisation: McGregor Boyall Associates Limited
Location: London, United Kingdom
Employment Type: Permanent, Work From Home

production ML systems. Comfortable working across models, infrastructure and product. Enjoy working in fast-moving, early-stage environments. Tech Stack: Python * PyTorch * JAX * LLMs * vLLM * Vector Databases * Modern Agent Frameworks. Get in touch for more details. McGregor Boyall is an equal opportunity employer and does not discriminate on any grounds. ...

Senior Software Product Strategy & Product Marketing Lead — Data Center AI and Personal AI - Qu[...]

Hiring Organisation: Jobleads-UK
Location: Greater London, England, United Kingdom

platform strategy. Strong understanding of AI software platforms, including inference, model deployment, runtimes, SDKs, developer tools, AI orchestration frameworks (e.g., Kubernetes‐based systems, Ray, vLLM, TGI), and cloud‐native and edge AI software stacks. Experience with data center AI infrastructure, including CPUs, GPUs, NPUs, heterogeneous accelerator environments, Kubernetes‐based environments ...

Software Inference Deployment Engineer

Hiring Organisation: Jobleads-UK
Location: Oxford, England, United Kingdom

PyTorch in particular). Practical experience with model deployment workflows: loading, format conversion, quantisation, or framework integration. Comfortable working with inference serving stacks (for example vLLM, TensorRT‐LLM, or similar). Familiarity with Linux, containerisation (Docker), and cluster environments. Comfortable in a customer‐facing role, able to communicate clearly with ...

Senior Software Product Strategy & Product Marketing Lead — Data Center AI and Personal AI - Qu[...]

Hiring Organisation: Jobleads-UK
Location: Greater London, England, United Kingdom

platform strategy. Strong understanding of AI software platforms, including: inference, model deployment, runtimes, SDKs, developer tools; AI orchestration frameworks (e.g., Kubernetes-based systems, Ray, vLLM, TGI); cloud-native and edge AI software stacks. Experience with data center AI infrastructure, including: CPUs, GPUs, NPUs, and heterogeneous accelerator environments; Kubernetes-based environments ...

AI Platform Engineer

Hiring Organisation: Nextech Group Limited
Location: East London, London, United Kingdom
Employment Type: Permanent, Work From Home
Salary: £85,000

processing of high-volume inference requests. Implement observability and cost-tracking for token usage across multiple LLM providers (Anthropic, OpenAI, open-source models via vLLM). Own database performance for both relational (Postgres) and vector (pgvector/Pinecone) workloads. Collaborate with ML engineers on model-serving infrastructure and prompt-caching strategies … Node.js). Databases: PostgreSQL, Redis, Pinecone/pgvector. Infra: AWS (ECS, Lambda, SQS/SNS), Docker, Kubernetes, Terraform. AI/ML tooling: LangChain/LlamaIndex, vLLM, Anthropic & OpenAI APIs, embedding models. Observability: Datadog, Grafana, OpenTelemetry. CI/CD: GitHub Actions, ArgoCD. Requirements: 4+ years backend development experience, ideally with at least ...

Lead Software Engineer - LLM Ops Platform Reliability

Hiring Organisation: Jobleads-UK
Location: Glasgow, Scotland, United Kingdom

reliability, performance, and cost‐efficiency of the LLM inference platform end to end. You will operate large language model serving stacks (such as vLLM and llm‐d) in production at scale, with deep instrumentation and strong operational rigor. You will partner across engineering to deliver secure software, improve stability … infrastructure. Build backend services and APIs that enable reliable operation of AI infrastructure in production. Operate and scale LLM serving infrastructure (such as vLLM and llm‐d), including model hosting, request routing, continuous batching, and KV‐cache optimization. Deploy, host, and lifecycle‐manage open‐source and proprietary LLMs on Amazon ...

Project Technical Lead - AI Systems Simulation

Hiring Organisation: Jobleads-UK
Location: Cambridge, England, United Kingdom

infrastructure, ML systems, or computer architecture. Familiarity with Agile or other modern technical project management frameworks. Knowledge of modern inference‐serving frameworks (e.g., vLLM). Background in statistics, operations research, or large‐scale datacenter infrastructure. Contributions to open‐source AI or systems projects. Benefits High‐impact role in a rapidly ...

AI Inference Engineer

Hiring Organisation: Jobleads-UK
Location: Greater London, England, United Kingdom

throughput and cost per token, partnering with the CUDA/GPU engineers. Make the core software architecture calls on serving frameworks and orchestration (e.g. vLLM, TensorRT-LLM, SGLang, Triton Inference Server, or equivalents). Translate throughput, latency, and uptime commitments into concrete technical specifications and serving capacity plans. ...

Senior Software Engineer, Inference Platform

Hiring Organisation: Jobleads-UK
Location: Greater London, England, United Kingdom

requests per second while maintaining sub‐second response times and cost efficiency. Experience with Golang is strongly preferred, and exposure to inference engines (vLLM, TGI, TensorRT), containerization, and distributed systems is an added bonus. You take ownership of platform‐level decisions, think strategically about performance vs. cost trade‐offs … experience building and scaling backend systems, distributed platforms, or inference infrastructure. Strong understanding of AI/ML inference systems and experience with inference engines (vLLM, TGI, TensorRT‐LLM, or similar). Deep knowledge of distributed systems design, microservices architecture, and API gateway patterns. Proficiency in Golang strongly preferred; Python, Rust, C++ ...

Architect, Staff & Senior Systems Software Engineer

Hiring Organisation: Jobleads-UK
Location: Greater London, England, United Kingdom

direction you shape across the platform. Responsibilities Own the Runtime & Serving Stack: Design, build, and extend the distributed inference and serving stack (e.g., vLLM, SGLang, NVIDIA Dynamo, TensorRT-LLM) onto DX-1, rather than treating any layer as a black box. Scale Distributed Inference: Define how inference scales across many … runtime/network/accelerator boundary. Demonstrated ownership of a hard, end-to-end systems problem, ideally extending a distributed inference/serving stack (vLLM, SGLang, NVIDIA Dynamo, TensorRT-LLM) in production, with specifics on what you built or changed and why. Distributed inference at scale: parallelism strategies, collective communication ...

Architect/Staff Systems Software Engineer London, UK

Hiring Organisation: Jobleads-UK
Location: Greater London, England, United Kingdom

direction you shape across the platform. Responsibilities Own the Runtime & Serving Stack: Design, build, and extend the distributed inference and serving stack (e.g. vLLM, SGLang, NVIDIA Dynamo, TensorRT-LLM) onto DX-1, rather than treating any layer as a black box. Scale Distributed Inference: Define how inference scales across many … runtime/network/accelerator boundary. Demonstrated ownership of a hard, end-to-end systems problem, ideally extending a distributed inference/serving stack (vLLM, SGLang, NVIDIA Dynamo, TensorRT-LLM) in production, with specifics on what you built or changed and why. Distributed inference at scale: parallelism strategies, collective communication ...