25 of 25 vLLM Jobs

Enterprise Architect - AI

Hiring Organisation: Jobleads-UK
Location: Greater London, England, United Kingdom

TensorFlow at a working, hands-on level. Distributed training: Horovod, DeepSpeed, Megatron-LM, or equivalent multi-node training frameworks. Inference & serving: NVIDIA Triton, vLLM, TensorRT-LLM, or equivalent high-throughput serving platforms. MLOps/LLMOps: Kubeflow, MLflow, and at least one hyperscaler ML platform (SageMaker, Azure ML, or Vertex ...

Senior Solution Architect - UK

Hiring Organisation: Jobleads-UK
Location: Greater London, England, United Kingdom

/CD, monitoring, and orchestration frameworks (e.g., Kubeflow, Flyte, MLflow); proficiency with Docker and Kubernetes for AI workload containerization. Understanding of LLM inference stacks (vLLM, llama.cpp, OpenVINO) and model delivery formats (ONNX, .safetensors, Hugging Face model hub). Experience sizing GPU infrastructure for LLM inference or training workloads (memory, throughput ...

Forward Deployed ML Engineer, Agents

Hiring Organisation: Jobleads-UK
Location: Greater London, England, United Kingdom

plus Model fine-tuning practical experience with LoRA/QLoRA, supervised fine-tuning, or RLHF workflows is a plus Inference optimization experience with vLLM, TensorRT-LLM, Triton, or model quantization techniques is desirable Observability tooling practical experience with LLM monitoring, tracing, and evaluation frameworks is a strong plus Familiarity with ...

Senior AI Engineer

Hiring Organisation: Jobleads-UK
Location: Greater London, England, United Kingdom

development practices (testing, code review, CI/CD). Knowledge of vector databases (Pinecone, Weaviate, Qdrant) and embedding models. Experience with model serving frameworks (vLLM, TensorRT, Ray). Experience with A/B testing and experimentation frameworks for AI features. Experience with model observability tools (LangSmith, W&B, MLflow). ...

Principal Machine Learning Engineer

Hiring Organisation: Jobleads-UK
Location: Greater London, England, United Kingdom

Hands-on experience with cloud-native ML infrastructure platforms Knowledge of vector databases (Pinecone, Weaviate, Qdrant) and embedding models Experience with model serving frameworks (vLLM, TensorRT, Ray) Experience with A/B testing and experimentation frameworks for AI features Contributions to open-source ML projects or research publications Experience with ...

Senior Software Engineer

Hiring Organisation: Jobleads-UK
Location: Cambridge, England, United Kingdom

related field. Desirable: Exposure to machine learning frameworks such as PyTorch, JAX, Triton, TensorFlow Experience with distributed workload management systems such as Kubernetes, vLLM, Keras or MLOps pipelines Experience working with hardware simulators or emulators (e.g. QEMU). Experience developing for or working with FPGA-based systems. Experience with people ...

AI Engineer (Fluent in Mandarin & English)

Hiring Organisation: Jobleads-UK
Location: Greater London, England, United Kingdom

experience with LLM training cycles, parameter-efficient fine-tuning (PEFT), and sophisticated prompt engineering. Inference Stack: Experience with high-performance inference servers (e.g., vLLM, TGI, or Triton) and an understanding of how to optimize models for GPU deployment. Infrastructure: Comfortable working in Linux-based environments and proficient in managing containerized ...

Software Engineer, Machine Learning Infrastructure

Hiring Organisation: Jobleads-UK
Location: Greater London, England, United Kingdom

software development lifecycle, including designing, generating code, testing, monitoring and releasing software Nice To Haves Experience with LLM inference engines and serving frameworks (e.g., vLLM, SGLang, TensorRT-LLM) in production Experience with distributed/multi‐node fine‐tuning and training pipelines (SFT, DPO/RLHF, LoRA), including data preparation ...

AI Infrastructure Engineer, Serving Platform - London, UK

Hiring Organisation: Jobleads-UK
Location: Greater London, England, United Kingdom

Proven ability to solve complex problems and work independently in fast-moving environments. Nice to haves: Experience with modern LLM serving frameworks such as vLLM, SGLang, TensorRT-LLM, or text-generation-inference. PLEASE NOTE: Our policy requires a 90-day waiting period before reconsidering candidates for the same role. This ...

Senior Engineer, Infrastructure

Hiring Organisation: Jobleads-UK
Location: United Kingdom

fundamentals, problem-solving skills, and the ability to quickly learn unfamiliar technologies. Experience deploying or operating large language models with serving frameworks such as vLLM or SGLang is considered an advantage. Benefits Fully remote position with the flexibility to work from Europe. Opportunity to shape infrastructure foundations for next-generation ...

Solution Architect - GPU & HPC

Hiring Organisation: Jobleads-UK
Location: Greater London, England, United Kingdom

like across different GPU generations and topologies. Exposure to MLOps tooling and AI platform layers: experiment tracking (MLflow, W&B), model serving frameworks (Triton, vLLM), and pipeline orchestration (Kubeflow, Airflow). Familiarity with InfiniBand and high-performance networking as it relates to distributed training performance, sufficient to engage credibly with ...

ML Infrastructure Engineer

Hiring Organisation: Jobleads-UK
Location: Greater London, England, United Kingdom

across the stack, for example py‐spy, PyTorch profiler, Nsight, perf, tracing, metrics, logs, or custom instrumentation Have experience with inference stacks such as vLLM, SGLang, TensorRT‐LLM, Dynamo, or custom serving infrastructure Can reason from system metrics back to model behavior: when latency, queueing, sampling, data order, rollout throughput ...

Senior Research Scientist | Model Steering

Hiring Organisation: Jobleads-UK
Location: Greater London, England, United Kingdom

without degrading their reasoning capabilities. Experience with machine translation, multilingual NLP, or language quality estimation. Familiarity with inference and serving at scale (e.g. via vLLM, SGLang, TensorRT‐LLM, etc) and long‐context modelling. Publications at top‐tier venues. What we offer Diverse and internationally distributed team : joining our team means ...

Applied AI Engineer

Hiring Organisation: McGregor Boyall Associates Limited
Location: London, United Kingdom
Employment Type: Permanent, Work From Home

production ML systems Comfortable working across models, infrastructure and product Enjoy working in fast-moving, early-stage environments Tech Stack Python * PyTorch * JAX * LLMs * vLLM * Vector Databases * Modern Agent Frameworks Get in touch for more details - McGregor Boyall is an equal opportunity employer and does not discriminate on any grounds. ...

Software Inference Deployment Engineer

Hiring Organisation: Jobleads-UK
Location: Oxford, England, United Kingdom

PyTorch in particular) Practical experience with model deployment workflows - loading, format conversion, quantisation, or framework integration Comfortable working with inference serving stacks (for example vLLM, TensorRT‐LLM, or similar) Familiarity with Linux, containerisation (Docker), and cluster environments Comfortable in a customer‐facing role, able to communicate clearly with ...

Senior Software Product Strategy & Product Marketing Lead — Data Center AI and Personal AI - Qu[...]

Hiring Organisation: Jobleads-UK
Location: Greater London, England, United Kingdom

platform strategy. Strong understanding of AI software platforms, including inference, model deployment, runtimes, SDKs, developer tools, AI orchestration frameworks (e.g., Kubernetes‐based systems, Ray, vLLM, TGI), and cloud‐native and edge AI software stacks. Experience with data center AI infrastructure, including CPUs, GPUs, NPUs, heterogeneous accelerator environments, Kubernetes‐based environments ...

Senior Software Product Strategy & Product Marketing Lead — Data Center AI and Personal AI - Qu[...]

Hiring Organisation: Jobleads-UK
Location: Greater London, England, United Kingdom

platform strategy. Strong understanding of AI software platforms, including: Inference, model deployment, runtimes, SDKs, developer tools AI orchestration frameworks (e.g., Kubernetes-based systems, Ray, vLLM, TGI) Cloud-native and edge AI software stacks Experience with data center AI infrastructure, including: CPUs, GPUs, NPUs, and heterogeneous accelerator environments Kubernetes-based environments ...

NLP Performance Engineer

Hiring Organisation: Jobleads-UK
Location: Greater London, England, United Kingdom

transformer inference, including prefill versus decode, KV‐cache behaviour, attention variants and performance bottlenecks Hands‐on experience with LLM serving frameworks such as vLLM, SGLang, TensorRT‐LLM or TGI, and the PyTorch ecosystem Experience with inference optimisation techniques, including quantisation, speculative decoding and model parallelism across modern GPU architectures Strong ...

Lead Site Reliability Engineer - Operations Excellence

Hiring Organisation: Jobleads-UK
Location: Glasgow, Scotland, United Kingdom

reliability, performance, and cost‐efficiency of the LLM inference platform end to end. You will operate large language model serving stacks (such as vLLM and llm‐d) in production at scale, with deep instrumentation and strong operational rigor. You will partner across engineering to deliver secure software, improve stability … infrastructure Build backend services and APIs that enable reliable operation of AI infrastructure in production Operate and scale LLM serving infrastructure (such as vLLM and llm‐d), including model hosting, request routing, continuous batching, and KV‐cache optimization Deploy, host, and lifecycle‐manage open‐source and proprietary LLMs on Amazon ...

AI Platform engineer

Hiring Organisation: Nextech Group Limited
Location: East London, London, United Kingdom
Employment Type: Permanent, Work From Home
Salary: £85,000

processing of high-volume inference requests Implement observability and cost-tracking for token usage across multiple LLM providers (Anthropic, OpenAI, open-source models via vLLM) Own database performance for both relational (Postgres) and vector (pgvector/Pinecone) workloads Collaborate with ML engineers on model-serving infrastructure and prompt-caching strategies … Node.js) Databases: PostgreSQL, Redis, Pinecone/pgvector Infra: AWS (ECS, Lambda, SQS/SNS), Docker, Kubernetes, Terraform AI/ML tooling: LangChain/LlamaIndex, vLLM, Anthropic & OpenAI APIs, embedding models Observability: Datadog, Grafana, OpenTelemetry CI/CD: GitHub Actions, ArgoCD Requirements: 4+ years backend development experience, ideally with at least ...

Project Technical Lead - AI Systems Simulation

Hiring Organisation: Jobleads-UK
Location: Cambridge, England, United Kingdom

infrastructure, ML systems, or computer architecture. Familiarity with Agile or other modern technical project management frameworks. Knowledge of modern inference‐serving frameworks (e.g., vLLM). Background in statistics, operations research, or large‐scale datacenter infrastructure. Contributions to open‐source AI or systems projects. Benefits High‐impact role in a rapidly ...

AI Inference Engineer

Hiring Organisation: Jobleads-UK
Location: Greater London, England, United Kingdom

throughput and cost per token, partnering with the CUDA/GPU engineers. Make the core software architecture calls on serving frameworks and orchestration (e.g. vLLM, TensorRT-LLM, SGLang, Triton Inference Server, or equivalents). Translate throughput, latency, and uptime commitments into concrete technical specifications and serving capacity plans. ...

Senior Data Center, AI Software Product Strategy & Go-To-Market Lead — Qualcomm - Europe

Hiring Organisation: Jobleads-UK
Location: Greater London, England, United Kingdom

optimized in a Data Center AI software platforms Inference, model deployment, runtimes, SDKs, developer tools. AI orchestration frameworks, e.g., Kubernetes-based systems, Ray, vLLM, TGI. Cloud-native AI software stacks, deployment patterns, and data center operating models. Data center AI infrastructure CPUs, GPUs, NPUs, and heterogeneous accelerator environments. Kubernetes-based ...

Senior Software Engineer, Inference Platform

Hiring Organisation: Jobleads-UK
Location: Greater London, England, United Kingdom

requests per second while maintaining sub‐second response times and cost efficiency. Experience with Golang is strongly preferred, and exposure to inference engines (vLLM, TGI, TensorRT), containerization, and distributed systems is an added bonus. You take ownership of platform‐level decisions, think strategically about performance vs. cost trade‐offs … experience building and scaling backend systems, distributed platforms, or inference infrastructure Strong understanding of AI/ML inference systems and experience with inference engines (vLLM, TGI, TensorRT‐LLM, or similar) Deep knowledge of distributed systems design, microservices architecture, and API gateway patterns Proficiency in Golang strongly preferred; Python, Rust, C++ ...

Architect/Staff Systems Software Engineer

Hiring Organisation: Jobleads-UK
Location: Greater London, England, United Kingdom

direction you shape across the platform. Responsibilities Own the Runtime & Serving Stack: Design, build, and extend the distributed inference and serving stack (e.g. vLLM, SGLang, NVIDIA Dynamo, TensorRT-LLM) onto DX-1, rather than treating any layer as a black box. Scale Distributed Inference: Define how inference scales across many … runtime/network/accelerator boundary. Demonstrated ownership of a hard, end-to-end systems problem, ideally extending a distributed inference/serving stack (vLLM, SGLang, NVIDIA Dynamo, TensorRT-LLM) in production, with specifics on what you built or changed and why. Distributed inference at scale: parallelism strategies, collective communication ...