vLLM Jobs (26 to 33 of 33)

Senior Software Engineer I, Inference

Hiring Organisation
CoreWeave
Location
Sunnyvale, California, United States
Employment Type
Permanent
Salary
USD Annual
… metrics-driven work. Bachelor's or Master's in CS, EE, or a related field (or equivalent practical experience). Preferred: Contributions to inference frameworks (vLLM, Triton, TensorRT-LLM, Ray Serve, TorchServe). Experience with CUDA kernels, NCCL/SHARP, RDMA/NUMA, or GPU interconnect topologies. Leading multi-team initiatives ...

Senior Software Engineer II, Inference

Hiring Organisation
CoreWeave
Location
Sunnyvale, California, United States
Employment Type
Permanent
Salary
USD Annual
… streaming token delivery. Proven track record of improving tail latency (P95/P99) and service reliability through metrics-driven work. Preferred: Contributions to inference frameworks (vLLM, Triton, TensorRT-LLM, Ray Serve, TorchServe). Experience with CUDA kernels, NCCL/SHARP, RDMA/NUMA, or GPU interconnect topologies. Leading multi-team initiatives ...

Simulation Model Development and Verification Engineer - Aerospace - Shanghai, China

Hiring Organisation
Strongfield
Location
Shanghai, China
Employment Type
Contract
Our Aerospace client offers long-term, steady work prospects to suitable candidates with experience gained in Simulation Model Development and Verification Engineering. Strongfield have been supporting ...

Sr. Software Engineer - Perf and Benchmarking

Hiring Organisation
CoreWeave
Location
Sunnyvale, California, United States
Employment Type
Permanent
Salary
USD Annual
… Prometheus, Grafana, OpenTelemetry). Experience with performance-critical GPU systems (CUDA, NCCL, RDMA, NVLink/PCIe, memory bandwidth) and model-serving stacks (llm-d, vLLM, TensorRT-LLM, Megatron-LM). Strong communicator comfortable collaborating with cross-functional teams and external partners. Nice to have: Experience with time-series databases … storage engines, or custom data pipelines. Experience running MLPerf submissions or similar large-scale audited benchmarks. Contributions to OSS projects such as llm-d, vLLM, or PyTorch. Exposure to benchmarking large GPU fleets or multi-region clusters. Experience with CUDA kernels, NCCL/SHARP, RDMA/NUMA, or GPU interconnect ...

AI Systems Research Engineer - LLM Optimisation

Hiring Organisation
Project People
Location
United Kingdom
… tuning of large-scale inference and data pipelines, focusing on KV cache management, heterogeneous memory scheduling, and high-throughput inference serving using frameworks like vLLM, Ray Serve, and modern PyTorch Distributed systems. Scalable Model Serving Infrastructure: Develop and evaluate frameworks that enable efficient multi-tenant, low-latency, and fault-tolerant … systems, distributed computing, or large-scale AI infrastructure are also welcome. At least 2 years of experience with LLM inference/serving framework optimization (vLLM/Ray Serve/TensorRT-LLM/PyTorch). Hands-on experience with distributed KV cache optimization. Familiarity with GPUs and how they execute LLMs. Strong ...

AI Systems Research Engineer - LLM Optimisation

Hiring Organisation
Project People
Location
City Of Edinburgh, Scotland, United Kingdom
… tuning of large-scale inference and data pipelines, focusing on KV cache management, heterogeneous memory scheduling, and high-throughput inference serving using frameworks like vLLM, Ray Serve, and modern PyTorch Distributed systems. Scalable Model Serving Infrastructure: Develop and evaluate frameworks that enable efficient multi-tenant, low-latency, and fault-tolerant … systems, distributed computing, or large-scale AI infrastructure are also welcome. At least 2 years of experience with LLM inference/serving framework optimization (vLLM/Ray Serve/TensorRT-LLM/PyTorch). Hands-on experience with distributed KV cache optimization. Familiarity with GPUs and how they execute LLMs. Strong ...

Member of Technical Staff

Hiring Organisation
Geometric
Location
Ireland, UK
… implementation level. Attention variants, KV cache strategies, quantisation schemes, and how they shape kernel design. You've worked with production inference or training frameworks (vLLM, Megatron-LM, etc.). You've built performance-critical infrastructure before: compilers, profilers, auto-tuners, or search systems. You have real intuition for evolutionary methods, fitness … work of François Chollet, Kenneth Stanley, Jeff Clune, Jürgen Schmidhuber, David Ha, and Christian Szegedy. Bonus: Open-source kernel contributions (FlashAttention, FlashInfer, vLLM, Unsloth, Liger-Kernels, ThunderKittens). Publications in ML/AI, kernel optimisation, or evolutionary methods (NeurIPS, ICLR, CVPR, GECCO, or equivalent). Other HW experience (AMD, MLX, edge ...

Member of Technical Staff

Hiring Organisation
Geometric
Location
City of London, London, United Kingdom
… implementation level. Attention variants, KV cache strategies, quantisation schemes, and how they shape kernel design. You've worked with production inference or training frameworks (vLLM, Megatron-LM, etc.). You've built performance-critical infrastructure before: compilers, profilers, auto-tuners, or search systems. You have real intuition for evolutionary methods, fitness … work of François Chollet, Kenneth Stanley, Jeff Clune, Jürgen Schmidhuber, David Ha, and Christian Szegedy. Bonus: Open-source kernel contributions (FlashAttention, FlashInfer, vLLM, Unsloth, Liger-Kernels, ThunderKittens). Publications in ML/AI, kernel optimisation, or evolutionary methods (NeurIPS, ICLR, CVPR, GECCO, or equivalent). Other HW experience (AMD, MLX, edge ...