vLLM Jobs (26 to 33 of 33)

Senior Software Engineer I, Inference

Hiring Organisation
CoreWeave
Location
Sunnyvale, California, United States
Employment Type
Permanent
Salary
USD Annual
… metrics-driven work. Bachelor's or Master's in CS, EE, or a related field (or equivalent practical experience). Preferred: Contributions to inference frameworks (vLLM, Triton, TensorRT-LLM, Ray Serve, TorchServe). Experience with CUDA kernels, NCCL/SHARP, RDMA/NUMA, or GPU interconnect topologies. Leading multi-team initiatives ...

Senior Software Engineer II, Inference

Hiring Organisation
CoreWeave
Location
Sunnyvale, California, United States
Employment Type
Permanent
Salary
USD Annual
… streaming token delivery. Proven track record of improving tail latency (P95/P99) and service reliability through metrics-driven work. Preferred: Contributions to inference frameworks (vLLM, Triton, TensorRT-LLM, Ray Serve, TorchServe). Experience with CUDA kernels, NCCL/SHARP, RDMA/NUMA, or GPU interconnect topologies. Leading multi-team initiatives ...

Simulation Model Development and Verification Engineer - Aerospace - Shanghai, China

Hiring Organisation
Strongfield
Location
Shanghai, China
Employment Type
Contract
Our Aerospace client offers long-term, steady work prospects to suitable candidates with experience gained in Simulation Model Development and Verification Engineering. Strongfield have been supporting ...

Sr. Software Engineer - Perf and Benchmarking

Hiring Organisation
CoreWeave
Location
Sunnyvale, California, United States
Employment Type
Permanent
Salary
USD Annual
… Prometheus, Grafana, OpenTelemetry). Experience with performance-critical GPU systems (CUDA, NCCL, RDMA, NVLink/PCIe, memory bandwidth) and model-serving stacks (llm-d, vLLM, TensorRT-LLM, Megatron-LM). Strong communicator comfortable collaborating with cross-functional teams and external partners. Nice to have: Experience with time-series databases … storage engines, or custom data pipelines. Experience running MLPerf submissions or similar large-scale audited benchmarks. Contributions to OSS projects such as llm-d, vLLM, or PyTorch. Exposure to benchmarking large GPU fleets or multi-region clusters. Experience with CUDA kernels, NCCL/SHARP, RDMA/NUMA, or GPU interconnect ...

AI Systems Research Engineer - LLM Optimisation

Hiring Organisation
Project People
Location
United Kingdom
… tuning of large-scale inference and data pipelines, focusing on KV cache management, heterogeneous memory scheduling, and high-throughput inference serving using frameworks like vLLM, Ray Serve, and modern PyTorch Distributed systems. Scalable Model Serving Infrastructure: Develop and evaluate frameworks that enable efficient multi-tenant, low-latency, and fault-tolerant … systems, distributed computing, or large-scale AI infrastructure are also welcome. At least 2 years of experience with LLM inference/serving framework optimization (vLLM/Ray Serve/TensorRT-LLM/PyTorch). Hands-on experience with distributed KV cache optimization. Familiarity with GPUs and how they execute LLMs. Strong ...

AI Systems Research Engineer - LLM Optimisation

Hiring Organisation
Project People
Location
City Of Edinburgh, Scotland, United Kingdom
… tuning of large-scale inference and data pipelines, focusing on KV cache management, heterogeneous memory scheduling, and high-throughput inference serving using frameworks like vLLM, Ray Serve, and modern PyTorch Distributed systems. Scalable Model Serving Infrastructure: Develop and evaluate frameworks that enable efficient multi-tenant, low-latency, and fault-tolerant … systems, distributed computing, or large-scale AI infrastructure are also welcome. At least 2 years of experience with LLM inference/serving framework optimization (vLLM/Ray Serve/TensorRT-LLM/PyTorch). Hands-on experience with distributed KV cache optimization. Familiarity with GPUs and how they execute LLMs. Strong ...

Member of Technical Staff

Hiring Organisation
Geometric
Location
Ireland, UK
… implementation level. Attention variants, KV cache strategies, quantisation schemes, and how they shape kernel design. You've worked with production inference or training frameworks (vLLM, Megatron-LM, etc.). You've built performance-critical infrastructure before: compilers, profilers, auto-tuners, or search systems. You have real intuition for evolutionary methods, fitness … work of François Chollet, Kenneth Stanley, Jeff Clune, Jürgen Schmidhuber, David Ha, and Christian Szegedy. Bonus: Open-source kernel contributions (FlashAttention, FlashInfer, vLLM, Unsloth, Liger-Kernels, ThunderKittens). Publications in ML/AI, kernel optimisation, or evolutionary methods (NeurIPS, ICLR, CVPR, GECCO, or equivalent). Other HW experience (AMD, MLX, edge ...

Member of Technical Staff

Hiring Organisation
Geometric
Location
City of London, London, United Kingdom
… implementation level. Attention variants, KV cache strategies, quantisation schemes, and how they shape kernel design. You've worked with production inference or training frameworks (vLLM, Megatron-LM, etc.). You've built performance-critical infrastructure before: compilers, profilers, auto-tuners, or search systems. You have real intuition for evolutionary methods, fitness … work of François Chollet, Kenneth Stanley, Jeff Clune, Jürgen Schmidhuber, David Ha, and Christian Szegedy. Bonus: Open-source kernel contributions (FlashAttention, FlashInfer, vLLM, Unsloth, Liger-Kernels, ThunderKittens). Publications in ML/AI, kernel optimisation, or evolutionary methods (NeurIPS, ICLR, CVPR, GECCO, or equivalent). Other HW experience (AMD, MLX, edge ...