13 of 13 Remote/Hybrid vLLM Jobs

Senior AI Platform Engineer

Hiring Organisation
Klaviyo
Location
Boston, Massachusetts, United States
Employment Type
Permanent
Salary
USD Annual
Build tools to tune LLM-based features, deploy agents, MCP and RAG in production, and evaluate performance using tools like Arize, OpenAI, Bedrock, LangChain, vLLM and Ray. Develop reliable, scalable data pipelines and APIs for AI systems. Foster a culture of ownership, experimentation, and customer-first thinking. ...

Senior AI Engineer

Hiring Organisation
Aveni
Location
United Kingdom
Experience working with cloud environments (preferably AWS). Nice to have: Experience with containerisation technologies such as Docker or Kubernetes; experience with frameworks such as vLLM or NeMo; knowledge of financial services NLP applications; experience designing evaluation methodologies for LLM outputs; experience building intelligent agents or multi-agent systems. Skills ...

Principal MLOps Engineer

Hiring Organisation
Raft
Location
Boston, Massachusetts, United States
Employment Type
Permanent
Salary
USD Annual
first 90 days of employment. Highly preferred: Experience with ML model serving and inference platforms such as Triton Inference Server, KServe, Ray Serve, vLLM, or similar technologies; experience with secure and compliant deployment practices in regulated or government environments; experience with Kubernetes-based ML platforms such as Kubeflow. Familiarity with ...

AI / ML Engineer (WISRD Platform - ISR & Tactical Edge AI)

Hiring Organisation
CyOne, Inc
Location
Washington, Washington DC, United States
Employment Type
Permanent
Salary
USD Annual
APIs and streaming technologies. Infrastructure & Tools: Experience with Docker and containerized deployments; familiarity with Kubernetes and distributed systems; experience with model serving frameworks (e.g., vLLM, TGI, Ollama); experience with GPU-based compute environments. Experience: 3+ years of experience in AI/ML engineering or related field; experience deploying AI/ ...

Staff Software Engineer - Backend & AI Infra - Trading

Hiring Organisation
Career Renew
Location
Washington, Washington DC, United States
Employment Type
Permanent
Salary
USD Annual
built it, deployed it, operated it, fixed it at 3am. Strong Plus: Experience with model serving/LLM infrastructure - deploying, scaling, and optimizing inference (vLLM, TGI, TensorRT-LLM, or managed endpoints); background in trading systems, exchange APIs, or fintech where uptime has direct financial consequences; experience with onchain infrastructure: wallet ...

Senior Software Engineer - Data Lake & BI

Hiring Organisation
CoreWeave
Location
Sunnyvale, California, United States
Employment Type
Permanent
Salary
USD Annual
pipelines. Experience running MLPerf submissions or similar large-scale audited benchmarks. Contributions to OSS projects such as Apache Iceberg, Apache Spark, Trino, llm-d, vLLM, or PyTorch. Exposure to benchmarking large GPU fleets or multi-region clusters. Experience with CUDA kernels, NCCL/SHARP, RDMA/NUMA, or GPU interconnect ...

Software Engineer, Inference AI/ML

Hiring Organisation
CoreWeave
Location
Sunnyvale, California, United States
Employment Type
Permanent
Salary
USD Annual
from experienced engineers. About the role: Implement well-scoped features and fixes in Python/Go/C++ for model-serving services (e.g., Triton, vLLM, TensorRT-LLM, Ray Serve). Write tests, code comments, and short design docs; participate in code reviews. Add basic metrics and dashboards; assist with alarms ...

Staff AI Engineer

Hiring Organisation
Career Renew
Location
Washington, Washington DC, United States
Employment Type
Permanent
Salary
USD Annual
time. Strong Plus: Experience with financial ML - signal generation, alpha research, portfolio optimization, or execution optimization; LLM fine-tuning and serving - PEFT/LoRA, vLLM, TGI, or custom inference pipelines in production; multi-agent systems - designing systems where autonomous agents coordinate, compete, or learn from each other; onchain data ...

Senior Software Engineer I, Inference

Hiring Organisation
CoreWeave
Location
Sunnyvale, California, United States
Employment Type
Permanent
Salary
USD Annual
metrics-driven work. Bachelor's or Master's in CS, EE, or related field (or equivalent practical experience). Preferred: Contributions to inference frameworks (vLLM, Triton, TensorRT-LLM, Ray Serve, TorchServe). Experience with CUDA kernels, NCCL/SHARP, RDMA/NUMA, or GPU interconnect topologies. Leading multi-team initiatives ...

Senior Software Engineer II, Inference

Hiring Organisation
CoreWeave
Location
Sunnyvale, California, United States
Employment Type
Permanent
Salary
USD Annual
streaming token delivery. Proven track record improving tail latency (P95/P99) and service reliability through metrics-driven work. Preferred: Contributions to inference frameworks (vLLM, Triton, TensorRT-LLM, Ray Serve, TorchServe). Experience with CUDA kernels, NCCL/SHARP, RDMA/NUMA, or GPU interconnect topologies. Leading multi-team initiatives ...

Sr. Software Engineer - Perf and Benchmarking

Hiring Organisation
CoreWeave
Location
Sunnyvale, California, United States
Employment Type
Permanent
Salary
USD Annual
Prometheus, Grafana, OpenTelemetry). Experience with performance-critical GPU systems (CUDA, NCCL, RDMA, NVLink/PCIe, memory bandwidth) and model-serving stacks (llm-d, vLLM, TensorRT-LLM, Megatron-LM). Strong communicator comfortable collaborating with cross-functional teams and external partners. Nice to have: Experience with time-series databases … storage engines, or custom data pipelines. Experience running MLPerf submissions or similar large-scale audited benchmarks. Contributions to OSS projects such as llm-d, vLLM or PyTorch. Exposure to benchmarking large GPU fleets or multi-region clusters. Experience with CUDA kernels, NCCL/SHARP, RDMA/NUMA, or GPU interconnect ...

Member of Technical Staff

Hiring Organisation
Geometric
Location
Ireland, UK
implementation level. Attention variants, KV cache strategies, quantisation schemes, and how they shape kernel design. You've worked with production inference or training frameworks (vLLM, Megatron-LM, etc.). You've built performance-critical infrastructure before - compilers, profilers, auto-tuners, or search systems. You have real intuition for evolutionary methods, fitness … work of François Chollet, Kenneth Stanley, Jeff Clune, Jurgen Schmidhuber, David Ha, and Christian Szegedy. Bonus: Open-source kernel contributions (FlashAttention, FlashInfer, vLLM, Unsloth, Liger-Kernels, ThunderKittens); publications in ML/AI, kernel optimisation or evolutionary methods (NeurIPS, ICLR, CVPR, GECCO or equivalent); other HW experience (AMD, MLX, edge ...

Member of Technical Staff

Hiring Organisation
Geometric
Location
City of London, London, United Kingdom
implementation level. Attention variants, KV cache strategies, quantisation schemes, and how they shape kernel design. You've worked with production inference or training frameworks (vLLM, Megatron-LM, etc.). You've built performance-critical infrastructure before - compilers, profilers, auto-tuners, or search systems. You have real intuition for evolutionary methods, fitness … work of François Chollet, Kenneth Stanley, Jeff Clune, Jurgen Schmidhuber, David Ha, and Christian Szegedy. Bonus: Open-source kernel contributions (FlashAttention, FlashInfer, vLLM, Unsloth, Liger-Kernels, ThunderKittens); publications in ML/AI, kernel optimisation or evolutionary methods (NeurIPS, ICLR, CVPR, GECCO or equivalent); other HW experience (AMD, MLX, edge ...