1 to 25 of 33 vLLM Jobs

Principal AI/ML Engineer

Hiring Organisation
PRAGMATIKE
Location
Washington, Washington DC, United States
Employment Type
Permanent
Salary
USD Annual
infrastructure (AWS, GCP, or Azure). Proficiency with Python and familiarity with TypeScript or Go for platform integration. Expertise in ML frameworks: PyTorch, Transformers, vLLM, Llama-factory, Megatron-LM, CUDA/GPU acceleration (practical understanding) Strong experience with containerization and orchestration (Docker, Kubernetes, Helm, autoscaling). Deep understanding ...

Staff / Principal ML Ops Engineer

Hiring Organisation
PRAGMATIKE
Location
Washington, Washington DC, United States
Employment Type
Permanent
Salary
USD Annual
infrastructure (AWS, GCP, or Azure). Proficiency with Python and familiarity with TypeScript or Go for platform integration. Expertise in ML frameworks: PyTorch, Transformers, vLLM, Llama-factory, Megatron-LM, CUDA/GPU acceleration (practical understanding) Strong experience with containerization and orchestration (Docker, Kubernetes, Helm, autoscaling). Deep understanding ...

Staff / Principal ML Ops Engineer

Hiring Organisation
PRAGMATIKE
Location
Boston, Massachusetts, United States
Employment Type
Permanent
Salary
USD Annual
infrastructure (AWS, GCP, or Azure). Proficiency with Python and familiarity with TypeScript or Go for platform integration. Expertise in ML frameworks: PyTorch, Transformers, vLLM, Llama-factory, Megatron-LM, CUDA/GPU acceleration (practical understanding) Strong experience with containerization and orchestration (Docker, Kubernetes, Helm, autoscaling). Deep understanding ...

Principal AI/ML Engineer

Hiring Organisation
PRAGMATIKE
Location
Boston, Massachusetts, United States
Employment Type
Permanent
Salary
USD Annual
infrastructure (AWS, GCP, or Azure). Proficiency with Python and familiarity with TypeScript or Go for platform integration. Expertise in ML frameworks: PyTorch, Transformers, vLLM, Llama-factory, Megatron-LM, CUDA/GPU acceleration (practical understanding) Strong experience with containerization and orchestration (Docker, Kubernetes, Helm, autoscaling). Deep understanding ...

Senior AI Platform Engineer

Hiring Organisation
Klaviyo
Location
Boston, Massachusetts, United States
Employment Type
Permanent
Salary
USD Annual
Build tools to tune LLM-based features, deploy agents, MCP and RAG in production, and evaluate performance using tools like Arize, OpenAI, Bedrock, LangChain, vLLM and Ray. Develop reliable, scalable data pipelines and APIs for AI systems. Foster a culture of ownership, experimentation, and customer-first thinking. ...

Senior AI Engineer

Hiring Organisation
Aveni
Location
United Kingdom
Experience working with cloud environments (preferably AWS) Nice to have Experience with containerisation technologies such as Docker or Kubernetes Experience with frameworks such as vLLM or NeMo Knowledge of financial services NLP applications Experience designing evaluation methodologies for LLM outputs Experience building intelligent agents or multi-agent systems Skills ...

Principal MLOps Engineer

Hiring Organisation
Raft
Location
Boston, Massachusetts, United States
Employment Type
Permanent
Salary
USD Annual
first 90 days of employment Highly preferred: Experience with ML model serving and inference platforms such as Triton Inference Server, KServe, Ray Serve, vLLM, or similar technologies Experience with secure and compliant deployment practices in regulated or government environments Experience with Kubernetes-based ML platforms such as Kubeflow Familiarity with ...

GEN AI Engineer

Hiring Organisation
Nastech Global
Location
Charlotte, North Carolina, United States
Employment Type
Permanent
Salary
USD Annual
experience with the LangChain framework. Experience specifically with the OpenAI API, chat completions, embeddings, etc. Solid awareness of TensorRT and vLLM implementation. Strong proficiency in Python and data science libraries (NumPy, Pandas, scikit-learn, PyTorch/TensorFlow). Proven experience applying guardrails and observability ...

AI / ML Engineer (WISRD Platform - ISR & Tactical Edge AI)

Hiring Organisation
CyOne, Inc
Location
Washington, Washington DC, United States
Employment Type
Permanent
Salary
USD Annual
APIs and streaming technologies Infrastructure & Tools Experience with Docker and containerized deployments Familiarity with Kubernetes and distributed systems Experience with model serving frameworks (e.g., vLLM, TGI, Ollama) Experience with GPU-based compute environments Experience 3+ years of experience in AI/ML engineering or related field Experience deploying AI/ ...

Superb BackEnd Software Engineer (2+ years - Senior Level)

Hiring Organisation
Black Cape
Location
Arlington, Virginia, United States
Employment Type
Permanent
Salary
USD Annual
transformers, and CoreNLP Dependency management tools such as poetry, uv, Pipenv, or maven Large language models (LLMs) and servers, such as Ollama, Llama.cpp, and vLLM Great Benefits 401K Competitive Salary Generous Time Off Exceptional Team-Building & Fun Company Events Mentorship & Professional Development Programs Tuition Reimbursement CONTACT: to learn MORE Powered ...

Software Engineer

Hiring Organisation
career
Location
San Francisco, California, United States
Employment Type
Permanent
Salary
USD Annual
develop reliable, scalable and high quality code. Valuable/Bonus Qualifications: Experience with ML frameworks like PyTorch. Experience with serving ML models and LLMs (vLLM). Experience collaborating with Machine Learning (ML) or data teams. Familiarity with ML concepts, including inference, model serving, latency considerations, and data pipelines. Experience utilizing ...

Senior Machine Learning Engineer (LLMs)

Hiring Organisation
Albi
Location
Chicago, Illinois, United States
Employment Type
Permanent
Salary
USD Annual
owning projects end to end and mentoring other engineers Nice to have: Distributed training (FSDP, DeepSpeed, Megatron, etc.) Inference optimization (quantization, speculative decoding, vLLM, Triton) Experience shipping LLM features in production SaaS Open source contributions or published work or patents in ML/NLP Microsoft Foundry experience Benefits Competitive salary ...

Staff Software Engineer, AI Model LifeCycle

Hiring Organisation
Crusoe
Location
Sunnyvale, California, United States
Employment Type
Permanent
Salary
USD Annual
problems. Bonus Points: Proficiency in Golang or Python for large-scale, production-level services and PyTorch. Contributions to open-source AI projects such as vLLM or similar frameworks. Performance optimizations on GPU systems and inference frameworks. Benefits: Competitive compensation Restricted Stock Units Paid time off & paid holidays Comprehensive health, dental ...

Staff Software Engineer, AI Model LifeCycle

Hiring Organisation
Crusoe
Location
San Francisco, California, United States
Employment Type
Permanent
Salary
USD Annual
problems. Bonus Points: Proficiency in Golang or Python for large-scale, production-level services and PyTorch. Contributions to open-source AI projects such as vLLM or similar frameworks. Performance optimizations on GPU systems and inference frameworks. Benefits: Competitive compensation Restricted Stock Units Paid time off & paid holidays Comprehensive health, dental ...

Senior Software Engineer, AI Model LifeCycle

Hiring Organisation
Crusoe
Location
Sunnyvale, California, United States
Employment Type
Permanent
Salary
USD Annual
problems. Bonus Points: Proficiency in Golang or Python for large-scale, production-level services and PyTorch Contributions to open-source AI projects such as vLLM or similar frameworks. Performance optimizations on GPU systems and inference frameworks. Benefits: Competitive compensation Restricted Stock Units Paid time off & paid holidays Comprehensive health, dental ...

Staff Software Engineer - Backend & AI Infra - Trading

Hiring Organisation
Career Renew
Location
Washington, Washington DC, United States
Employment Type
Permanent
Salary
USD Annual
built it, deployed it, operated it, fixed it at 3am Strong Plus Experience with model serving/LLM infrastructure - deploying, scaling, and optimizing inference (vLLM, TGI, TensorRT-LLM, or managed endpoints) Background in trading systems, exchange APIs, or fintech where uptime has direct financial consequences Experience with onchain infrastructure: wallet ...

Senior Software Engineer - Data Lake & BI

Hiring Organisation
CoreWeave
Location
Sunnyvale, California, United States
Employment Type
Permanent
Salary
USD Annual
pipelines. Experience running MLPerf submissions or similar large-scale audited benchmarks. Contributions to OSS projects such as Apache Iceberg, Apache Spark, Trino, llm-d, vLLM, or PyTorch. Exposure to benchmarking large GPU fleets or multi-region clusters. Experience with CUDA kernels, NCCL/SHARP, RDMA/NUMA, or GPU interconnect ...

Senior Machine Learning Engineer - Scene Understanding

Hiring Organisation
Zoox
Location
Boston, Massachusetts, United States
Employment Type
Permanent
Salary
USD Annual
Hands-on experience with production ML pipelines, including dataset creation, training frameworks, and metrics Expertise in Python libraries (PyTorch, NumPy, Pandas, vLLM) Bonus Qualifications Deep knowledge of cutting-edge computer vision techniques Publications in top-tier conferences (CVPR, ICCV, RSS, ICRA) Experience with integrating large language models into various tasks. ...

Staff / Principal Machine Learning Engineer, Serving

Hiring Organisation
Inworld AI
Location
United Kingdom
need all of this. But you need enough to make a case. Inference Optimization. Deep understanding of modern serving frameworks and techniques like vLLM or TRT-LLM. Model Acceleration. Hands-on experience with quantization, distillation, caching strategies, continuous batching, paged attention, and speculative decoding. High-Performance Systems. Proficiency ...

AI Platform and Applications Engineer

Hiring Organisation
National Endowment for Dem
Location
Washington, Washington DC, United States
Employment Type
Permanent
Salary
USD Annual
agentic workflows including document processing, chunking, embeddings, vector search, relevance tuning, and grounding strategies Experience with model access layers or inference runtimes such as vLLM, TGI, llama.cpp, Hugging Face Transformers, or similar tools Familiarity with containerized deployment and modern engineering operations, including Git, Docker, CI/CD, and application monitoring ...

Software Engineer, Inference AI/ML

Hiring Organisation
CoreWeave
Location
Sunnyvale, California, United States
Employment Type
Permanent
Salary
USD Annual
from experienced engineers. About the role: Implement well-scoped features and fixes in Python/Go/C++ for model-serving services (e.g., Triton, vLLM, TensorRT-LLM, Ray Serve). Write tests, code comments, and short design docs; participate in code reviews. Add basic metrics and dashboards; assist with alarms ...

Platform Engineer

Hiring Organisation
Zyphra
Location
San Francisco, California, United States
Employment Type
Permanent
Salary
USD Annual
Ansible, Terraform) Prior work supporting ML/AI infrastructure, including GPU management and workload optimization Exposure to backend development for ML model serving (e.g., vLLM, Ray, SGLang, Triton) Why Work at Zyphra: Our research methodology is grounded in methodical, step-by-step approaches to ambitious goals. Both deep research ...

Staff AI Engineer

Hiring Organisation
Career Renew
Location
Washington, Washington DC, United States
Employment Type
Permanent
Salary
USD Annual
time Strong Plus Experience with financial ML - signal generation, alpha research, portfolio optimization, or execution optimization LLM fine-tuning and serving - PEFT/LoRA, vLLM, TGI, or custom inference pipelines in production Multi-agent systems - designing systems where autonomous agents coordinate, compete, or learn from each other Onchain data ...

AI Systems Research Engineer

Hiring Organisation
microTECH Global LTD
Location
Edinburgh, Scotland, United Kingdom
Strong knowledge of distributed systems, operating systems, machine learning systems architecture, Inference serving, and AI Infrastructure. · Hands-on experience with LLM serving frameworks (e.g., vLLM, Ray Serve, TensorRT-LLM, TGI) and distributed KV cache optimization. · Proficiency in C/C++, with additional experience in Python for research prototyping. · Solid grounding ...

Architecture Intern - Inference

Hiring Organisation
Etched
Location
San Jose, California, United States
Employment Type
Permanent
Salary
USD Annual
InfiniBand). Ported applications to non-standard accelerator hardware or hardware platforms. Deep knowledge of transformer model architectures and/or inference serving stacks (vLLM, SGLang, etc.) Strong candidates may have some experience with Proficiency in Rust Low-latency, high-performance applications using both kernel-level and user-space networking ...