SOC 2). Collaborate with ML/AI Teams: Package and deploy large‑language‑model (LLM) training jobs on distributed GPU clusters (Slurm, Ray, Kubeflow, or AWS SageMaker). Optimize model serving (Triton, vLLM, TorchServe) for low‑latency, high‑throughput inference. Cost & Performance Optimization: Track cloud spend, right‑size resources …
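For context on the serving side of listings like the one above, here is a minimal, illustrative sketch of batch inference with vLLM; the model name, prompts, and sampling settings are placeholder assumptions, not details from any role described here.

```python
# Minimal vLLM offline-inference sketch (illustrative only).
# The model name, prompts, and sampling settings are placeholder assumptions.
from vllm import LLM, SamplingParams

prompts = [
    "Summarize what continuous batching does for LLM serving.",
    "Explain tensor parallelism in one sentence.",
]

# vLLM batches concurrent requests internally (continuous batching),
# which is the usual route to high-throughput, low-latency serving.
llm = LLM(model="facebook/opt-125m")  # small placeholder model
params = SamplingParams(temperature=0.8, max_tokens=64)

for output in llm.generate(prompts, params):
    print(output.outputs[0].text)
```

The same engine can also be exposed over HTTP via vLLM's OpenAI-compatible server when online serving rather than offline batch inference is needed.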
working in the areas of ML and causal inference for downstream impact estimation. The ideal candidate will have knowledge of at least one of the Ray, Spark, or RAPIDS frameworks to accelerate model training. A background in causal inference (e.g., Double ML) is a plus but not required. This is the …
productionizing AI solutions. Expertise in Python and key libraries (e.g., NumPy, SciPy, Pydantic, asyncio). Hands-on experience with open-source libraries (e.g., LangChain, Ray, PyTorch, Lightning). Hands-on experience with machine learning techniques, including reinforcement learning. Proficiency in AI/ML frameworks and libraries such as PyTorch, TensorFlow …
Washington, Washington DC, United States Hybrid / WFH Options
RAND Corporation
the AI/ML hardware stack (e.g., GPUs, TPUs, data center design) Familiarity with the AI/ML software stack (e.g., CUDA, PyTorch, TensorFlow, Ray) Experience working on AI research, ML model training, or model deployment Experience with securing AI systems Education Requirements RAND is hiring a Research Lead at …
Washington, Washington DC, United States Hybrid / WFH Options
RAND Corporation
the AI/ML hardware stack (e.g., GPUs, TPUs, data center design) Familiarity with the AI/ML software stack (e.g., CUDA, PyTorch, TensorFlow, Ray) Experience working on AI research, ML model training, or model deployment Experience with securing AI systems Education Requirements RAND is hiring multiple Visiting AI Security …
or business challenges into well-defined machine learning solutions. We use many technologies day to day, such as various AWS services, GCP, Kubernetes, Ray Serve, Kubeflow, and Retool. Any experience in these areas would be a bonus. Sprout.ai Values: Hungry for Growth - Unleash your inner Sprout: Sprouts embrace growth …
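As a rough illustration of the Ray Serve technology mentioned above, the following is a minimal sketch of a Ray Serve HTTP deployment; the class name, replica count, and scoring logic are hypothetical and not taken from the listing.

```python
# Minimal Ray Serve sketch (illustrative; the deployment and its logic are hypothetical).
from ray import serve
from starlette.requests import Request


@serve.deployment(num_replicas=2)  # Serve scales this class across replicas
class Scorer:
    def __init__(self) -> None:
        # A real deployment would load a trained model artifact here.
        self.bias = 0.1

    async def __call__(self, request: Request) -> dict:
        payload = await request.json()
        return {"score": sum(payload.get("features", [])) + self.bias}


# Bind the deployment into an application and start serving it over HTTP
# (this starts a local Ray instance if one is not already running).
app = Scorer.bind()
serve.run(app)
```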
London, South East England, United Kingdom Hybrid / WFH Options
Oliver Bernard
to architectural decisions. What We’re Looking For: Strong Python programming skills (5+ years preferred). Deep experience with distributed systems (e.g., Kafka, Spark, Ray, Kubernetes). Hands-on work with big data technologies and architectures. Solid understanding of concurrency, fault tolerance, and data consistency. Comfortable in a fast-paced …
the pipeline. Collaborate with research to define data quality benchmarks. Optimize end-to-end performance across distributed data processing frameworks (e.g., Apache Spark, Ray, Airflow). Work with infrastructure teams to scale pipelines across thousands of GPUs. Work directly with leadership on the data team roadmaps. Manage …
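As a small illustration of the kind of distributed preprocessing referred to above, here is a minimal Ray Data sketch; the toy dataset and transform are assumptions made for the example, and a real pipeline would read from object storage instead.

```python
# Minimal Ray Data preprocessing sketch (illustrative; dataset and transform are toy examples).
import ray

ray.init()  # starts a local Ray runtime; use ray.init(address="auto") on a cluster


def normalize(batch):
    # Batches arrive as a dict of column name -> array; scale one column.
    batch["value"] = batch["value"] / 100.0
    return batch


# A real pipeline would use something like ray.data.read_parquet(...) here.
ds = ray.data.from_items([{"value": float(i)} for i in range(1_000)])
ds = ds.map_batches(normalize, batch_size=256)
print(ds.take(3))
```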
language processing, image recognition, semantic segmentation, reinforcement learning, approaches such as Bayesian, deep convolutional and graph neural network methods, and tools such as PyTorch, Ray, TensorFlow/TensorBoard, and MLflow. Interacting with decision-makers and customers to translate mission needs into an end-to-end analytical solution. Ability to apply …
language processing, image recognition, semantic segmentation, reinforcement learning, approaches such as Bayesian, deep convolutional and graph neural network methods, and tools such as PyTorch, Ray, TensorFlow/TensorBoard, and MLflow. Interacting with decision-makers and customers to translate mission needs into an end-to-end solution. Ability to apply novel …
to some of the biggest names in the insurance industry. We are developing a modern real-time ML platform using technologies like Python, PyTorch, Ray, Kubernetes (Helm + Flux), Terraform, Postgres and Flink on AWS. We are very big fans of Infrastructure-as-Code and enjoy Agile practices. As a …
navigating hybrid infrastructure: some workloads will be on-prem, others cloud (large GPU clusters). Familiarity with distributed systems and container orchestration (e.g., Kubernetes, Ray). Experience working client-facing or in cross-functional teams — ideally within pharma/life sciences. A “get stuck in” attitude — this is a …
Stack Our client is tech-agnostic and values adaptability. Current tools include: Backend: Python; Frontend: TypeScript, React; Infrastructure: Kubernetes, GCP; Machine Learning: PyTorch, CUDA, Ray. What’s on Offer: Highly competitive base salary + commission + equity in a hyper-growth company; 25 days holiday + public holidays; Dynamic office …