SOC 2). Collaborate with ML/AI Teams Package and deploy large‐language‐model (LLM) training jobs on distributed GPU clusters (Slurm, Ray, Kubeflow, or AWS SageMaker). Optimize model‐serving (Triton, vLLM, TorchServe) for low‐latency, high‐throughput inference. Cost & Performance Optimization Track cloud spend, right‐size resources More ❯
working in the areas of ML and causal inference for downstream impact estimation. The ideal candidate will have knowledge of at least one of the Ray, Spark, or RAPIDS frameworks for accelerating model training. A background in causal inference (e.g. Double ML) is a plus but not required. This is the More ❯
productionizing AI solutions. Expertise in Python and key libraries (e.g., NumPy, SciPy, Pydantic, Asyncio). Hands-on experience with open-source libraries (e.g. LangChain, Ray, PyTorch, Lightning). Hands-on experience with machine learning techniques, including Reinforcement Learning. Proficiency in AI/ML frameworks and libraries such as PyTorch, TensorFlow More ❯
Washington, Washington DC, United States Hybrid / WFH Options
RAND Corporation
the AI/ML hardware stack (e.g. GPUs, TPUs, data center design) Familiarity with the AI/ML software stack (e.g. CUDA, PyTorch, TensorFlow, Ray) Experience working on AI research, ML model training, or model deployment Experience with securing AI systems Education Requirements RAND is hiring a Research Lead at More ❯
Washington, Washington DC, United States Hybrid / WFH Options
RAND Corporation
the AI/ML hardware stack (e.g., GPUs, TPUs, data center design) Familiarity with the AI/ML software stack (e.g., CUDA, PyTorch, TensorFlow, Ray) Experience working on AI research, ML model training, or model deployment Experience with securing AI systems Education Requirements RAND is hiring multiple Visiting AI Security More ❯
or business challenges into well-defined machine learning solutions We are using many technologies day to day such as various AWS services, GCP, Kubernetes, Ray Serve, Kubeflow, and ReTool. Any experience in these areas would be a bonus Sprout.ai Values Hungry for Growth - Unleash your inner Sprout: Sprouts embrace growth More ❯
the pipeline. Collaborate with research to define data quality benchmarks. Optimize end-to-end performance across distributed data processing frameworks (e.g., Apache Spark, Ray, Airflow). Work with infrastructure teams to scale pipelines across thousands of GPUs. Work directly with leadership on the data team roadmaps. Manage More ❯
language processing, image recognition, semantic segmentation, reinforcement learning, approaches such as Bayesian, deep convolutional and graph neural network methods, and tools such as PyTorch, Ray, TensorFlow/board, and MLflow Interacting with decision-makers and customers to translate mission needs into an end-to-end analytical solution Ability to apply More ❯
language processing, image recognition, semantic segmentation, reinforcement learning, approaches such as Bayesian, deep convolutional and graph neural network methods, and tools such as PyTorch, Ray, TensorFlow/board, and MLflow Interacting with decision-makers and customers to translate mission needs into an end-to-end solution Ability to apply novel More ❯
to some of the biggest names in the insurance industry. We are developing a modern real-time ML platform using technologies like Python, PyTorch, Ray, k8s (helm + flux), Terraform, Postgres and Flink on AWS. We are very big fans of Infrastructure-as-Code and enjoy Agile practices. As a More ❯
navigating hybrid infrastructure: some workloads will be on-prem, others cloud (large GPU clusters). Familiarity with distributed systems and container orchestration (e.g., Kubernetes, Ray). Experience working client-facing or in cross-functional teams, ideally within pharma/life sciences. A “get stuck in” attitude: this is a More ❯
Stack Our client is tech-agnostic and values adaptability. Current tools include: Backend: Python Frontend: TypeScript, React Infrastructure: Kubernetes, GCP Machine Learning: PyTorch, CUDA, Ray What’s on Offer Highly competitive base salary + commission + equity in a hyper-growth company 25 days holiday + public holidays Dynamic office More ❯
breakdown of all the technologies we use: Backend: Python Frontend: TypeScript and React Kubernetes for deployment GCP for underlying infrastructure Machine Learning: PyTorch, CUDA, Ray We encourage people from all backgrounds, cultures, and skill levels to apply. It is okay to not meet all requirements listed as we are looking More ❯
experience in benchmarking foundational models for real-world applications. Experience in machine learning and developing AI models in frameworks such as PyTorch, TensorFlow, FSDP, Ray, and so forth. Expertise in one or more AI areas, including transfer learning, model distillation, surrogate models, and reinforcement learning. Research experience in designing and More ❯
in Python and related ML frameworks such as JAX, PyTorch and XLA/MLIR. Experience with distributed training infrastructures (Kubernetes, Slurm) and associated frameworks (Ray). Experience using large-scale distributed training strategies. Hands-on experience training large models at scale and contributing to the tooling and/ More ❯
Galytix (GX) is delivering on the promise of AI. GX has built specialised knowledge AI assistants for the banking and insurance industry. Our assistants are fed by sector-specific data and knowledge and easily adaptable through ontology layers to reflect More ❯
closely with AI/ML teams to structure and store data effectively. The ideal candidate will have experience with Postgres (UDFs, SQL Triggers), Spark, Ray, Elasticsearch, Python, AWS S3, and structured data formats (Parquet, Avro, JSON Schema). This hybrid role requires candidates to be local to Maryland, Northern Virginia … for scripting and data manipulation. Familiarity with other languages such as Java and Scala would be beneficial. Experience with Hadoop, Spark/PySpark and Ray for large-scale data processing. Hands-on expertise with Elasticsearch and/or Solr for indexing and retrieval of massive datasets. Familiarity with HBase, Accumulo More ❯
learning research and production. High-Performance Data Pipelines Develop and optimize distributed systems for data processing, including filtering, indexing, and retrieval, leveraging frameworks like Ray, Metaflow, Spark, or Hadoop. Synthetic Data Generation Build and orchestrate pipelines to generate synthetic data at scale, advancing research on cost-efficient inference and training … rendering engines, and/or other software. Distributed Computing & MLOps Demonstrated proficiency in setting up large-scale, robust data pipelines, using frameworks like Spark, Ray, or Metaflow. Comfortable with model versioning and experiment tracking. Performance Optimization Good understanding of parallel and distributed computing. Experienced with setting up evaluation methods Cloud More ❯
Salt is proud to be partnered with a leading US-based technology company on the lookout for a talented Senior Software Developer to join their team. You'll be contributing to a large-scale analytics platform, building tailored SLA reporting More ❯