paced, collaborative, and dynamic environment. Nice-to-haves: Prior experience with PCB design, EDA tools, or related optimization problems. Hands-on experience in high-performance computing environments (e.g., Kubernetes, Ray, Dask). Contributions to open-source projects, publications, or top placements in ML competitions (e.g., Kaggle). Expertise in related fields such as Computer Vision, Representation Learning, or Simulation Environments.
Mountain View, California, United States Hybrid / WFH Options
LinkedIn
power of our GPU fleet with thousands of the latest GPU cards. The team also works closely with the open-source community and has many open-source committers (TensorFlow, Horovod, Ray, vLLM, Hugging Face, DeepSpeed, etc.) on the team. Additionally, this team focuses on technologies like LLMs, GNNs, Incremental Learning, Online Learning, and serving performance optimizations across billions of user queries. Model …
London, England, United Kingdom Hybrid / WFH Options
Merantix
GCP. Containerization technologies, such as Docker and Kubernetes. Documenting code, architectures, and experiments. Linux systems and bash terminals. Preferred Qualifications: Hands-on experience with distributed computing frameworks, such as Ray Data and Spark; databases and/or data warehousing technologies, such as Apache Hive; data transformation via SQL and dbt; orchestration platforms, such as Apache Airflow; data catalogs and metadata …
City of London, England, United Kingdom Hybrid / WFH Options
uk.tiptopjob.com - Jobboard
Programming Skills: Proficiency in Python, data analytics, deep learning (Scikit-learn, Pandas, PyTorch, Jupyter, pipelines), and practical knowledge of data tools like Databricks, Ray, Vector Databases, Kubernetes, and workflow scheduling tools such as Apache Airflow, Dagster, and Astronomer. GPU Computing: Familiarity with GPU computing, both on-premises and on cloud platforms, and experience in …
London, England, United Kingdom Hybrid / WFH Options
InstaDeep Ltd
Pallas, Triton, and/or CUDA code to achieve performance breakthroughs. Required Skills: Understanding of Linux systems, performance analysis tools, and hardware optimisation techniques. Experience with distributed training frameworks (Ray, Dask, PyTorch Lightning, etc.). Expertise with Python and/or C/C++. Development with machine learning frameworks (JAX, TensorFlow, PyTorch, etc.). Passion for profiling, identifying bottlenecks, and delivering efficient …
skills we are searching for are: Programming Skills: Proficiency in Python, data analytics, deep learning (Scikit-learn, Pandas, PyTorch, Jupyter, pipelines), and practical knowledge of data tools like Databricks, Ray, Vector Databases, Kubernetes, and workflow scheduling tools such as Apache Airflow, Dagster, and Astronomer. GPU Computing: Familiarity with GPU computing, both on-premises and on cloud platforms, and experience in …
Washington, Washington DC, United States Hybrid / WFH Options
RAND Corporation
on security topics. Familiarity with the AI/ML hardware stack (e.g., GPUs, TPUs, data center design). Familiarity with the AI/ML software stack (e.g., CUDA, PyTorch, TensorFlow, Ray). Experience working on AI research, ML model training, or model deployment. Experience securing AI systems. Education Requirements: RAND is hiring a Research Lead at either the specialist or expert …
Washington, Washington DC, United States Hybrid / WFH Options
RAND Corporation
on security topics. Familiarity with the AI/ML hardware stack (e.g., GPUs, TPUs, data center design). Familiarity with the AI/ML software stack (e.g., CUDA, PyTorch, TensorFlow, Ray). Experience working on AI research, ML model training, or model deployment. Experience securing AI systems. Education Requirements: RAND is hiring multiple Visiting AI Security Residents at associate, specialist, and …
quality datasets suitable for machine learning research and production. High-Performance Data Pipelines: Develop and optimize distributed systems for data processing, including filtering, indexing, and retrieval, leveraging frameworks like Ray, Metaflow, Spark, or Hadoop. Synthetic Data Generation: Build and orchestrate pipelines to generate synthetic data at scale, advancing research on cost-efficient inference and training strategies. Experiments & Analysis: Design and … inference of pretrained models, 3D rendering engines, and/or other software. Distributed Computing & MLOps: Demonstrated proficiency in setting up large-scale, robust data pipelines using frameworks like Spark, Ray, or Metaflow. Comfortable with model versioning and experiment tracking. Performance Optimization: Good understanding of parallel and distributed computing. Experienced with setting up evaluation methods. Cloud & Storage Systems: Experience with AWS …
high-quality data flows through the pipeline. Collaborate with research to define data quality benchmarks. Optimize end-to-end performance across distributed data processing frameworks (e.g., Apache Spark, Ray, Airflow). Work with infrastructure teams to scale pipelines across thousands of GPUs. Work directly with leadership on the data team's roadmaps. Manage the team of data engineers.
Ability to convert customer requirements or business challenges into well-defined machine learning solutions. We use many technologies day to day, such as various AWS services, GCP, Kubernetes, Ray Serve, Kubeflow, and Retool; any experience in these areas would be a bonus. Sprout.ai Values. Hungry for Growth - Unleash your inner Sprout: Sprouts embrace growth, forget comfort zones, and help …
London, England, United Kingdom Hybrid / WFH Options
Sprout.ai
Ability to convert customer requirements or business challenges into well-defined machine learning solutions. We use many technologies day to day, such as various AWS services, GCP, Kubernetes, Ray Serve, Kubeflow, and Retool; any experience in these areas would be a bonus. Sprout.ai Values. Hungry for Growth - Unleash your inner Sprout: Sprouts embrace growth, forget comfort zones, and help …
London, England, United Kingdom Hybrid / WFH Options
Autodesk
bash terminals. Knowledge of cloud architectures and networking. Experience with documenting code, architectures, and experiments. Relational databases (e.g., MySQL, PostgreSQL) and NoSQL databases (e.g., MongoDB, Cassandra). Frameworks such as Ray Data, Metaflow, Hadoop, Spark, or Hive. Preferred Qualifications: Experience with computational geometry such as mesh or boundary representation data processing. Experience with CAD model search and retrieval in PLM systems … and networking. Experience with cloud services & architectures (AWS, Azure, etc.). Documenting code, architectures, and experiments. Relational databases (e.g., MySQL, PostgreSQL) and NoSQL databases (e.g., MongoDB, Cassandra). Frameworks such as Ray Data, Metaflow, Hadoop, Spark, or Hive. Vector data stores. Preferred Qualifications: Experience with computational geometry such as mesh or boundary representation data processing. Experience with CAD model search and retrieval …
London, England, United Kingdom Hybrid / WFH Options
Autodesk
including 2D and 3D geometry. You have experience with: Documenting code, architectures, and experiments. Relational databases (e.g., MySQL, PostgreSQL) and NoSQL databases (e.g., MongoDB, Cassandra). Frameworks such as Ray Data, Metaflow, Hadoop, Spark, or Hive. Proficiency with Linux systems and bash terminals. Knowledge of cloud architectures and networking. Additional Qualifications: Experience with computational geometry such as mesh or boundary … experience with: Cloud services & architectures (AWS, Azure, etc.). Documenting code, architectures, and experiments. Relational databases (e.g., MySQL, PostgreSQL) and NoSQL databases (e.g., MongoDB, Cassandra). Frameworks such as Ray Data, Metaflow, Hadoop, Spark, or Hive. Vector data stores. Proficiency with Linux systems and bash terminals. Knowledge of cloud architectures and networking. Additional Qualifications: Experience with computational geometry such as …
from papers and blogs, conduct experiments, and propose system improvements. Excellent communication skills in English, both written and verbal. Preferred Skills: Experience working with distributed HPC clusters (Ray, Kubernetes, Docker). Experience fine-tuning LLMs/embedding models for downstream tasks. Experience building and evaluating RAG systems. Selected Technical Skills: Artificial Intelligence, Python. English mandatory. Additional Information: What do …
experiments. Experience with ML model monitoring systems. Experience with ML training and data pipelines and working with distributed systems. Proficiency with modern deep learning libraries and frameworks (PyTorch, Lightning, Ray). Preferred Qualifications: Experience owning a product from development through monitoring and incident response. Knowledge of the design, manufacturing, AEC, or media & entertainment industries. Experience with Autodesk or similar products (CAD …
City of London, London, United Kingdom Hybrid / WFH Options
Stanford Black Limited
fit, share your CV! Role: Architect and optimise large-scale, compute-intensive workloads spanning significant numbers of nodes and concurrent tasks. Design, build, and manage systems with tools like Ray and YellowDog. Optimise application performance on distributed platforms. Provide architectural guidance on distributed computing design and development. Drive efficiency and scalability across the platform, with a focus on ML pipeline … Job/resource scheduling experience (e.g., YellowDog). Cloud platform proficiency (any provider). Experience with large-scale systems (1k+ nodes, 10k+ tasks). Experience monitoring/troubleshooting a distributed environment. Advanced Ray experience for ML pipelines, tuning, and distributed execution. Python and Conda proficiency. Docker + Kubernetes experience. Knowledge of networking (TCP/IP, UDP/IP, LAN/WAN). Identity and access …
features that deliver AI capabilities to some of the biggest names in the insurance industry. We are developing a modern real-time ML platform using technologies like Python, PyTorch, Ray, k8s (Helm + Flux), Terraform, Postgres, and Flink on AWS. We are big fans of Infrastructure-as-Code and enjoy Agile practices. As a team, we're driven by …
London, England, United Kingdom Hybrid / WFH Options
Spotify
ICML, ICLR, NeurIPS, AAAI, WWW, KDD, or related. A problem-solver with experience in Python, R, or similar languages. Experience with tools like CausalML, EconML, TensorFlow, PyTorch, Scikit-learn, Ray, etc., is a strong plus. You have hands-on experience sourcing, cleaning, manipulating, analysing, visualising, and modelling real data. Experience with SQL is a plus. You …
London, England, United Kingdom Hybrid / WFH Options
Artefact
explore techniques like time-series forecasting, clustering, or Bayesian inference. Orchestration and Parallelisation: Manage workflows with tools like Metaflow, MLflow, Airflow, or DVC; utilise parallelisation frameworks like PySpark or Ray for efficient model processing. Exposure to cloud platforms (AWS, Azure, GCP). Why you should join us: Artefact is revolutionizing marketing; join us to build the future of marketing. Progress: every …
London, England, United Kingdom Hybrid / WFH Options
Waymo
training, deploying, and optimizing large-scale machine learning systems from data to model. Solid experience in the development and optimization of machine learning infrastructure tools like DeepSpeed, PyTorch, TensorFlow, Ray, or similar frameworks. Expertise in distributed training techniques, including gradient sharding and optimization strategies for scaling large models across ML accelerator profiling tools to uncover performance bottlenecks. Familiarity with custom …
Bristol, Gloucestershire, United Kingdom Hybrid / WFH Options
Avature
a closely related field. Demonstrated experience in benchmarking foundation models for real-world applications. Experience in machine learning and developing AI models in frameworks such as PyTorch, TensorFlow, FSDP, Ray, and so forth. Expertise in one or more AI areas, including transfer learning, model distillation, surrogate models, and reinforcement learning. Research experience in designing and developing domain-specific and/…
Bristol, England, United Kingdom Hybrid / WFH Options
Avature
a closely related field. Demonstrated experience in benchmarking foundation models for real-world applications. Experience in machine learning and developing AI models in frameworks such as PyTorch, TensorFlow, FSDP, Ray, and so forth. Expertise in one or more AI areas, including transfer learning, model distillation, surrogate models, and reinforcement learning. Research experience in designing and developing domain-specific and/…