Nice to haves: Prior experience with PCB design, EDA tools, or related optimization problems. Hands-on experience in high-performance computing environments (e.g., Kubernetes, Ray, Dask). Contributions to open-source projects, publications, or top placements in ML competitions (e.g., Kaggle). Expertise in related fields such as Computer Vision …
City of London, England, United Kingdom Hybrid / WFH Options
uk.tiptopjob.com - Jobboard
Programming Skills: Proficiency in Python, data analytics, deep learning (Scikit-learn, Pandas, PyTorch, Jupyter, pipelines), and practical knowledge of data tools like Databricks, Ray, Vector Databases, Kubernetes, and workflow scheduling tools such as Apache Airflow, Dagster, and Astronomer. GPU Computing: Familiarity with GPU computing, both on-premises and on …
London, England, United Kingdom Hybrid / WFH Options
Merantix
Docker and Kubernetes. Documenting code, architectures, and experiments. Linux systems and bash terminals. Preferred Qualifications: Hands-on experience with distributed computing frameworks, such as Ray Data and Spark; databases and/or data warehousing technologies, such as Apache Hive; data transformation via SQL and dbt; orchestration platforms, such as Apache …
are: Programming Skills: Proficiency in Python, data analytics, deep learning (Scikit-learn, Pandas, PyTorch, Jupyter, pipelines), and practical knowledge of data tools like Databricks, Ray, Vector Databases, Kubernetes, and workflow scheduling tools such as Apache Airflow, Dagster, and Astronomer. GPU Computing: Familiarity with GPU computing, both on-premises and on …
Mountain View, California, United States Hybrid / WFH Options
LinkedIn
with thousands of the latest GPU cards. The team also works closely with the open-source community and includes many open-source committers (TensorFlow, Horovod, Ray, vLLM, Hugging Face, DeepSpeed, etc.). Additionally, this team focuses on technologies like LLMs, GNNs, incremental learning, online learning, and serving performance optimizations across …
Washington, Washington DC, United States Hybrid / WFH Options
RAND Corporation
the AI/ML hardware stack (e.g., GPUs, TPUs, data center design). Familiarity with the AI/ML software stack (e.g., CUDA, PyTorch, TensorFlow, Ray). Experience working on AI research, ML model training, or model deployment. Experience with securing AI systems. Education Requirements: RAND is hiring a Research Lead at …
Washington, Washington DC, United States Hybrid / WFH Options
RAND Corporation
the AI/ML hardware stack (e.g., GPUs, TPUs, data center design). Familiarity with the AI/ML software stack (e.g., CUDA, PyTorch, TensorFlow, Ray). Experience working on AI research, ML model training, or model deployment. Experience with securing AI systems. Education Requirements: RAND is hiring multiple Visiting AI Security …
London, England, United Kingdom Hybrid / WFH Options
InstaDeep Ltd
CUDA code to achieve performance breakthroughs. Required Skills: Understanding of Linux systems, performance analysis tools, and hardware optimisation techniques. Experience with distributed training frameworks (Ray, Dask, PyTorch Lightning, etc.). Expertise with Python and/or C/C++. Development with machine learning frameworks (JAX, TensorFlow, PyTorch, etc.). Passion for profiling …
London, England, United Kingdom Hybrid / WFH Options
Sprout.ai
or business challenges into well-defined machine learning solutions. We use many technologies day to day, such as various AWS services, GCP, Kubernetes, Ray Serve, Kubeflow, and Retool. Any experience in these areas would be a bonus. Sprout.ai Values: Hungry for Growth. Unleash your inner Sprout: Sprouts embrace growth …
experiments, and propose system improvements. Excellent communication skills in English, both written and verbal. Preferred Skills: Experience working with distributed HPC clusters (Ray, Kubernetes, Docker). Experience fine-tuning LLMs/embedding models for downstream tasks. Experience building and evaluating RAG systems. Selected Technical Skills: Artificial Intelligence, Python, English …
monitoring systems. Experience with ML training and data pipelines and working with distributed systems. Proficiency with modern deep learning libraries and frameworks (PyTorch, Lightning, Ray). Preferred Qualifications: Experience owning a product from development through monitoring and incident response. Knowledge of the design, manufacturing, AEC, or media & entertainment industries. Experience with …
the pipeline. Collaborate with research to define data quality benchmarks. Optimize end-to-end performance across distributed data processing frameworks (e.g., Apache Spark, Ray, Airflow). Work with infrastructure teams to scale pipelines across thousands of GPUs. Work directly with leadership on the data team roadmaps. Manage …
to some of the biggest names in the insurance industry. We are developing a modern real-time ML platform using technologies like Python, PyTorch, Ray, k8s (Helm + Flux), Terraform, Postgres, and Flink on AWS. We are big fans of Infrastructure-as-Code and enjoy Agile practices. As a …
London, England, United Kingdom Hybrid / WFH Options
Spotify
KDD, or related. A problem-solver with experience in Python, R, or similar languages. Experience with tools like CausalML, EconML, TensorFlow, PyTorch, Scikit-learn, Ray, etc., is a strong plus. You have hands-on experience sourcing, cleaning, manipulating, analysing, visualising, and modelling real data. Experience with …
London, England, United Kingdom Hybrid / WFH Options
Artefact
forecasting, clustering, or Bayesian inference. Orchestration and Parallelisation: Manage workflows with tools like Metaflow, MLflow, Airflow, or DVC; utilise parallelisation frameworks like PySpark or Ray for efficient model processing. Exposure to cloud platforms (AWS, Azure, GCP). Why you should join us: Artefact is revolutionizing marketing. Join us to build the …
Bristol, Gloucestershire, United Kingdom Hybrid / WFH Options
Avature
experience in benchmarking foundational models for real-world applications. Experience in machine learning and developing AI models in frameworks such as PyTorch, TensorFlow, FSDP, Ray, and so forth. Expertise in one or more AI areas, including transfer learning, model distillation, surrogate models, and reinforcement learning. Research experience in designing and …
Bristol, England, United Kingdom Hybrid / WFH Options
Avature
experience in benchmarking foundational models for real-world applications. Experience in machine learning and developing AI models in frameworks such as PyTorch, TensorFlow, FSDP, Ray, and so forth. Expertise in one or more AI areas, including transfer learning, model distillation, surrogate models, and reinforcement learning. Research experience in designing and …
Palo Alto, California, United States Hybrid / WFH Options
IntelliPro
Big Data Infrastructure: 5+ years of engineering experience, including 2+ years in petabyte-scale data processing (mandatory). Cluster Orchestration: Deep knowledge of Kubernetes, SLURM, Ray, or similar cluster management systems. Plus: Experience with visual data processing or an ML engineering background …
Bristol, England, United Kingdom Hybrid / WFH Options
IBM
Mathematics, or a related field. Experience in benchmarking foundational models for real-world applications. Proficiency in machine learning frameworks such as PyTorch, TensorFlow, FSDP, Ray, etc. Expertise in AI areas like transfer learning, model distillation, surrogate models, and reinforcement learning. Research experience in designing and developing domain-specific or generalizable …
London, England, United Kingdom Hybrid / WFH Options
Waymo
scale machine learning systems from data to model. Solid experience in the development and optimization of machine learning infrastructure tools like DeepSpeed, PyTorch, TensorFlow, Ray, or similar frameworks. Expertise in distributed training techniques, including gradient sharding and optimization strategies for scaling large models across ML accelerator profiling tools to uncover …
learning research and production. High-Performance Data Pipelines: Develop and optimize distributed systems for data processing, including filtering, indexing, and retrieval, leveraging frameworks like Ray, Metaflow, Spark, or Hadoop. Synthetic Data Generation: Build and orchestrate pipelines to generate synthetic data at scale, advancing research on cost-efficient inference and training … rendering engines, and/or other software. Distributed Computing & MLOps: Demonstrated proficiency in setting up large-scale, robust data pipelines using frameworks like Spark, Ray, or Metaflow. Comfortable with model versioning and experiment tracking. Performance Optimization: Good understanding of parallel and distributed computing. Experienced with setting up evaluation methods. Cloud …
London, England, United Kingdom Hybrid / WFH Options
Autodesk
architectures and networking. Experience with documenting code, architectures, and experiments. Relational databases (e.g., MySQL, PostgreSQL) and NoSQL databases (e.g., MongoDB, Cassandra). Frameworks such as Ray Data, Metaflow, Hadoop, Spark, or Hive. Preferred Qualifications: Experience with computational geometry such as mesh or boundary representation data processing. Experience with CAD model search … services & architectures (AWS, Azure, etc.). Documenting code, architectures, and experiments. Relational databases (e.g., MySQL, PostgreSQL) and NoSQL databases (e.g., MongoDB, Cassandra). Frameworks such as Ray Data, Metaflow, Hadoop, Spark, or Hive. Vector data stores. Preferred Qualifications: Experience with computational geometry such as mesh or boundary representation data processing. Experience with …
Architect and optimise large-scale, compute-intensive workloads spanning significant numbers of nodes and concurrent tasks. Design, build, and manage systems with tools like Ray and YellowDog. Optimise application performance on distributed platforms. Provide architectural guidance on distributed computing design and development. Drive efficiency and scalability across the platform, with … i.e., YellowDog. Cloud platform proficiency (any provider). Experience with large-scale systems (1k+ nodes, 10k+ tasks). Experience monitoring/troubleshooting a distributed environment. Advanced Ray experience for ML pipelines, tuning, and distributed execution. Python and Conda proficiency. Docker + Kubernetes experience. Knowledge of networking (TCP/IP, UDP/IP, LAN …