paced, collaborative, and dynamic environment. Nice to haves: Prior experience with PCB design, EDA tools, or related optimization problems. Hands-on experience in high-performance computing environments (e.g., Kubernetes, Ray, Dask). Contributions to open-source projects, publications, or top placements in ML competitions (e.g., Kaggle). Expertise in related fields such as Computer Vision, Representation Learning, or Simulation Environments.
Pallas, Triton, and/or CUDA code to achieve performance breakthroughs. Required Skills: Understanding of Linux systems, performance analysis tools, and hardware optimisation techniques. Experience with distributed training frameworks (Ray, Dask, PyTorch Lightning, etc.). Expertise with Python and/or C/C++. Development with machine learning frameworks (JAX, TensorFlow, PyTorch, etc.). Passion for profiling, identifying bottlenecks, and delivering efficient
serving experience with modern inference servers and API gateways for AI applications Nice to have: Infrastructure as Code experience with Terraform, Ansible, or CloudFormation Distributed computing experience with Databricks, Ray, or Spark for large-scale AI workloads AI safety & governance experience with model evaluation, bias detection, and responsible AI practices Multi-modal AI experience with vision-language models, speech processing
teams. Experience with large-scale, distributed data processing frameworks/tools like Apache Beam, Apache Spark, and cloud platforms like GCP or AWS. Experience with technologies such as Kubernetes and Ray is a plus. Experience troubleshooting model training and deployment across hardware ecosystems that mix CPU and GPU is a plus. Experience optimizing model training and inference for different GPU types
features which deliver AI capabilities to some of the biggest names in the insurance industry. We are developing a modern real-time ML platform using technologies like Python, PyTorch, Ray, k8s (helm + flux), Terraform, Postgres and Flink on AWS. We are very big fans of Infrastructure-as-Code and enjoy Agile practices. As a team, we're driven by
labeled and unlabeled data. Qualifications: PhD in CS/CE/EE, or equivalent industry experience. Deep knowledge of PyTorch. Knowledge of model training frameworks (e.g. PyTorch Lightning, Ray, etc.). In-depth knowledge of transformer architecture and ways to accelerate the training and inference of transformer models. Experience performing large-scale distributed training of models. A track record
required. Below is a detailed breakdown of all the technologies we use. - Backend: Python - Frontend: TypeScript and React - Kubernetes for deployment - GCP for underlying infrastructure - Machine Learning: PyTorch, CUDA, Ray. We encourage people from all backgrounds, cultures and skill levels to apply. It is okay to not meet all requirements listed as we are looking for individuals who are passionate
features which deliver AI capabilities to some of the biggest names in the insurance industry. We are developing a modern real-time ML platform using technologies like FastAPI, PyTorch, Ray, k8s (helm + flux), Terraform, Postgres, Flink on AWS, React & TypeScript. We operate a fully Python stack except for frontend and infrastructure code. We are very big fans of Infrastructure
strong software engineering skills. Proficiency in Python and related ML frameworks such as JAX, PyTorch and XLA/MLIR. Experience with distributed training infrastructures (Kubernetes, Slurm) and associated frameworks (Ray). Experience using large-scale distributed training strategies. Hands-on experience training large models at scale. Hands-on experience with the post-training phase of model training, with a
Our Mission Our mission is to restore cell health and resilience through cell rejuvenation to reverse disease, injury, and the disabilities that can occur throughout life. For more information, see our website at Our Value Our Single Altos Value: Everyone
About Anyscale At Anyscale, we're on a mission to democratize distributed computing and make it accessible to software developers of all skill levels. We're commercializing Ray, a popular open-source project that's creating an ecosystem of libraries for scalable machine learning. Companies like OpenAI, Uber, Spotify, Instacart, Cruise, and many more have Ray in their tech stacks … to accelerate the progress of AI applications out into the real world. With Anyscale, we're building the best place to run Ray, so that any developer or data scientist can scale an ML application from their laptop to the cluster without needing to be a distributed systems expert. Proud to be backed by Andreessen Horowitz, NEA, and Addition with … and Ray. You'll be on point for demoing our product, scoping POVs, making users successful, and amplifying the voice of our customers. Expect to learn a ton about Ray, Anyscale, and early stage product go-to-market! You'll be fundamental in helping us disrupt what it means to build distributed applications at scale. Our product is inherently technical
QA team. Developing or refining your expertise in the domain area of the product component or the system in aggregate and at scale. Specific domains include Workload Management (Kubernetes, Ray, and so on); Cloud Development (Cloud Infrastructure Automation); Management & Observability (open source and commercial monitoring, observability and DCIM solutions). Skills and Experience. Essential: Strong relevant programming experience Python/Go … constructing, and executing responsibilities & duties above. English: C1 level. Desirable: Domain experience of the products under test: Containerisation (e.g. Docker), Virtualisation and Provisioning, Workload and job scheduling (e.g. Kubernetes, Ray) on high core-count machines and rack-scale installations, Management and Observability (e.g. Prometheus, OpenTelemetry, DataDog, Splunk, etc.). 10+ years of relevant experience related to quality assurance/testing