or create insights, that's a plus. Deeper systems knowledge. Extra experience with any of the following would be an asset: developing GPU kernels and/or ML compilers (e.g. CUDA, OpenCL, TensorRT plugins, MLIR, TVM, etc.); optimizing systems to meet strict utilization and latency requirements with tools such as NVIDIA Nsight; and/or you've worked with embedded More ❯
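Meeting strict latency requirements like those above usually means measuring tail latency (p95/p99), not averages, before and after each optimization. A minimal sketch in Python; the workload lambda is a toy stand-in for a kernel launch or inference call, not from any specific system:

```python
import time

def measure_latency_percentiles(fn, iters=1000, warmup=50):
    """Time repeated calls to fn and report p50/p95/p99 latency in ms."""
    for _ in range(warmup):  # warm caches before measuring
        fn()
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    pct = lambda p: samples[min(len(samples) - 1, int(p / 100.0 * len(samples)))]
    return {"p50": pct(50), "p95": pct(95), "p99": pct(99)}

# Toy workload standing in for the real hot path
stats = measure_latency_percentiles(lambda: sum(range(1000)), iters=200)
```

Tools like Nsight Systems give the same picture at kernel granularity; this host-side harness is just the first sanity check.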
scientists, and how to optimize the experience. Core technical skills: System administration experience with OSes such as RHEL/CentOS and Ubuntu, including Linux kernel tuning. Proficiency with Ansible, NVIDIA and CUDA toolkits, Kubernetes, and container orchestration. Understanding of automation, monitoring, and security for GPU-as-a-service. Preferred experience: Experience supporting HPE PCAI or other AI/HPC infrastructure and More ❯
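Monitoring for GPU-as-a-service typically starts with scraping `nvidia-smi`. A minimal parsing sketch; the sample string below is illustrative rather than captured from a real node, where you would obtain it with something like `subprocess.run(["nvidia-smi", "--query-gpu=index,utilization.gpu,memory.used", "--format=csv,noheader,nounits"], capture_output=True, text=True)`:

```python
def parse_gpu_stats(csv_text):
    """Parse 'index, utilization.gpu, memory.used' rows from
    nvidia-smi --format=csv,noheader,nounits output."""
    gpus = []
    for line in csv_text.strip().splitlines():
        idx, util, mem = (field.strip() for field in line.split(","))
        gpus.append({"index": int(idx),
                     "util_pct": int(util),
                     "mem_used_mib": int(mem)})
    return gpus

# Illustrative sample output, not from a live system
sample = "0, 87, 14336\n1, 12, 2048"
stats = parse_gpu_stats(sample)
# Flag underutilized GPUs as candidates for rescheduling
idle = [g["index"] for g in stats if g["util_pct"] < 20]
```

In production this feeds an exporter (e.g. DCGM/Prometheus) rather than ad-hoc parsing, but the query fields are the same.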
modular code delivery in Docker-based environments. Desirable Experience: Experience with PyTorch for AI-based perception/control. Familiarity with MoveIt for motion planning in ROS 2. Knowledge of CUDA for real-time C++ optimisation. To Apply: Please email your CV More ❯
Python, with the ability to implement and customize code from research repositories. Solid understanding of machine learning concepts (formal ML background preferred but not required). Experience with Linux, CUDA, and PyTorch in a research or production setting. Demonstrated ability to create customer-specific generative solutions and workflows. Experience managing fine-tuning processes to improve model quality and diversity. More ❯
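A common fine-tuning pattern behind requirements like these is freezing a pretrained backbone and training only a small task head, then tracking the loss curve to manage the run. A minimal PyTorch sketch; the backbone, head, and data here are toy stand-ins, not a real pretrained model:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical setup: a "pretrained" backbone we freeze, plus a task head we train
backbone = nn.Sequential(nn.Linear(16, 32), nn.ReLU())
head = nn.Linear(32, 2)

for p in backbone.parameters():  # freeze pretrained weights
    p.requires_grad = False

opt = torch.optim.AdamW(head.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(64, 16)               # toy stand-in for task data
y = torch.randint(0, 2, (64,))

losses = []
for _ in range(50):
    opt.zero_grad()
    loss = loss_fn(head(backbone(x)), y)
    loss.backward()
    opt.step()
    losses.append(loss.item())        # monitor to catch divergence early
```

The same structure scales to LoRA-style adapters or partial unfreezing; what changes is which parameter groups the optimizer sees.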
optimise state-of-the-art algorithms and architectures, ensuring compute efficiency and performance. Low-Level Mastery: Write high-quality Python, C/C++, XLA, Pallas, Triton, and/or CUDA code to achieve performance breakthroughs. Required Skills: Understanding of Linux systems, performance analysis tools, and hardware optimisation techniques. Experience with distributed training frameworks (Ray, Dask, PyTorch Lightning, etc.). Expertise … with machine learning frameworks (JAX, TensorFlow, PyTorch, etc.). Passion for profiling, identifying bottlenecks, and delivering efficient solutions. Highly Desirable: Track record of successfully scaling ML models. Experience writing custom CUDA kernels or XLA operations. Understanding of GPU/TPU architectures and their implications for efficient ML systems. Fundamentals of modern deep learning. Actively following ML trends and a desire … to push boundaries. Example Projects: Profile algorithm traces, identifying opportunities for custom XLA operations and CUDA kernel development. Implement and apply SOTA architectures (Mamba, Griffin, Hyena) to research and applied projects. Adapt algorithms for large-scale distributed architectures across HPC clusters. Employ memory-efficient techniques within models for increased parameter counts and longer context lengths. What We Offer: Real More ❯
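The profile-then-optimise loop described above follows the same pattern at any scale: capture a trace, rank functions by cumulative time, and attack the top entry. A minimal CPU-side sketch with Python's built-in profiler; the two stage functions are toy stand-ins for kernels or data-loading steps:

```python
import cProfile
import io
import pstats

def slow_stage():
    # deliberately heavy stand-in for a hot path worth a custom kernel
    return sum(i * i for i in range(200_000))

def fast_stage():
    return sum(range(1_000))

def pipeline():
    fast_stage()
    slow_stage()

profiler = cProfile.Profile()
profiler.enable()
pipeline()
profiler.disable()

buf = io.StringIO()
pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(5)
report = buf.getvalue()  # slow_stage should dominate cumulative time
```

On accelerators the capture tool changes (Nsight Systems for CUDA, the XLA/JAX profiler for TPU traces), but the triage workflow is the same.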