Contribute to hiring additional talent to our rapidly growing team The role will be exposed to a broad tech stack (e.g. ReactJS, Python, REST & GraphQL, OpenCV, PyTorch, GCP, AWS & CUDA, Kubernetes) and the cutting edge of computer vision and deep learning. Qualifications The right candidate will have a proven track record of relevant publications and previous experience managing applied More ❯
image or video captioning, speech-to-text generation. Bonus: Publications in top-tier venues demonstrating your expertise in multimodal AI research. Bonus: Experience in writing efficient GPU kernels using CUDA, optimising performance for multimodal tasks. This role is perfect for you if you: Have a deep passion for machine learning and its potential to impact various industries through multimodal More ❯
with the latest developments in model optimization, inference engines, quantization methods, and related technologies. Requirements Proven professional experience optimizing neural network inference workloads. Strong expertise with TensorRT, Triton language, CUDA programming. Experience with neural network quantization techniques. Proficiency in Python and PyTorch. Deep understanding of GPU architectures and performance optimization. Excellent problem-solving skills and ability to analyze performance More ❯
training and inference. We care about efficient large-scale training, low-latency inference in real-time systems and high-throughput inference in research. Part of this is improving straightforward CUDA, but the interesting part needs a whole-systems approach, including storage systems, networking and host- and GPU-level considerations. Zooming in, we also want to ensure our platform makes … run’s performance end to end Low-level GPU knowledge of PTX, SASS, warps, cooperative groups, Tensor Cores and the memory hierarchy Debugging and optimisation experience using tools like CUDA GDB, NSight Systems, NSight Computesight-systems and nsight-compute Library knowledge of Triton, CUTLASS, CUB, Thrust, cuDNN and cuBLAS Intuition about the latency and throughput characteristics of CUDAMore ❯