disciplines may be beneficial: GUI development using .NET technologies (C#, WinForms or WPF), the Qt/QML framework, or HTML5; GPU development for the solution of algorithmic problems (OpenCL, CUDA); machine learning and AI; 2D/3D graphics development (OpenGL, OpenGL shaders, VTK, OSG, Vulkan)
language, vision and other modalities, machine learning for molecules and proteins (ideally with some background in chemistry and biological sciences). Lower-level programming for hardware efficiency, e.g. C CUDA/Triton. Practical familiarity with hardware capabilities for deep learning: threads, caches, vector & matrix engines, data dependencies, bus widths and throttling. Practical familiarity with software stacks for deep learning
large language models, efficient computing based on low-precision arithmetic, deep learning models including large generative models for language, vision and other modalities. Experience writing C Triton/CUDA kernels for performance optimisation of ML models. Have contributed to open-source projects or published research papers in relevant fields. Knowledge of cloud computing platforms. Keen to present, publish
or create insights, that's a plus. Deeper systems knowledge. Extra experience with any of the following would be an asset: developing GPU kernels and/or ML compilers (e.g. CUDA, OpenCL, TensorRT plugins, MLIR, TVM, etc.); optimizing systems to meet strict utilization and latency requirements with tools such as NVIDIA Nsight; and/or you've worked with embedded
London, South East, England, United Kingdom Hybrid / WFH Options
Octad Recruitment Ltd
Desirable skills: experience in solving non-linear least-squares problems; experience with computer vision; experience in UI development, e.g. ImGui; understanding of multithreading techniques; experience with GPU programming, e.g. CUDA; experience with a messaging framework, e.g. NATS, RabbitMQ; experience working in and configuring cloud environments (e.g. AWS, Azure, GCP); experience working with software containers (Docker, Podman) and container orchestration
as PyTorch, TensorFlow, ONNX Knowledge of LLM architectures and inference optimization techniques (e.g., batching, quantization) Experience deploying scalable, reliable, real-time model serving systems (Optional) GPU architecture understanding or CUDA programming experience The compensation range for this role is $190,000 - $240,000. At Perplexity, we have experienced significant growth since launching the world's first conversational answer engine
Inference experience for high-throughput model serving - Proven ability to work on air-gapped systems with no external package repositories - Experience with GPU orchestration (NVIDIA A100/H100) and CUDA optimisation - Python expertise with offline dependency management and local package mirrors Technical Stack (All On-Premises) Models: Llama 3, Mistral, Qwen (locally hosted) Vector Stores: Chroma, FAISS, Milvus Orchestration
novel solutions. About you: C++ is your strongest language; ideally, experience in video or audio processing; experience writing performance-critical software; exposure to GPU technology (Vulkan API, OpenGL, OpenCL, CUDA, etc.); relevant degree. Full details are available. Please don't hesitate to get in touch (email address removed) to learn more.
Research Computing & AI. The order of skillset/desirability for a candidate for this role is as follows: Linux system administration (any flavour); cluster computing/Slurm; GPU/CUDA; cloud computing. A rare opportunity has emerged to become a founding member of a newly established AI and high-performance computing (HPC) division at one of the world’s
proficiency in NVIDIA GPU performance optimization techniques, including memory management, kernel fusion, and quantization strategies for large-scale deep learning workloads; - Strong foundation in parallel computing principles with practical CUDA programming experience, emphasizing efficient resource utilization and throughput maximization; - Demonstrated success implementing and tuning distributed AI systems leveraging modern frameworks like Megatron-LM and Ray, with particular focus on … Learning, Operations Research, Statistics, Mathematics, etc.); - Proficiency in performance optimization on Amazon Trainiums; - Proficiency in kernel programming for accelerated hardware using programming models such as (but not limited to) CUDA; - Solid end-to-end hands-on development experience of deep learning algorithms related to Transformers; - Experience with patents or publications at top-tier peer-reviewed conferences or journals. - Past
you will: Design and write high-performance, scalable software for training. Understand architectural modifications and design choices and their effects on training throughput and quality. Write low-level CUDA and Triton kernels to squeeze every last bit of performance from our accelerators. Research, implement, and experiment with ideas on our supercompute and data infrastructure. Learn from and work with … if you have: Extremely strong software engineering skills. Proficiency in Python and related ML frameworks such as JAX, PyTorch and XLA/MLIR. Experience writing kernels for GPUs using CUDA, Triton, etc. Experience using large-scale distributed training strategies. Familiarity with autoregressive sequence models, such as Transformers. Bonus: papers at top-tier venues (such as NeurIPS, ICML, ICLR, AISTATS