approach to AI implementation. Effective communication and collaboration skills in cross-functional teams. Preferred Skills: High-Performance Computing (HPC) and AI workloads for large-scale enterprise solutions. NVIDIA CUDA, cuDNN, TensorRT experience for deep learning acceleration. Big Data platforms (Hadoop, Spark) for AI-driven analytics in professional services. Please share your CV at payal.c@hcltech.com More ❯
e.g., Unreal Engine, Unity, custom 3D engines). Proven track record of publications at top-tier conferences (e.g., NeurIPS, CVPR, ICML, ICLR, SIGGRAPH, ECCV). Experience with GPU programming (CUDA) and model optimization for real-time inference (e.g., quantization, pruning, ONNX, TensorRT, custom CUDA kernels). Background in scalable algorithm design for real-time or interactive applications. Experience More ❯
PyTorch internals and other major ML frameworks. Experience optimizing deep learning performance on accelerator hardware. Solid knowledge of deep learning algorithms and compute patterns. Strong programming skills in C++, CUDA, or OpenCL. Background in performance profiling and optimization. BS/MS in Computer Science, Electrical Engineering, or a related field. Interested? Send your CV to apply. More ❯
numerical calculation, compilation, algorithm and chip co-design, runtime, or shared memory. Strong background in software development using C/C++ and Python. Skilled with GPU compute APIs (e.g., CUDA, OpenCL), deep learning frameworks, and compilers. Familiarity with AI models, algorithm trends, and translating application requirements into chip-level solutions. Experience with GPU acceleration, inference backends, and frameworks such More ❯
models, building production systems with large language models, efficient computing with low-precision arithmetic, or large generative models for language, vision, and other modalities. Experience writing C++, Triton, or CUDA kernels for performance optimisation of ML models. Contributions to open-source projects or published research papers in relevant fields. Knowledge of cloud computing platforms. Keen to present, publish, and More ❯
Senior CFD Software Engineer London I'm currently supporting a Fortune 100 organisation and a global leader. In this role, you will design and enhance solver features for both CPU and GPU, as well as develop pre- and post-processing More ❯
Essential Skills: Master's or higher degree in ML/AI, Computer Science/Engineering, or related disciplines. Professional software development experience with modern C++. Experience with GPU compute in CUDA/OpenCL. Excellent communication, teamwork and a results-oriented attitude. Proficiency in problem-solving and debugging. Expertise in image-based 3D reconstruction: Photogrammetry, Neural Radiance Fields (NeRF) or Gaussian More ❯
City of London, London, United Kingdom Hybrid / WFH Options
Annapurna
across diverse vendor platforms. Working with low-level system and memory management techniques to minimize latency and improve real-time inference performance. Utilizing and implementing GPU programming APIs (e.g., CUDA, OpenCL) to ensure high efficiency and compatibility across GPUs. Profiling and debugging system performance using tools like NVIDIA Nsight, Intel VTune, and vendor-specific profilers, identifying bottlenecks and implementing … autonomous systems. Essential Requirements: 3+ years of experience in C++ programming, with a strong grasp of modern C++ standards. Proven experience in GPU programming and optimization, with proficiency in CUDA, OpenCL, or other GPU programming frameworks. Strong knowledge of parallel computing concepts, including data locality, memory access patterns, and synchronization. Proficiency with performance profiling tools and techniques for identifying More ❯
City of London, London, United Kingdom Hybrid / WFH Options
European Tech Recruit
Industrial experience in deploying SLAM solutions. Proficiency in C++. Desirable experience: PhD in computer vision or robotics. Experience with machine learning techniques for geometric & semantic estimation. GPU programming skills (CUDA, OpenCL, Vulkan, Metal). Experience with embedded software development. If this role is of any interest please apply directly on LinkedIn or send a copy of your CV to More ❯
South East London, England, United Kingdom Hybrid / WFH Options
European Tech Recruit
Industrial experience in deploying SLAM solutions. Proficiency in C++. Desirable experience: PhD in computer vision or robotics. Experience with machine learning techniques for geometric & semantic estimation. GPU programming skills (CUDA, OpenCL, Vulkan, Metal). Experience with embedded software development. If this role is of any interest please apply directly on LinkedIn or send a copy of your CV to More ❯
products with the latest machine learning advancements. Requirements include strong programming skills in Python, C, C++, experience with deployment platforms, and familiarity with NLP, computer vision, TensorFlow, PyTorch, JAX, CUDA, LLMs, and related technologies. A degree in a relevant field and a solid AI R&D track record are essential. More ❯
reason through quantitative problems and communicate effectively with trading researchers. Reliable and predictable availability. Bonus Points: Experience with HPC and distributed large model training. Experience with GPU performance optimization (CUDA or ROCm). Experience with end-to-end model development, especially in LLMs. Prior academic publications and/or contributions to open-source AI research. Strong opinions on best practices More ❯
large language models, efficient computing based on low-precision arithmetic, deep learning models including large generative models for language, vision and other modalities. Experience writing C++/Triton/CUDA kernels for performance optimisation of ML models. Have contributed to open-source projects or published research papers in relevant fields. Knowledge of cloud computing platforms. Keen to present, publish More ❯
engineering principles to ensure robust, maintainable solutions. PREFERRED EXPERIENCE: GPU Kernel Development & Optimization: Experienced in designing and optimizing GPU kernels for deep learning on AMD GPUs using HIP, CUDA, and assembly (ASM). Strong knowledge of AMD architectures (GCN, RDNA) and low-level programming to maximize performance for AI operations, leveraging tools like Composable Kernel (CK), CUTLASS, and More ❯
collaboratively, thrive in ambiguity, and take full ownership of what you build. Key technical skills: Strong back-end development experience (Python, Node.js). Working knowledge of C++ and GPU computing (CUDA, OpenCL). Proven ability to design, build, and maintain robust APIs. Proficiency with cloud platforms (e.g. AWS, GCP, or Azure), containerisation, and CI/CD pipelines. Familiarity with scalable data More ❯
City of London, London, United Kingdom Hybrid / WFH Options
Pinepeak
collaboratively, thrive in ambiguity, and take full ownership of what you build. Key technical skills: Strong back-end development experience (Python, Node.js). Working knowledge of C++ and GPU computing (CUDA, OpenCL). Proven ability to design, build, and maintain robust APIs. Proficiency with cloud platforms (e.g. AWS, GCP, or Azure), containerisation, and CI/CD pipelines. Familiarity with scalable data More ❯
on ML infrastructure - 8+ years of current programming experience building ML infrastructure using languages such as Python, C++ or Rust - Hands-on experience with parallel computing platforms such as CUDA, OpenMP, etc - Deep understanding of AI frameworks such as PyTorch, TensorFlow, and JAX, and their demands on underlying compute infrastructure, memory bandwidth, network interconnect, and storage as scale goes More ❯
analysis to prototype quickly. Desirable Experience: Experience with TensorRT, NVIDIA DeepStream, or other deployment frameworks. Background in neural network design or edge inference. Programming in C/C++ and CUDA. Real-time or embedded vision applications. Why Join AssetCool? Tackle some of the toughest challenges in robotics, vision, and infrastructure tech. Join a growing team with global ambitions and a More ❯
high-impact initiatives and push the boundaries of model performance. You'll also work on re-implementing models efficiently using PyTorch and underlying technologies like CUDA kernels and Torch compilation techniques. This would include: Evaluating and optimising compute resource usage (e.g., Hopper GPUs) for cost and time efficiency at training and inference times. Driving the adoption More ❯