Experience: Proficiency in C++ with a strong focus on memory management, multi-threading, and low-level performance optimizations. Experience with GPU architectures (e.g., NVIDIA, AMD) and programming frameworks like CUDA, OpenCL, and TensorFlow. Understanding of machine learning algorithms, including model training and inference, and how to optimize these for GPU-based computation. Strong knowledge of parallel computing, vectorization, and More ❯
approach to AI implementation. Effective communication and collaboration skills in cross-functional teams. Preferred Skills: High-Performance Computing (HPC) and AI workloads for large-scale enterprise solutions. NVIDIA CUDA, cuDNN, TensorRT experience for deep learning acceleration. Big Data platforms (Hadoop, Spark) for AI-driven analytics in professional services. More ❯
software on GitHub, PyPI, Anaconda Cloud, and Docker Hub, as well as use of PyTorch Lightning, Git, test-driven design. Knowledge of parallel computing technologies, such as NVIDIA's CUDA platform, OpenCL, and OpenMPI. The salary range for Cambridge, UK: Senior Scientist I, Computational Biology: £75,000 - £117,500; Senior Scientist II, Computational Biology: £94,000 - £152,500. Exact More ❯
Skills: Proficiency in C/C++ and Python Technical Expertise: Experience with multi-tasking systems (real-time preferable) and familiarity with signal processing or AI/ML applications using CUDA on GPUs (preferred), medical device communications protocols (HL7, FHIR) Development Approach: Knowledge of agile methodologies and best practices in software development Tools & Practices: Proficiency with version control systems (e.g. More ❯
at a leading technology company. Strong expertise in algorithms, data structures, multivariate calculus, and linear algebra. Proficient in Python, TensorFlow, PyTorch, or similar languages and frameworks, with experience writing CUDA kernels and profiling GPU code a plus. Excellent communication skills, with the ability to work effectively in cross-functional teams and present complex ideas to both technical and non More ❯
development Experience with software development processes and tools such as Git source code control, profiler, and debugger Effective communication and problem-solving skills Experience with compute languages like HIP, CUDA, OpenCL is a plus ACADEMIC CREDENTIALS: Bachelor’s or Master’s degree in Computer Science, Computer Engineering, Electrical Engineering, or equivalent #LI-RA1 #LI-Remote Benefits offered are described More ❯
open-source libraries); Linux kernel drivers; Software Defined Radio (SDR); API gateways; Machine Learning/Artificial Intelligence (ML/AI) training pipelines; C, C++, Rust; Assembler (ARM, x86); Java; CUDA; developing compilers; OAuth/or SAML; TLS/SSL. Marathon TS is committed to the development of a creative, diverse and inclusive work environment. In order to provide equal More ❯
of the mathematical foundations of deep learning, including multivariate calculus, linear algebra, and optimization techniques. Proficient in Python and deep learning frameworks such as TensorFlow and PyTorch. Experience with CUDA kernels and GPU profiling is a plus. Excellent communication skills, with the ability to present complex technical ideas to both technical and non-technical audiences. Knowledge of quantitative finance More ❯
A proactive ownership mindset and the ability to navigate ambiguity. Excellent collaboration and communication skills for working effectively with teams and stakeholders. Ideally: Professional experience with GPGPU programming (e.g., CUDA, Triton) for performance optimization. Experience building and maintaining widely-used internal or open-source libraries. Familiarity with the machine learning development lifecycle and core concepts (e.g., bias-variance tradeoff More ❯
C++ code libraries (Linux, Windows, Android) using CMake. Build and integrate robotics applications using ROS, OpenCV, Boost, and Jsoncpp. Implement and optimise object detection models (e.g. YOLOv5) with Nvidia CUDA acceleration. Develop and deploy cloud-based applications using Azure DevOps, Docker, and CI/CD pipelines. Write unit tests with Google Test Framework and manage automated testing in Azure … cameras, 3D point cloud data, and sensor fusion techniques (e.g. Kalman filters). Experience with Nvidia Jetson, Raspberry Pi, and embedded systems. Machine learning for object detection (YOLOv5) and CUDA optimisation. Strong communication and stakeholder engagement skills. Azure DevOps and YAML pipeline scripting. More ❯
including advanced OpenCV, Boost, Standard library, and Jsoncpp for efficient data processing and manipulation. Implement machine learning models for object detection, particularly using YOLOv5, and optimize performance using Nvidia CUDA hardware acceleration. Develop, test, and deploy cloud-based applications and simulations, using Azure DevOps, Docker, and cloud-based agents for continuous integration and deployment. Write and maintain unit tests … and customers. Knowledge in Azure DevOps, including setting up and managing CI/CD pipelines with YAML scripting is desirable. Knowledge of OpenCV, Boost, Standard library, Jsoncpp, and Nvidia CUDA hardware acceleration. Knowledge in machine learning, specifically in object detection models like YOLOv5. Experience in writing unit tests using Google Test Framework. More ❯
London, England, United Kingdom Hybrid / WFH Options
InstaDeep Ltd
optimise state-of-the-art algorithms and architectures, ensuring compute efficiency and performance. Low-Level Mastery: Write high-quality Python, C/C++, XLA, Pallas, Triton, and/or CUDA code to achieve performance breakthroughs. Required Skills: Understanding of Linux systems, performance analysis tools, and hardware optimisation techniques. Experience with distributed training frameworks (Ray, Dask, PyTorch Lightning, etc.). Expertise … with machine learning frameworks (JAX, TensorFlow, PyTorch, etc.). Passion for profiling, identifying bottlenecks, and delivering efficient solutions. Highly Desirable: Track record of successfully scaling ML models. Experience writing custom CUDA kernels or XLA operations. Understanding of GPU/TPU architectures and their implications for efficient ML systems. Fundamentals of modern Deep Learning. Actively following ML trends and a desire … to push boundaries. Example Projects: Profile algorithm traces, identifying opportunities for custom XLA operations and CUDA kernel development. Implement and apply SOTA architectures (MAMBA, Griffin, Hyena) to research and applied projects. Adapt algorithms for large-scale distributed architectures across HPC clusters. Employ memory-efficient techniques within models for increased parameter counts and longer context lengths. What We Offer: Real More ❯
Services • Experience with Computer Vision: Kernel, Hardware Accelerator, TVM, or Code-gen • Experience with Deep Learning: C++ or Python, and AI, Neural Network, TensorFlow, PyTorch, MXNet, LLVM, Compiler, CPU, CUDA, Nvidia, TensorRT, TPU, Cluster Management, High Performance Computing, or Optimization. Amazon is an equal opportunities employer. We believe passionately that employing a diverse workforce is central to our success. More ❯
URDF, CAD formats. Strong Programming Skills: Experience with the full software development lifecycle. Proficiency in Python and C++ for algorithm development and system integration. Experience with GPU programming (CUDA C++, PyTorch) a plus. Understanding of System Integration and Engineering for Autonomous Systems: Solid cross-domain knowledge across a wide range of More ❯
environment. Strong problem-solving and debugging skills. Understanding of software security principles and standard processes. Experience with natural language processing (NLP) techniques. Familiarity with video processing. Experience with NVIDIA CUDA, WaveGlow, ROCm. Experience with GitHub/GitLab, CDK, Helm charts, ArgoCD. Experience with Docker. Knowledge of Linux. Nice to have: Experience with AWS and common services. Knowledge of game development More ❯
e.g., Unreal Engine, Unity, custom 3D engines). Proven track record of publications at top-tier conferences (e.g., NeurIPS, CVPR, ICML, ICLR, SIGGRAPH, ECCV). Experience with GPU programming (CUDA) and model optimization for real-time inference (e.g., quantization, pruning, ONNX, TensorRT, custom CUDA kernels). Background in scalable algorithm design for real-time or interactive applications. Experience More ❯
code on Linux or embedded platforms. Demonstrated ability to deliver production-quality, well-tested code in collaborative, fast-moving environments. Preferred Qualifications: Familiarity with GPU or edge AI acceleration (CUDA, TensorRT, Vulkan, or similar). Experience deploying perception pipelines on resource-constrained hardware. Publications in multimodal sensing/neural representations/SLAM for robotics or autonomous navigation in journals More ❯
numerical calculation, compilation, algorithm and chip co-design, runtime, or shared memory Strong background in software development using C/C++ and Python Skilled with GPU compute APIs (e.g., CUDA, OpenCL), deep learning frameworks, and compilers Familiarity with AI models, algorithm trends, and translating application requirements into chip-level solutions Experience with GPU acceleration, inference backends, and frameworks such More ❯