London, England, United Kingdom Hybrid / WFH Options
PhysicsX
scaling and optimising ML models, training and serving foundation models at scale (federated learning a bonus); distributed computing frameworks (e.g., Spark, Dask) and high-performance computing frameworks (MPI, OpenMP, CUDA, Triton); cloud computing (on hyper-scaler platforms, e.g., AWS, Azure, GCP); building machine learning models and pipelines in Python, using common libraries and frameworks (e.g., NumPy, SciPy, Pandas, PyTorch More ❯
London, England, United Kingdom Hybrid / WFH Options
NVIDIA
/Digital Twins. Proficiency in deploying AI models and optimizing inference using TensorRT, ONNX Runtime, Triton, or TensorRT-LLM is a plus. Proven experience implementing and optimizing workloads with CUDA and Nsight Tools. Experience with high performance networking technologies, e.g. DPDK, DOCA, RDMA, RoCEv2 is a plus. Published record of thought leadership in a technical area or industry segment. More ❯
Jira, Git, Jenkins, Java, bash, batch files, TestRail. • 2D and 3D Geometrical modelling experience; Geometrical APIs or toolkits including CGAL. • Multithreading and parallel programming experience; OpenMP; GPU programming using CUDA or OpenCL. • Scripting of mathematical or geological problems; Excel, MATLAB, Python. Knowledge of any/several of the following will be ideal: • Seismic processing and attribute analysis. • Modelling of More ❯
at a leading technology company. Strong expertise in algorithms, data structures, multivariate calculus, and linear algebra. Proficient in Python, TensorFlow, PyTorch, or similar languages and frameworks, with experience writing CUDA kernels and profiling GPU code a plus. Excellent communication skills, with the ability to work effectively in cross-functional teams and present complex ideas to both technical and non More ❯
London, England, United Kingdom Hybrid / WFH Options
PhysicsX Ltd
scaling and optimising ML models, training and serving foundation models at scale (federated learning a bonus); distributed computing frameworks (e.g., Spark, Dask) and high-performance computing frameworks (MPI, OpenMP, CUDA, Triton); cloud computing (on hyper-scaler platforms, e.g., AWS, Azure, GCP); building machine learning models and pipelines in Python, using common libraries and frameworks (e.g., NumPy, SciPy, Pandas, PyTorch More ❯
integration with traditional infrastructure is critical. Ways To Stand Out From The Crowd Knowledge in InfiniBand and Artificial Intelligence infrastructure. Hands-on experience with NVIDIA systems/SDKs (e.g., CUDA), NVIDIA Networking technologies (e.g., DPU, RoCE, InfiniBand), ARM CPU solutions, coupled with proficiency in C/C++ programming, parallel programming, and GPU development. Knowledge of DevOps/MLOps technologies More ❯
London, England, United Kingdom Hybrid / WFH Options
InstaDeep
learning models across diverse hardware platforms (GPU/TPU) and optimising system performance under heavy load. Low-Level Optimisation: Write efficient Python, C/C++, XLA, Pallas, Triton, or CUDA code to achieve performance breakthroughs. ML Systems Design: Architect robust distributed systems for training, deployment, and monitoring, ensuring computational efficiency and scalability. Data Pipeline Automation: Develop automated pipelines for …/or PyTorch) Passion for profiling, identifying bottlenecks, and delivering efficient solutions. Fundamentals of modern Deep Learning Desired Skills Track record of successfully scaling ML models. Experience writing custom CUDA kernels or XLA operations. Understanding of GPU/TPU architectures and their implications for efficient ML systems. Representative projects Profile algorithm, identifying opportunities for custom XLA/CUDA More ❯
approach to AI implementation. Effective communication and collaboration skills in cross-functional teams. Preferred Skills High-Performance Computing (HPC) and AI workloads for large-scale enterprise solutions. NVIDIA CUDA, cuDNN, TensorRT experience for deep learning acceleration. Big Data platforms (Hadoop, Spark) for AI-driven analytics in professional services. Please share CV at payal.c@hcltech.com More ❯
London, England, United Kingdom Hybrid / WFH Options
Canonical
internationally twice a year for company events up to two weeks long Nice-to-have skills Experience with LXC/LXD Experience with AI/ML and/or CUDA/OpenVINO Knowledge of system and language package manager internals What we offer colleagues We consider geographical location, experience, and performance in shaping compensation worldwide. We revisit compensation annually More ❯
e.g., Unreal Engine, Unity, custom 3D engines). Proven track record of publications at top-tier conferences (e.g., NeurIPS, CVPR, ICML, ICLR, SIGGRAPH, ECCV). Experience with GPU programming (CUDA) and model optimization for real-time inference (e.g., quantization, pruning, ONNX, TensorRT, custom CUDA kernels). Background in scalable algorithm design for real-time or interactive applications. Experience More ❯
of the mathematical foundations of deep learning, including multivariate calculus, linear algebra, and optimization techniques. Proficient in Python and deep learning frameworks such as TensorFlow and PyTorch. Experience with CUDA kernels and GPU profiling is a plus. Excellent communication skills, with the ability to present complex technical ideas to both technical and non-technical audiences. Knowledge of quantitative finance More ❯
OpenTelemetry). A proactive ownership mindset and the ability to navigate ambiguity. Excellent collaboration and communication skills for working effectively with teams and stakeholders. Ideally: professional experience with GPGPU programming (e.g., CUDA, Triton) for performance optimization. Experience building and maintaining widely-used internal or open-source libraries. Familiarity with the machine learning development lifecycle and core concepts (e.g., bias-variance tradeoff More ❯
Experience: Proficiency in C++ with a strong focus on memory management, multi-threading, and low-level performance optimizations. Experience with GPU architectures (e.g., NVIDIA, AMD) and programming frameworks like CUDA, OpenCL, and TensorFlow. Understanding of machine learning algorithms, including model training and inference, and how to optimize these for GPU-based computation. Strong knowledge of parallel computing, vectorization, and More ❯
development Experience with software development processes and tools such as Git source code control, profiler, and debugger Effective communication and problem-solving skills Experience with compute languages like HIP, CUDA, OpenCL is a plus ACADEMIC CREDENTIALS: Bachelor’s or Master’s degree in Computer Science, Computer Engineering, Electrical Engineering, or equivalent #LI-RA1 #LI-Remote Benefits offered are described More ❯
. A proactive ownership mindset and the ability to navigate ambiguity. Excellent collaboration and communication skills for working effectively with teams and stakeholders. Ideally: professional experience with GPGPU programming (e.g., CUDA, Triton) for performance optimization. Experience building and maintaining widely-used internal or open-source libraries. Familiarity with the machine learning development lifecycle and core concepts (e.g., bias-variance tradeoff More ❯
London, England, United Kingdom Hybrid / WFH Options
InstaDeep Ltd
optimise state-of-the-art algorithms and architectures, ensuring compute efficiency and performance. Low-Level Mastery: Write high-quality Python, C/C++, XLA, Pallas, Triton, and/or CUDA code to achieve performance breakthroughs. Required Skills Understanding of Linux systems, performance analysis tools, and hardware optimisation techniques. Experience with distributed training frameworks (Ray, Dask, PyTorch Lightning, etc.) Expertise … with machine learning frameworks (JAX, TensorFlow, PyTorch, etc.) Passion for profiling, identifying bottlenecks, and delivering efficient solutions. Highly Desirable Track record of successfully scaling ML models. Experience writing custom CUDA kernels or XLA operations. Understanding of GPU/TPU architectures and their implications for efficient ML systems. Fundamentals of modern Deep Learning. Actively following ML trends and a desire … to push boundaries. Example Projects: Profile algorithm traces, identifying opportunities for custom XLA operations and CUDA kernel development. Implement and apply SOTA architectures (Mamba, Griffin, Hyena) to research and applied projects. Adapt algorithms for large-scale distributed architectures across HPC clusters. Employ memory-efficient techniques within models for increased parameter counts and longer context lengths. What We Offer: Real More ❯
Services Experience with Computer Vision: Kernel, Hardware Accelerator, TVM, or Code-gen Experience with Deep Learning: C++ or Python, and AI, Neural Network, TensorFlow, PyTorch, MXNet, LLVM, Compiler, CPU, CUDA, NVIDIA, TensorRT, TPU, Cluster Management, High Performance Computing, or Optimization Amazon is an equal opportunities employer. We believe passionately that employing a diverse workforce is central to our success. More ❯
environment Strong problem-solving and debugging skills Understanding of software security principles and standard processes Experience with natural language processing (NLP) techniques Familiarity with video processing Experience with NVIDIA CUDA, WaveGlow, ROCm Experience with GitHub/GitLab, CDK, Helm Charts, ArgoCD Experience with Docker Knowledge of Linux Nice to have: Experience with AWS and common services Knowledge of game development More ❯
code on Linux or embedded platforms. Demonstrated ability to deliver production-quality, well-tested code in collaborative, fast-moving environments. Preferred Qualifications Familiarity with GPU or edge AI acceleration (CUDA, TensorRT, Vulkan, or similar). Experience deploying perception pipelines on resource-constrained hardware. Publications in multimodal sensing/neural representations/SLAM for robotics or autonomous navigation in journals More ❯