other high-performance media/signal-processing experience (broadcast, streaming, game engines, AR/VR). SIMD/vectorization (SSE/AVX/NEON) and/or GPU compute (CUDA, Metal, Vulkan, DirectCompute) for acceleration. Cross-platform build & packaging (CMake, cross-compilation toolchains, SDK distribution). Please get in touch to hear more about this incredible position.
and deployment pipeline to accelerate model iteration and improve performance. Qualifications: PhD in CS/CE/EE, or equivalent industry experience. Deep knowledge of PyTorch. Experience with CUDA or the Triton language for writing custom ops. Knowledge of a model training framework (e.g. PyTorch Lightning). In-depth knowledge of the transformer architecture and ways to accelerate training and inference
including differentiable systems and backpropagation techniques, beyond just neural networks. Strong mathematical background. Proficiency in programming languages and frameworks such as: PyTorch or TensorFlow, Python, C/C++ and CUDA (ideally). Fluent in English. Minimum of 2 years of AI development experience. Preferably, experience applying AI to 3D graphics. Parallaxter is part of the V-Nova Group, a London
/CD pipelines using GitHub Actions. Experience with analytics platforms like Google Analytics and business intelligence tools like Tableau or Power BI. Knowledge of GPU-accelerated computing with CUDA is highly desirable. Excellent problem-solving skills and the ability to thrive in a fast-paced, high-intensity startup environment. 🌟 Cultural Fit - Intensity Required Ultralytics is a high-performance
london (city of london), south east england, united kingdom
Ultralytics
as PyTorch, TensorFlow, ONNX. Knowledge of LLM architectures and inference optimization techniques (e.g., batching, quantization). Experience deploying scalable, reliable, real-time model serving systems. (Optional) GPU architecture understanding or CUDA programming experience. The compensation range for this role is $190,000 - $240,000. At Perplexity, we have experienced significant growth since launching the world's first conversational answer engine
of model architectures like transformers and CNNs. Hands-on experience with model optimization (e.g. quantization, pruning) and model deployment frameworks such as TensorRT, ONNX Runtime, and OpenVINO. Proficiency with CUDA programming and optimizing code for GPU acceleration. Strong background in MLOps practices, including CI/CD using GitHub Actions and containerization with Docker. Excellent problem-solving skills and the
london (city of london), south east england, united kingdom
Ultralytics
image or video captioning, speech-to-text generation. Bonus: Publications in top-tier venues demonstrating your expertise in multimodal AI research. Bonus: Experience in writing efficient GPU kernels using CUDA, optimising performance for multimodal tasks. This role is perfect for you if you: Have a deep passion for machine learning and its potential to impact various industries through multimodal
specialise in specific areas of the system based on their skills and areas of interest. The main application software is written in standard and multi-threaded C++ with some CUDA for processing. The Qt framework is used for the GUI. The software is performance critical as it is a real-time processing system that must respond to user input
Company: Qualcomm Technologies International Ltd Job Area: Engineering Group, Engineering Group > Software Engineering General Summary: As a leading technology innovator, Qualcomm pushes the boundaries of what's possible to enable next-generation gaming, XR, and AI experiences. Qualcomm
not required. Below is a detailed breakdown of all the technologies we use. - Backend: Python - Frontend: TypeScript and React - Kubernetes for deployment - GCP for underlying infrastructure - Machine Learning: PyTorch, CUDA, Ray We encourage people from all backgrounds, cultures and skill levels to apply. It is okay to not meet all requirements listed as we are looking for individuals who
level position. 5+ years' experience in software development. Good development skills in cloud visualization applications where knowledge is key. Computer Graphics, WebGL, OpenGL, HTML5, MEAN stack. Java/C++, CUDA, Augmented/Virtual Reality, Game Engines, Video streaming a plus. Please get in touch to discuss and apply for this exciting role.
deploying machine learning onto a range of hardware, from resource-constrained embedded systems through to edge computing, is desirable, as is any knowledge of GPU programming languages and frameworks (CUDA, ROCm, etc.). Your future colleagues will be similarly highly skilled, with experience across industry and the drive to innovate. You will find yourself in a low-management work
experience in Linux system installation, performance tuning, and troubleshooting. Expertise in troubleshooting distributed GPU workloads. Deep knowledge of GPU optimization and performance. Proficiency in Python scripting and automation frameworks. CUDA or C/C++ experience is a plus. Experience with NVIDIA technologies beyond CUDA, such as NCCL, GPUDirect RDMA, and NVLink. Familiarity with configuration management tools (e.g. Salt
optimise state-of-the-art algorithms and architectures, ensuring compute efficiency and performance. Low-Level Mastery: Write high-quality Python, C/C++, XLA, Pallas, Triton, and/or CUDA code to achieve performance breakthroughs. Required Skills: Understanding of Linux systems, performance analysis tools, and hardware optimisation techniques. Experience with distributed training frameworks (Ray, Dask, PyTorch Lightning, etc.). Expertise … with machine learning frameworks (JAX, TensorFlow, PyTorch, etc.). Passion for profiling, identifying bottlenecks, and delivering efficient solutions. Highly Desirable: Track record of successfully scaling ML models. Experience writing custom CUDA kernels or XLA operations. Understanding of GPU/TPU architectures and their implications for efficient ML systems. Fundamentals of modern Deep Learning. Actively following ML trends and a desire … to push boundaries. Example Projects: Profile algorithm traces, identifying opportunities for custom XLA operations and CUDA kernel development. Implement and apply SOTA architectures (Mamba, Griffin, Hyena) to research and applied projects. Adapt algorithms for large-scale distributed architectures across HPC clusters. Employ memory-efficient techniques within models for increased parameter counts and longer context lengths. What We Offer: Real
the best use of our HPC resources. Integrate ML models into production systems where latency matters. Work across a mix of programming languages: C/C++/Python/CUDA and other low-level GPU languages. Build large-scale ML systems that are observable, performant, and flexible. Help improve productivity by reducing the iteration cycle time on research. Other … Python and/or C++. Proficiency in PyTorch, JAX, TensorFlow or another DL library. Ability to thrive in a collaborative, team-oriented environment. Expertise in GPU or accelerator programming (CUDA, Triton, SYCL, ROCm or equivalent). Experience building ML systems at large scale (hundreds of TBs of training data, low-latency or high-throughput inference requirements). Excellent written and verbal
systems-level programming: memory management, threading, profiling. Experience debugging complex issues in large, multi-threaded or real-time systems. Comfortable optimising across CPU/GPU boundaries (e.g. PyTorch, TensorRT, CUDA). Passion for clean code, API design, and maintainable architecture. Proven track record of delivering production-grade systems in fast-moving teams. Desirable: Experience with ROS 2, DDS, or other … about working on real-world robotics in a collaborative, deeply technical environment, we encourage you to apply today. Keywords: Senior Software Engineer, Robotics, C++, Python, ROS 2, DDS, CUDA, PyTorch, TensorRT, Real-Time Systems, Embedded Systems, Low Latency, CI/CD, API Design, Linux Kernel, Multithreading, GPU Optimisation, Robotics Engineer, Autonomous Systems, London Engineering Jobs, Robotics Startups, High