Abingdon, Oxfordshire, United Kingdom Hybrid/Remote Options
NES Fircroft
with Java 2D graphics and 3D OpenGL programming.
Experience with scientific computing libraries and frameworks:
o Python: NumPy, SciPy, Pandas, TensorFlow (for ML/AI)
o C++/Java: CUDA (for GPU acceleration)
o Angular or React
o Microservice: Quarkus, Spring Boot, AWS API Gateway
o Docker, Kubernetes
With over 90 years' combined experience, NES Fircroft (NES) is proud …
Stevenage, Hertfordshire, South East, United Kingdom Hybrid/Remote Options
MBDA
EO Sensor perceives it. You will need: Skills in Windows and Linux native software (C/C++), dynamic languages like Python, GPU acceleration and 3D graphics (OpenGL, GLSL, CUDA, Vulkan). A strong software background including software architecture design, concurrency, synchronisation, and database design. An appreciation of, or the desire to learn, the physics of the propagation of EM …
Bristol, Avon, South West, United Kingdom Hybrid/Remote Options
MBDA
EO Sensor perceives it. You will need: Skills in Windows and Linux native software (C/C++), dynamic languages like Python, GPU acceleration and 3D graphics (OpenGL, GLSL, CUDA, Vulkan). A strong software background including software architecture design, concurrency, synchronisation, and database design. An appreciation of, or the desire to learn, the physics of the propagation of EM …
testing, and debugging on Linux-based systems. Desirable Skills: Familiarity with UML and tools such as IBM Rhapsody. Experience using MATLAB and Python for data analysis. Knowledge of NVIDIA CUDA programming. Exposure to OpenDDS or other middleware communication frameworks.
London, South East England, United Kingdom Hybrid/Remote Options
Hudl
how to run video encoding, decoding, and transmission at scale (e.g. HLS, WebRTC, and FFmpeg). Accelerator experience. You've developed GPU kernels and/or ML compilers (e.g., CUDA, OpenCL, TensorRT Plugins, MLIR, TVM, etc.). Real-time experience. You've optimized systems to meet strict utilization and latency requirements with tools such as NVIDIA Nsight. Embedded experience. …
Stevenage, Hertfordshire, England, United Kingdom Hybrid/Remote Options
The One Group
packages. Skills in Windows and Linux native software development (C/C++). Experience with dynamic languages such as Python. Knowledge of GPU acceleration and 3D graphics pipelines (OpenGL, GLSL, CUDA, Vulkan). Strong software engineering foundations: software architecture, concurrency, synchronisation, database design. Appreciation of or interest in learning physics related to EM radiation propagation, coherent phenomena, and thermal analysis. Creative …
Software environment. Key Skillset: Strong C++ knowledge. Knowledge of Rhapsody UML. Competent with MS Word for reviewing and updating technical documentation. Experience of RTC/EWM would be beneficial. CUDA experience would be beneficial. Due to the nature of this project, the right behaviours are important: a can-do attitude, proactive and adaptable, and a strong communicator.
Bristol, Avon, South West, United Kingdom Hybrid/Remote Options
Certain Advantage
Skillset/experience required: Strong C++ knowledge. Knowledge of Rhapsody UML. Competent with MS Word for reviewing and updating technical documentation. Experience of RTC/EWM would be beneficial. CUDA experience would be beneficial.
Oxford District, South East England, United Kingdom
Ellison Institute of Technology
Expertise. At the regular level: Extensive experience using HPC clusters (or cloud computing) in scientific or research settings. Proficiency in Linux system administration, networking, and parallel computing (MPI, OpenMP, CUDA, or ROCm). Experience using HPC job schedulers (Slurm preferred) and parallel file systems (Lustre, BeeGFS, GPFS). At the senior level: Extensive experience designing, deploying, and managing … HPC clusters (or cloud computing) in scientific or research settings. Strong proficiency in Linux system administration, networking, and parallel computing (MPI, OpenMP, CUDA, or ROCm). Extensive expertise with administering HPC job schedulers (Slurm preferred) and parallel file systems (Lustre, BeeGFS, GPFS). At all levels: Familiarity with containerization, workflow automation, and orchestration tools used in bioinformatics and AI …
storage systems into the existing datacenter environment. Collaborate with DevOps/Platform teams to validate cluster orchestration (Kubernetes, Slurm, Bright Cluster Manager, or equivalents). Validate firmware, drivers, NCCL, CUDA libraries, and container environments for production readiness. Deployment & Delivery Oversight: Provide technical leadership across the full deployment life cycle. Partner with datacenter operations to ensure correct rack layouts, cabling … HGX/SuperPOD architectures. Deep knowledge of InfiniBand and high-performance networking architectures. Experience with cluster orchestration: Kubernetes, Slurm, PBS, or similar. Familiarity with AI/ML workload requirements, CUDA, Docker/OCI containers, and NVIDIA software stacks (NCCL, CUDA Toolkit). Comfort with Linux systems engineering, hardware validation, and troubleshooting across compute/network layers. Soft Skills …
London, South East England, United Kingdom Hybrid/Remote Options
Synthesia
identify high-impact initiatives and push the boundaries of model performance. You will work on re-implementing models efficiently using PyTorch and underlying technologies like CUDA/Triton, Torch compilation, etc. This would include: Evaluating, profiling and optimising compute resource usage (e.g., Hopper & Blackwell GPUs) for cost and time efficiency at training and inference times … Developing customized efficient solutions for inference pipelines (CUDA/Triton kernels) as well as introducing or enhancing tooling for achieving optimal computational performance (e.g. DL compilers, ONNX, TensorRT). Driving the adoption of best practices for large-model training, including checkpointing, gradient accumulation, and memory optimisation, among others. Introducing or enhancing tooling for distributed training, performance monitoring, and logging (e.g. … background in Computer Science/Engineering and 3+ years of industry experience (PhD preferred). You have worked on optimising large models for over 2 years. You have experience developing CUDA/Triton kernels and optimizing models with DL compilers (torch.compile). You have great coding skills in Python and C++ and you care about writing clean and efficient code. You …
Melbourn, Royston, Hertfordshire, England, United Kingdom
Lynx Recruitment Ltd
*Must be Sole British National for this role* This is an exciting opportunity to join a cutting-edge technology consultancy working at the forefront of innovation in defence and homeland security. As a Machine Learning Consultant, you will play a …
London, South East England, United Kingdom Hybrid/Remote Options
Mercor
Role Overview: Mercor is engaging advanced CUDA experts who specialize in GPU kernel optimization, performance profiling, and numerical efficiency. These professionals possess a deep mental model of how modern GPU architectures execute deep learning workloads. They are comfortable translating algorithmic concepts into finely tuned kernels that maximize throughput while maintaining correctness and reproducibility. 2) Key Responsibilities: Develop, tune, and … benchmark CUDA kernels for tensor and operator workloads. Optimize for occupancy, memory coalescing, instruction-level parallelism, and warp scheduling. Profile and diagnose performance bottlenecks using Nsight Systems, Nsight Compute, and comparable tools. Report performance metrics, analyze speedups, and propose architectural improvements. Collaborate asynchronously with PyTorch Operator Specialists to integrate kernels into production frameworks. Produce well-documented, reproducible benchmarks and … performance write-ups. 3) Ideal Qualifications: Deep expertise in CUDA programming, GPU architecture, and memory optimization. Proven ability to achieve quantifiable performance improvements across hardware generations. Proficiency with mixed precision, Tensor Core usage, and low-level numerical stability considerations. Familiarity with frameworks like PyTorch, TensorFlow, or Triton (not required but beneficial). Strong communication skills and independent problem-solving …
functions in C++ ATen. Build and validate Python bindings with correct gradient propagation and test coverage. Create "golden" reference implementations in eager mode for correctness validation. Collaborate asynchronously with CUDA or systems engineers who handle low-level kernel optimization. Profile, benchmark, and report performance trends at the operator and graph level. Document assumptions, APIs, and performance metrics for reproducibility. … plus. 4) More About the Opportunity: Ideal for contractors who enjoy building clean, high-performance abstractions in deep learning frameworks. Work is asynchronous, flexible, and outcome-oriented. Collaborate with CUDA optimization specialists to integrate and validate kernels. Projects may involve primitives used in state-of-the-art AI models and benchmarks. 5) Compensation & Contract Terms: Typical range …