language, vision and other modalities; machine learning for molecules and proteins (ideally with some background in chemistry and the biological sciences). Lower-level programming for hardware efficiency, e.g. C, CUDA/Triton. Practical familiarity with hardware capabilities for deep learning - threads, caches, vector and matrix engines, data dependencies, bus widths and throttling. Practical familiarity with software stacks for deep learning …
or create insights, that's a plus. Deeper systems knowledge: extra experience with any of the following would be an asset: developing GPU kernels and/or ML compilers (e.g. CUDA, OpenCL, TensorRT plugins, MLIR, TVM, etc.); optimizing systems to meet strict utilization and latency requirements with tools such as NVIDIA Nsight; and/or you've worked with embedded …
Desirable skills:
- Experience in solving non-linear least squares problems
- Experience with computer vision
- Experience in UI development, e.g. ImGui
- Understanding of multithreading techniques
- Experience with GPU programming, e.g. CUDA
- Experience with a messaging framework, e.g. NATS, RabbitMQ
- Experience working in and configuring cloud environments (e.g. AWS, Azure, GCP)
- Experience working with software containers (Docker, Podman) and container orchestration …
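As a quick illustration of the first desirable skill, a non-linear least squares problem can be attacked with a Gauss-Newton iteration. The sketch below is a toy, pure-Python illustration (the function name and the exponential model y = a·exp(b·x) are invented for the example; production code would use a library such as SciPy or Ceres):

```python
import math

def gauss_newton_exp_fit(xs, ys, a, b, iters=20):
    """Fit y = a*exp(b*x) by undamped Gauss-Newton (toy sketch)."""
    for _ in range(iters):
        # Accumulate the normal equations J^T J d = J^T r,
        # where r_i = y_i - a*exp(b*x_i) and J is the model Jacobian.
        jtj = [[0.0, 0.0], [0.0, 0.0]]
        jtr = [0.0, 0.0]
        for x, y in zip(xs, ys):
            e = math.exp(b * x)
            r = y - a * e
            ja, jb = e, a * x * e          # d f/d a, d f/d b
            jtj[0][0] += ja * ja; jtj[0][1] += ja * jb
            jtj[1][0] += jb * ja; jtj[1][1] += jb * jb
            jtr[0] += ja * r;     jtr[1] += jb * r
        # Solve the 2x2 system by Cramer's rule and take a full step
        det = jtj[0][0] * jtj[1][1] - jtj[0][1] * jtj[1][0]
        da = (jtr[0] * jtj[1][1] - jtr[1] * jtj[0][1]) / det
        db = (jtj[0][0] * jtr[1] - jtj[1][0] * jtr[0]) / det
        a, b = a + da, b + db
    return a, b

xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [2.0 * math.exp(0.5 * x) for x in xs]   # exact data: a=2, b=0.5
a, b = gauss_newton_exp_fit(xs, ys, a=1.5, b=0.6)
```

Started near the solution on noise-free data, the iteration converges quadratically; real solvers add damping (Levenberg-Marquardt) for robustness far from the optimum.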
Inference experience for high-throughput model serving
- Proven ability to work on air-gapped systems with no external package repositories
- Experience with GPU orchestration (NVIDIA A100/H100) and CUDA optimisation
- Python expertise with offline dependency management and local package mirrors
Technical Stack (All On-Premises)
Models: Llama 3, Mistral, Qwen (locally hosted)
Vector Stores: Chroma, FAISS, Milvus
Orchestration …
you will: Design and write high-performance and scalable software for training. Understand architectural modifications and design choices and their effects on training throughput and quality. Write low-level CUDA and Triton kernels to squeeze every last bit of performance from our accelerators. Research, implement, and experiment with ideas on our supercompute and data infrastructure. Learn from and work with … if you have: Extremely strong software engineering skills. Proficiency in Python and related ML frameworks such as JAX, PyTorch and XLA/MLIR. Experience writing kernels for GPUs using CUDA, Triton, etc. Experience using large-scale distributed training strategies. Familiarity with autoregressive sequence models, such as Transformers. Bonus: paper at top-tier venues (such as NeurIPS, ICML, ICLR, AISTATS, …)
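For context on the autoregressive sequence models mentioned above, the causal masking at their core can be sketched in a few lines: position i may only attend to positions ≤ i, which is usually implemented by masking future scores to −∞ before the softmax. A toy, pure-Python illustration (function name and score values are invented for the example; real implementations vectorise this on the accelerator):

```python
import math

def causal_softmax(scores):
    """Row-wise softmax over an n x n score matrix with a causal
    (lower-triangular) mask: entries above the diagonal get weight 0."""
    n = len(scores)
    out = []
    for i, row in enumerate(scores):
        vis = row[: i + 1]                  # only positions <= i are visible
        m = max(vis)                        # subtract max for stability
        exps = [math.exp(v - m) for v in vis]
        z = sum(exps)
        out.append([e / z for e in exps] + [0.0] * (n - i - 1))
    return out

weights = causal_softmax([[0.0, 1.0, 2.0],
                          [0.0, 0.0, 0.0],
                          [1.0, 1.0, 1.0]])
```

Each output row sums to 1, and the zeros above the diagonal are what make generation autoregressive: a token's representation never depends on later tokens.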