Permanent InfiniBand Jobs in London

8 of 8 Permanent InfiniBand Jobs in London

Solutions Engineer AI Service Provider - UK

London, United Kingdom
Hybrid / WFH Options
Cisco Systems, Inc
…PyTorch, or Hugging Face Transformers. Good understanding of programming/scripting (e.g., Python, Go) for customizing solutions, creating scripts, or automating tasks. Experience with AI-relevant infrastructure, including networking (InfiniBand and RoCE), storage (FC, IP and scale-out) and AI accelerators (GPUs, etc.). Excellent presentation skills: ability to value-sell and deliver engaging workshops to both technical and non…
Employment Type: Permanent
Salary: GBP Annual

Solutions Engineer Service Provider & AI

London, United Kingdom
Hybrid / WFH Options
Cisco Systems, Inc
…PyTorch, or Hugging Face Transformers. Good understanding of programming/scripting (e.g., Python, Go) for customizing solutions, creating scripts, or automating tasks. Experience with AI-relevant infrastructure, including networking (InfiniBand and RoCE), storage (FC, IP and scale-out) and AI accelerators (GPUs, etc.). Excellent presentation skills: ability to value-sell and deliver engaging workshops to both technical and non…
Employment Type: Permanent
Salary: GBP Annual

Operations Engineer (m/f/d)

London, United Kingdom
TAIGA Cloud Limited
YOUR QUALIFICATIONS: 3+ years of experience in infrastructure operations, system administration, or technical support, ideally within HPC or GPU-accelerated environments. Strong troubleshooting skills with high-performance networking technologies (InfiniBand, RDMA, or similar). Familiarity with NVIDIA GPU technology, HPC architectures, storage solutions and high-performance file systems. Hands-on experience with monitoring tools and system management for large-scale…
Employment Type: Permanent
Salary: GBP Annual

Staff DevOps Engineer Research Infrastructure Operations

London, United Kingdom
Hybrid / WFH Options
DeepL GmbH
…at scale. Experienced in Linux performance benchmarking, tuning, and troubleshooting. Familiarity with distributed storage solutions like Lustre and Ceph. Knowledgeable in networking technologies and protocols, including Ethernet and ideally InfiniBand. Proactive and solution-oriented mindset. Excellent problem-solving skills. Initiative-driven and able to take ownership. What we offer: Diverse and internationally distributed team: joining our team means becoming part…
Employment Type: Permanent
Salary: GBP Annual

Machine Learning Performance Engineer (London)

Highgate, Greater London, UK
Jane Street
…CUTLASS, CUB, Thrust, cuDNN and cuBLAS. Intuition about the latency and throughput characteristics of CUDA graph launch, tensor core arithmetic, warp-level synchronization and asynchronous memory loads. Background in InfiniBand, RoCE, GPUDirect, PXN, rail optimisation and NVLink, and how to use these networking technologies to link up GPU clusters. An understanding of the collective algorithms supporting distributed GPU training in…
Employment Type: Full-time

Datacenter Deployment Engineer

London, United Kingdom
Hybrid / WFH Options
Nscale Ltd
…ensure compatibility and efficiency. Significant previous datacenter experience in deployment, design or operations. Familiarity with CMDB tooling such as NetBox. Nice to have: Working knowledge and experience of using InfiniBand fabrics. Working knowledge of fat-tree or rail-optimised designs for AI workloads. Ability to perform performance-level diagnostics on AI fabric. Please note: This role will require 50%+…
Employment Type: Permanent
Salary: GBP Annual

Senior HPC Support Engineer

London, United Kingdom
Hybrid / WFH Options
Nscale Ltd
…environments. This is a hands-on role requiring deep technical acumen, exceptional problem-solving ability, and comfort working across a diverse set of technologies including GPUs (NVIDIA and AMD), InfiniBand networking, and orchestration systems like Slurm. What You'll Be Doing: Provide expert-level support for customer HPC and AI workloads running in production. Troubleshoot complex system-level issues across … with system-level debugging, including kernel modules and network interfaces. Experience with GPU compute platforms (NVIDIA and/or AMD) and associated libraries. Familiarity with MPI libraries (e.g., OpenMPI), InfiniBand, and high-speed Ethernet networking. Solid Linux administration skills and troubleshooting experience. Working knowledge of HPC container runtimes (e.g., Singularity, Apptainer). Exposure to provisioning and automation tools (e.g., Ansible…
Employment Type: Permanent
Salary: GBP Annual

Partner Solutions Architect

London, United Kingdom
Hybrid / WFH Options
Fluidstack
…efforts: Manage multiple concurrent infrastructure validation cycles, define and track KPIs, and build repeatable processes. Monitor and troubleshoot distributed systems: Perform end-to-end diagnostics across compute, fabric (e.g., InfiniBand), and storage layers. Stay current with cutting-edge trends in AI infrastructure such as NVIDIA Hopper/Blackwell architectures, model-serving patterns, and emerging ML system designs and disseminate insights … balancing deep engineering discussions with high-level business context. Qualifications: Technical depth in GPU-cloud infrastructure: Experience with large-scale GPU clusters using Kubernetes and/or SLURM over InfiniBand; deep understanding of the NVIDIA driver stack, NCCL performance tuning, and benchmarking. Strong customer or partner-facing experience: Able to bridge technical and business conversations, explain complex systems to mixed…
Employment Type: Permanent
Salary: GBP Annual