role Remote £550 Inside IR35 6 Months contract Key Skills needed - Designing/implementing Unix/Linux systems and services, open-source solutions and performance tuning. - HPC technologies: Lustre, Slurm - Configuration systems such as Ansible and Terraform - Unix/Linux scripting. - Networking: TCP/IP, DHCP, VLANs, spanning tree protocol, link aggregation for performance (MTU settings) and reliability requirements.
file systems (e.g., Lustre), and HPC tools (e.g., Bright) • Understanding of networking (InfiniBand/Ethernet) and storage platforms (DDN, NetApp, IBM, Dell EMC) • Experience with batch schedulers (PBS Pro, Slurm, SGE/UGE, Microsoft Scheduler) Get in touch for more details
HPC tools (such as Bright) * Networking knowledge: Mellanox InfiniBand or Ethernet * Experience with storage platforms: DDN, NetApp, IBM, Dell EMC * Familiarity with batch scheduling systems such as PBS Pro, Slurm, SGE/UGE, Microsoft Scheduler. It's still worth applying even if you don't meet every requirement. If you have solid Linux knowledge and a passion for developing …
Employment Type: Full-Time
Salary: £50,000 - £60,000 per annum, Inc benefits, OTE
to NVIDIA reference architectures (NVAIE, Base Command, DGX SuperPod specs, etc.). Cluster Integration & Validation Define and execute validation test plans for GPU cluster performance, resilience, networking throughput, and workload behaviour. Oversee integration of GPU nodes, networking, and storage systems into the existing datacenter environment. Collaborate with DevOps/Platform teams to validate cluster orchestration (Kubernetes, Slurm, Bright … Cluster Manager, or equivalents). Validate firmware, drivers, NCCL, CUDA libraries, and container environments for production readiness. Deployment & Delivery Oversight Provide technical leadership across the full deployment life cycle. Partner with datacenter operations to ensure correct rack layouts, cabling, airflow and power design. Support delivery teams during build-out phases, ensuring the design is executed correctly. Participate in factory … on understanding of GPU interconnects (NVLink/NVSwitch) and DGX/HGX/SuperPod architectures. Deep knowledge of InfiniBand and high-performance networking architectures. Experience with cluster orchestration: Kubernetes, Slurm, PBS, or similar. Familiarity with AI/ML workload requirements, CUDA, Docker/OCI containers, and NVIDIA software stacks (NCCL, CUDA Toolkit). Comfort with Linux systems engineering
Stevenage, Hertfordshire, South East, United Kingdom
Anson Mccade
scripting, particularly Bash, Python, and at least one other language. Clustering: Experience with clustered environments and cluster orchestration tools. Storage: Experience with clustered, parallel file systems (e.g., Lustre). Workload Management: Experience managing batch scheduling systems (PBS Pro, Slurm, SGE/UGE, etc.). HPC Knowledge: Knowledge of HPC management systems (e.g., Bright). Networking/Storage Admin