Strong Linux system expertise, with good experience of distribution management (Red Hat, etc.); HA clusters; strong knowledge of HPC systems and underlying components; parallel filesystems (Lustre, GPFS, etc.); high-speed networks (InfiniBand, Omni-Path, Slingshot); DevOps: Ansible, Git, Puppet, Bash or Python scripting; parallel computing and development software stacks; Big Data databases: Elastic/OpenSearch; monitoring tools and dashboards: Prometheus, Grafana; Docker containers
rack planning, cabling layouts, and airflow optimization. Install and test hardware, including servers, switches, PDUs, and interconnects. Perform hardware diagnostics, firmware updates, and performance validation. Support high-speed interconnects (InfiniBand, Ethernet) and fiber/copper cabling. Follow best practices for data center operations, including asset tracking and documentation. Work under tight timelines with precision and attention to quality. Required Experience
London, South East, England, United Kingdom Hybrid / WFH Options
Solutions Through Knowledge
infrastructure, spanning hardware, software, and storage domains. The ideal candidate will be highly proficient in the following areas: Hardware: advanced multi-GPU system design; high-speed interconnects (PCIe, NVLink, InfiniBand); thermal and power efficiency modelling; expert-level knowledge of NVIDIA Blackwell infrastructure; optimizing CPU/GPU ratios in HPC environments. Software: AI/HPC workload simulation and performance analysis; Precision