Permanent Slurm Workload Manager Jobs in the Channel Islands


Operations & Support Engineer (HPC)

Guernsey, UK
asobbi
… multi-node HPC clusters. Oversee high-speed networking configurations, including InfiniBand (Mellanox), RDMA, and Ethernet fabric tuning for low-latency HPC workloads. Configure and fine-tune HPC schedulers (e.g., Slurm, OpenPBS, SGE) for optimal GPU workload distribution. Implement containerization strategies (Podman, Docker) and orchestration platforms (K3s, Kubernetes) for managing distributed AI/ML workloads.

Networking and Infrastructure Support
… clusters. Analyze logs, conduct performance profiling, and debug CUDA, MPI, and RDMA-related issues. Work closely with AI/ML research teams, cloud engineers, and enterprise clients to optimize workload performance.

Collaboration and Process Improvement
Support the ongoing development of internal HPC test environments and customer POCs. Work cross-functionally with Service Desk, Operations, and Service Delivery Management to …
Employment Type: Part-time