role Remote £550 Inside IR35 6 months contract
Key skills needed:
- Designing/implementing Unix/Linux systems and services, open-source solutions, and performance tuning.
- HPC technologies: Lustre, Slurm.
- Configuration and provisioning tools such as Ansible and Terraform.
- Unix/Linux scripting.
- Networking: TCP/IP, DHCP, VLANs, spanning tree protocol, link aggregation, and MTU settings to meet performance and reliability requirements.
Oxford District, South East England, United Kingdom
Ellison Institute of Technology
Computing Facility, the HPC Engineer will design, deploy, and optimise systems that enable large-scale data processing, AI-driven analytics, and simulation workloads. Examples include deploying Kubernetes and Slurm to support real-time analysis of instrument data, MLOps, or scientific workflow managers. We will be hiring at either the regular or senior level, depending on the applicant's … computational research workloads. Evaluate and integrate advanced technologies, including GPU/TPU acceleration, high-speed interconnects, and parallel file systems. Manage HPC environments, including Linux-based clusters, schedulers (e.g., Slurm), and high-performance storage systems (e.g., Lustre, BeeGFS, GPFS). Implement robust monitoring, fault tolerance, and capacity management for high availability and reliability. Develop automation scripts and tools (Python … or cloud computing) in scientific or research settings. Proficiency in Linux system administration, networking, and parallel computing (MPI, OpenMP, CUDA, or ROCm). Experience using HPC job schedulers (Slurm preferred) and parallel file systems (Lustre, BeeGFS, GPFS). At the senior level: extensive experience designing, deploying, and managing HPC clusters (or cloud computing) in scientific or research settings.
to NVIDIA reference architectures (NVAIE, Base Command, DGX SuperPOD specs, etc.).
Cluster Integration & Validation
Define and execute validation test plans for GPU cluster performance, resilience, networking throughput, and workload behaviour. Oversee integration of GPU nodes, networking, and storage systems into the existing datacenter environment. Collaborate with DevOps/Platform teams to validate cluster orchestration (Kubernetes, Slurm, Bright … Cluster Manager, or equivalents). Validate firmware, drivers, NCCL, CUDA libraries, and container environments for production readiness.
Deployment & Delivery Oversight
Provide technical leadership across the full deployment life cycle. Partner with datacenter operations to ensure correct rack layouts, cabling, airflow, and power design. Support delivery teams during build-out phases, ensuring the design is executed correctly. Participate in factory … on understanding of GPU interconnects (NVLink/NVSwitch) and DGX/HGX/SuperPOD architectures. Deep knowledge of InfiniBand and high-performance networking architectures. Experience with cluster orchestration: Kubernetes, Slurm, PBS, or similar. Familiarity with AI/ML workload requirements, CUDA, Docker/OCI containers, and NVIDIA software stacks (NCCL, CUDA Toolkit). Comfort with Linux systems engineering
Stevenage, Hertfordshire, South East, United Kingdom
Anson McCade
scripting, particularly Bash, Python, and at least one other language.
Clustering: Experience with clustered environments and cluster orchestration tools.
Storage: Experience with clustered, parallel file systems (e.g., Lustre).
Workload Management: Experience managing batch scheduling systems (PBS Pro, Slurm, SGE/UGE, etc.).
HPC Knowledge: Knowledge of HPC management systems (e.g., Bright).
Networking/Storage Admin