HPC Engineer
We are seeking an experienced High Performance Computing (HPC) Engineer to design, maintain, and optimise large-scale computing environments that support data-intensive and compute-heavy workloads. You will work closely with researchers, developers, and infrastructure teams to ensure high availability, performance, and scalability of HPC systems.
Key Responsibilities
- Design, deploy, and manage HPC clusters (on-prem, cloud, or hybrid)
- Install, configure, and optimise job schedulers (e.g. Slurm, PBS, LSF)
- Tune system performance for CPU, GPU, memory, storage, and network workloads
- Support users with application optimisation and parallelisation
- Automate system administration using scripting and configuration management tools
- Monitor system health, capacity, and performance
- Troubleshoot hardware, software, and performance issues
- Collaborate on future architecture planning and upgrades
- Maintain documentation and best practices
Required Skills & Experience
- Strong Linux system administration experience
- Hands-on experience with HPC environments and parallel computing
- Knowledge of MPI, OpenMP, and/or CUDA
- Experience with job schedulers (Slurm preferred)
- Familiarity with high-speed interconnects (InfiniBand, Omni-Path)
- Experience with scripting languages (Bash, Python)
- Understanding of performance profiling and optimisation techniques
Desirable Skills
- Experience with GPUs and accelerator-based systems
- Knowledge of cloud HPC (AWS, Azure, GCP)
- Experience with containers (Singularity/Apptainer, Docker)
- Experience with configuration management tools (Ansible, Puppet, Chef)
- Experience supporting scientific or research workloads