Lead Engineer - Software & HPC Engineering
Lead / Senior HPC Engineer
Location: On-site (due to secure, air-gapped systems)
Full-time | 40 hours per week
Are you ready to play a key role in one of the most ambitious technological challenges of our time?
We are a pioneering UK-based deep-tech company developing next-generation solutions at the cutting edge of advanced physics, simulation, and machine learning. Our work is focused on unlocking scalable, clean energy through breakthrough approaches, supported by world-class computational capabilities and innovative engineering.
Alongside our core mission, we collaborate with leading organisations across advanced industries, applying our proprietary simulation tools and technologies to solve complex, high-impact challenges.
This is a rare opportunity to join a highly skilled, mission-driven team working at the forefront of science and engineering innovation.
The Role
We're seeking a Lead HPC Engineer, or an experienced Senior HPC Engineer ready to step up, to take ownership of a large-scale, high-performance computing environment.
You'll support and evolve an HPC cluster of over 10,000 cores, ensuring reliability, performance, and scalability for workloads ranging from single high-precision runs to thousands of parallel simulations.
Working within the Software & HPC Engineering team, you'll collaborate closely with computational scientists, data engineers, and IT specialists to deliver a robust platform that underpins cutting-edge research and development.
Key Responsibilities
- Maintain and optimise HPC hardware, working with external vendors where required
- Manage core system software and ensure platform stability
- Monitor performance, troubleshoot issues, and drive continuous improvements
- Oversee backups of critical data and system configurations
- Schedule and perform maintenance around user activity to minimise disruption
- Profile workloads and enhance system efficiency
- Communicate system status, updates, and major issues to stakeholders
- Capture user requirements and contribute to upgrade and capacity planning
- Support procurement processes and vendor negotiations
- Produce clear documentation for both technical teams and end users
- Collaborate across engineering and IT teams on shared infrastructure
Current Environment
You'll be working with a modern HPC stack, including:
- Large-scale multi-vendor server infrastructure (AMD EPYC, Intel Xeon)
- High-speed networking (100 Gb/s LAN) and high-performance storage systems
- Linux-based environments (AlmaLinux, Ubuntu)
- Distributed file systems (Lustre, GlusterFS, NFS)
- HPC tooling including Slurm, Ansible, and monitoring frameworks
- Development ecosystems supporting C++, Fortran, MPI, and Python
About You
Essential:
- Degree in Computer Science (or equivalent experience)
- Strong expertise in Linux, HPC systems, storage, and networking
- Experience with MPI and scientific computing environments (C++, Fortran)
- Familiarity with job schedulers and workload management systems
- Scripting skills (Shell, Python) and version control (Git)
- Ability to design, implement, and support complex HPC systems
- Strong analytical thinking and problem-solving skills
- Excellent communication and collaboration abilities
Desirable:
- Deep expertise in HPC optimisation and performance profiling
- Experience with configuration management tools (e.g. Ansible)
- Knowledge of containerisation (e.g. Singularity, Apptainer)
- Experience working with secure or air-gapped environments
- Familiarity with HPC accounting systems and SQL databases
- Experience supporting and training end users
Rullion celebrates and supports diversity and is committed to ensuring equal opportunities for both employees and applicants.