HPC Engineer - Contract via Umbrella - Cambridge/Hybrid
Location: Cambridge, hybrid (ideally 3 days onsite)
Rate: Market rate
Description
We're looking for an HPC Engineer to join our team in the United Kingdom on a hybrid basis (ideally 3 days onsite). In this role, you will help build and operate industry-leading high-performance computing (HPC) capabilities, including application build frameworks, containerized applications and cloud-based services. You will work closely with the scientific community to deliver high-quality HPC services, using automation, infrastructure-as-code and DevOps practices to ensure scalability, reliability and performance in a rapidly evolving HPC landscape.
Responsibilities
- Design, implement and maintain robust platform infrastructure using Infrastructure as Code (IaC) tools such as Terraform
- Develop, deliver and operate research computing services and applications
- Take a Site Reliability Engineering approach to HPC services, managing development, deployment, monitoring and incident response end-to-end
- Solve complex technical problems related to HPC services and user workflows
- Drive innovative computational solutions and exploit emerging technologies
- Administer large-scale cluster and server computing environments and related software (eg, Slurm, LSF, Grid Engine)
- Apply DevOps practices and agile methodologies for HPC operations
- Manage virtualized private cloud resources (eg, OpenStack)
- Implement and administer large-scale parallel filesystems (eg, Weka, GPFS, Lustre)
- Use configuration management tools (eg, Ansible, Salt, Puppet) for IT operations
- Develop scripts and tools for HPC and DevOps operations using Bash and Python
Requirements
- 10+ years of experience operating or engineering large-scale computing environments (HPC, HTC or BC)
- Strong understanding of Linux system administration, TCP/IP stack and storage subsystems
- Experience with high-speed networks (eg, InfiniBand)
- Proven experience with configuration management and automation frameworks
- Hands-on experience with DevOps processes and agile methodologies
- Track record of driving innovative computational solutions and exploiting emerging technologies
- Experience in developing and managing relationships with third-party suppliers
- Scientific degree and/or experience in computationally intensive scientific data analysis
- Previous experience in large-scale HPC environments (>10,000 cores)
Additional
- Experience with public cloud infrastructure (AWS, Azure, GCP)
- Experience managing virtualized private cloud environments (eg, OpenStack)
- Familiarity with container technologies (LXD, Singularity, Docker, Kubernetes)
- Development experience with programming languages and tools (Java/C++, Python/Ruby/Perl, SQL)
- Experience with HashiCorp tools (Terraform, Vault, Consul, Nomad)