HPC Engineer - Contract via Umbrella - Cambridge/Hybrid

Location: Cambridge, hybrid (ideally 3 days onsite)

Market rate

Description
We're looking for an HPC Engineer to join our team in the United Kingdom on a hybrid basis (ideally 3 days onsite). In this role, you will help build and operate industry-leading high-performance computing (HPC) capabilities, including application build frameworks, containerized applications and cloud-based services. You will work closely with the scientific community to deliver high-quality HPC services, leveraging automation, infrastructure-as-code and DevOps practices to ensure scalability, reliability and performance in a rapidly evolving HPC landscape.

Responsibilities

  • Design, implement and maintain robust platform infrastructure using Infrastructure as Code (IaC) tools such as Terraform
  • Develop, deliver and operate research computing services and applications
  • Take a Site Reliability Engineering approach to HPC services, managing development, deployment, monitoring and incident response end-to-end
  • Solve complex technical problems related to HPC services and user workflows
  • Drive innovative computational solutions and exploit emerging technologies
  • Administer large-scale cluster and server computing environments and related software (eg, Slurm, LSF, Grid Engine)
  • Apply DevOps practices and agile methodologies to HPC operations
  • Manage virtualized private cloud resources (eg, OpenStack)
  • Implement and administer large-scale parallel filesystems (eg, Weka, GPFS, Lustre)
  • Use configuration management tools (eg, Ansible, Salt, Puppet) for IT operations
  • Develop scripts and tools for HPC and DevOps operations using Bash and Python

Requirements

  • 10+ years of experience operating or engineering large-scale computing environments (HPC, HTC or BC)
  • Strong understanding of Linux system administration, TCP/IP stack and storage subsystems
  • Experience with high-speed networks (eg, InfiniBand)
  • Proven experience with configuration management and automation frameworks
  • Hands-on experience with DevOps processes and agile methodologies
  • Track record of driving innovative computational solutions and exploiting emerging technologies
  • Experience in developing and managing relationships with third-party suppliers
  • Scientific degree and/or experience in computationally intensive scientific data analysis
  • Previous experience in large-scale HPC environments (>10,000 cores)

Additional

  • Experience with public cloud infrastructure (AWS, Azure, GCP)
  • Experience managing virtualized private cloud environments (eg, OpenStack)
  • Familiarity with container technologies (LXD, Singularity, Docker, Kubernetes)
  • Development experience with programming languages and tools (Java/C++, Python/Ruby/Perl, SQL)
  • Experience with HashiCorp tools (Terraform, Vault, Consul, Nomad)

Job Details

Company: Robson Bale Ltd
Location: Cambridge, Cambridgeshire, United Kingdom CB1 0
Hybrid / Remote Options
Employment Type: Contract
Salary: GBP Annual