HPC Engineer (m/f) - Remote
Start: ASAP
Duration: 12 months
Location: remote
Tasks:
- Execution and fulfillment of service requests for HPC resources.
- Troubleshooting complex issues in HPC compute environments.
- Documentation of HPC system configurations, procedures, and software implementations.
- Creation and execution of test plans for HPC environments.
- Proactive development and improvement of current and future environments.
- Close collaboration with domain experts, data scientists, and hardware specialists.
- Support day-to-day HPC operations and ensure systems remain stable and performant.
- Address vague or incomplete user requests by applying diagnostic skills.
- Develop HPC architectures with a focus on performance, scalability, and security.
- Analyze and design complex HPC workloads.
- Evaluate new technologies and prepare technical concepts.
- Actively engage with the AI community and offer expert consultation for ongoing improvement and strategic development.
- Monitor application-level AI trends in our ecosystem and advise on potential use cases.
Skills:
- Minimum 3 years of professional experience in HPC-focused software or systems engineering.
- Strong Linux (RHEL) administration knowledge.
- Programming/Scripting skills: C, C++, Python, Bash, Ansible.
- Knowledge of MPI, OpenMP, CUDA, and other parallel computing frameworks.
- Experience with cluster management, job scheduling, and high-speed networking (InfiniBand).
- Practical experience installing, configuring, troubleshooting, and performance tuning engineering simulation workloads on HPC clusters.
- ITIL understanding for structured incident and service management.
- Strong analytical capabilities, communication skills, and adaptability.