ISO 27001, SOC 2). Collaborate with ML/AI Teams: Package and deploy large-language-model (LLM) training jobs on distributed GPU clusters (Slurm, Ray, Kubeflow, or AWS SageMaker). Optimize model serving (Triton, vLLM, TorchServe) for low-latency, high-throughput inference. Cost & Performance Optimization: Track cloud spend …
London (City of London), South East England, United Kingdom
Selby Jennings
What We're Looking For: 5+ years of experience in HPC environments, including exposure to parallel file systems (e.g., Lustre, GPFS), batch schedulers (e.g., Slurm, Grid Engine), and high-performance networking (experience with interconnects is a plus). Strong Linux systems administration skills in distributed and high-scale setups. Proficiency …
London (City of London), South East England, United Kingdom
Ncounter Technology Recruitment
Willingness to engage in technical discussion and commit to producing high-quality code. Enthusiasm to learn and grow in your role. Any understanding of Slurm and HPC is a bonus. Developing in Python within an SRE team spanning the business with project and product work, there is a huge …
Farnborough, Hampshire, United Kingdom Hybrid / WFH Options
Avature
and build performance extrapolation to future generations of HPC/AI hardware. Interact with customers and the Lenovo sales team to offer insight into workload performance characteristics that drive system configurations. Complete competitive comparison studies of different technologies to showcase Intel technology advantages. Develop seller enablement collateral for Lenovo … than one of the OpenMP, MPI, CUDA, ROCm, OpenCL, SYCL paradigms. Experience with production HPC environments: large-scale filesystems (ideally Storage Scale), batch scheduling (ideally Slurm), as well as common HPC software and management tools. Experience with analysis and profiling tools for HPC/AI codes: Intel oneAPI suite (VTune …
London, South East England, United Kingdom Hybrid / WFH Options
The Engage Partnership Recruitment
hear from you.) 💡 The Stack & Environment: A diverse, modern environment spanning Linux, Windows, macOS, Microsoft 365, Azure AD, Intune, Teams, NICE DCV, NVIDIA CUDA, Slurm, Jira Service Desk, Terraform, and Azure Resource Manager. 💡 What We’re Looking For: 2+ years of experience administering HPC infrastructure. Hands-on experience with … InfiniBand, Slurm, and GPU compute platforms (e.g. CUDA). Proficiency in systems administration and troubleshooting. Strong documentation habits and a customer-focused mindset. Experience with VDI solutions and monitoring tools. 💡 Bonus Points: Familiarity with Jira Service Desk and Terraform scripting. Exposure to SSL management, infrastructure-as-code, or cloud database …