Slough, England, United Kingdom Hybrid / WFH Options
JR United Kingdom
high-priority incidents and escalations. About You Skills & Experience Proven experience supporting HPC and/or AI workloads in production environments. Strong expertise with Slurm workload manager, including tuning and troubleshooting. Proficiency with system-level debugging, including kernel modules and network interfaces. Experience with GPU compute platforms … paced, ambiguous, high-growth environments. Nice to have Experience with OpenStack and troubleshooting infrastructure in cloud environments. Kubernetes expertise, particularly in HPC or AI workload contexts. Familiarity with distributed file systems and advanced storage configurations. Understanding of GPU virtualization and multi-tenant HPC architecture. Exposure to machine learning frameworks
Liverpool, England, United Kingdom Hybrid / WFH Options
JR United Kingdom
high-priority incidents and escalations. About You Skills & Experience Proven experience supporting HPC and/or AI workloads in production environments. Strong expertise with Slurm workload manager, including tuning and troubleshooting. Proficiency with system-level debugging, including kernel modules and network interfaces. Experience with GPU compute platforms … paced, ambiguous, high-growth environments. Nice to have Experience with OpenStack and troubleshooting infrastructure in cloud environments. Kubernetes expertise, particularly in HPC or AI workload contexts. Familiarity with distributed file systems and advanced storage configurations. Understanding of GPU virtualization and multi-tenant HPC architecture. Exposure to machine learning frameworks
Cheltenham, England, United Kingdom Hybrid / WFH Options
JR United Kingdom
high-priority incidents and escalations. About You Skills & Experience Proven experience supporting HPC and/or AI workloads in production environments. Strong expertise with Slurm workload manager, including tuning and troubleshooting. Proficiency with system-level debugging, including kernel modules and network interfaces. Experience with GPU compute platforms … paced, ambiguous, high-growth environments. Nice to have Experience with OpenStack and troubleshooting infrastructure in cloud environments. Kubernetes expertise, particularly in HPC or AI workload contexts. Familiarity with distributed file systems and advanced storage configurations. Understanding of GPU virtualization and multi-tenant HPC architecture. Exposure to machine learning frameworks
Brighton, England, United Kingdom Hybrid / WFH Options
JR United Kingdom
high-priority incidents and escalations. About You Skills & Experience Proven experience supporting HPC and/or AI workloads in production environments. Strong expertise with Slurm workload manager, including tuning and troubleshooting. Proficiency with system-level debugging, including kernel modules and network interfaces. Experience with GPU compute platforms … paced, ambiguous, high-growth environments. Nice to have Experience with OpenStack and troubleshooting infrastructure in cloud environments. Kubernetes expertise, particularly in HPC or AI workload contexts. Familiarity with distributed file systems and advanced storage configurations. Understanding of GPU virtualization and multi-tenant HPC architecture. Exposure to machine learning frameworks
Bath, England, United Kingdom Hybrid / WFH Options
JR United Kingdom
high-priority incidents and escalations. About You Skills & Experience Proven experience supporting HPC and/or AI workloads in production environments. Strong expertise with Slurm workload manager, including tuning and troubleshooting. Proficiency with system-level debugging, including kernel modules and network interfaces. Experience with GPU compute platforms … paced, ambiguous, high-growth environments. Nice to have Experience with OpenStack and troubleshooting infrastructure in cloud environments. Kubernetes expertise, particularly in HPC or AI workload contexts. Familiarity with distributed file systems and advanced storage configurations. Understanding of GPU virtualization and multi-tenant HPC architecture. Exposure to machine learning frameworks
Reading, England, United Kingdom Hybrid / WFH Options
JR United Kingdom
high-priority incidents and escalations. About You Skills & Experience Proven experience supporting HPC and/or AI workloads in production environments. Strong expertise with Slurm workload manager, including tuning and troubleshooting. Proficiency with system-level debugging, including kernel modules and network interfaces. Experience with GPU compute platforms … paced, ambiguous, high-growth environments. Nice to have Experience with OpenStack and troubleshooting infrastructure in cloud environments. Kubernetes expertise, particularly in HPC or AI workload contexts. Familiarity with distributed file systems and advanced storage configurations. Understanding of GPU virtualization and multi-tenant HPC architecture. Exposure to machine learning frameworks
Woking, England, United Kingdom Hybrid / WFH Options
JR United Kingdom
high-priority incidents and escalations. About You Skills & Experience Proven experience supporting HPC and/or AI workloads in production environments. Strong expertise with Slurm workload manager, including tuning and troubleshooting. Proficiency with system-level debugging, including kernel modules and network interfaces. Experience with GPU compute platforms … paced, ambiguous, high-growth environments. Nice to have Experience with OpenStack and troubleshooting infrastructure in cloud environments. Kubernetes expertise, particularly in HPC or AI workload contexts. Familiarity with distributed file systems and advanced storage configurations. Understanding of GPU virtualization and multi-tenant HPC architecture. Exposure to machine learning frameworks
Bournemouth, England, United Kingdom Hybrid / WFH Options
JR United Kingdom
high-priority incidents and escalations. About You Skills & Experience Proven experience supporting HPC and/or AI workloads in production environments. Strong expertise with Slurm workload manager, including tuning and troubleshooting. Proficiency with system-level debugging, including kernel modules and network interfaces. Experience with GPU compute platforms … paced, ambiguous, high-growth environments. Nice to have Experience with OpenStack and troubleshooting infrastructure in cloud environments. Kubernetes expertise, particularly in HPC or AI workload contexts. Familiarity with distributed file systems and advanced storage configurations. Understanding of GPU virtualization and multi-tenant HPC architecture. Exposure to machine learning frameworks
High Wycombe, England, United Kingdom Hybrid / WFH Options
JR United Kingdom
high-priority incidents and escalations. About You Skills & Experience Proven experience supporting HPC and/or AI workloads in production environments. Strong expertise with Slurm workload manager, including tuning and troubleshooting. Proficiency with system-level debugging, including kernel modules and network interfaces. Experience with GPU compute platforms … paced, ambiguous, high-growth environments. Nice to have Experience with OpenStack and troubleshooting infrastructure in cloud environments. Kubernetes expertise, particularly in HPC or AI workload contexts. Familiarity with distributed file systems and advanced storage configurations. Understanding of GPU virtualization and multi-tenant HPC architecture. Exposure to machine learning frameworks
London, England, United Kingdom Hybrid / WFH Options
JR United Kingdom
high-priority incidents and escalations. About You Skills & Experience Proven experience supporting HPC and/or AI workloads in production environments. Strong expertise with Slurm workload manager, including tuning and troubleshooting. Proficiency with system-level debugging, including kernel modules and network interfaces. Experience with GPU compute platforms … paced, ambiguous, high-growth environments. Nice to have Experience with OpenStack and troubleshooting infrastructure in cloud environments. Kubernetes expertise, particularly in HPC or AI workload contexts. Familiarity with distributed file systems and advanced storage configurations. Understanding of GPU virtualization and multi-tenant HPC architecture. Exposure to machine learning frameworks
Hemel Hempstead, England, United Kingdom Hybrid / WFH Options
JR United Kingdom
high-priority incidents and escalations. About You Skills & Experience Proven experience supporting HPC and/or AI workloads in production environments. Strong expertise with Slurm workload manager, including tuning and troubleshooting. Proficiency with system-level debugging, including kernel modules and network interfaces. Experience with GPU compute platforms … paced, ambiguous, high-growth environments. Nice to have Experience with OpenStack and troubleshooting infrastructure in cloud environments. Kubernetes expertise, particularly in HPC or AI workload contexts. Familiarity with distributed file systems and advanced storage configurations. Understanding of GPU virtualization and multi-tenant HPC architecture. Exposure to machine learning frameworks
Portsmouth, England, United Kingdom Hybrid / WFH Options
JR United Kingdom
high-priority incidents and escalations. About You Skills & Experience Proven experience supporting HPC and/or AI workloads in production environments. Strong expertise with Slurm workload manager, including tuning and troubleshooting. Proficiency with system-level debugging, including kernel modules and network interfaces. Experience with GPU compute platforms … paced, ambiguous, high-growth environments. Nice to have Experience with OpenStack and troubleshooting infrastructure in cloud environments. Kubernetes expertise, particularly in HPC or AI workload contexts. Familiarity with distributed file systems and advanced storage configurations. Understanding of GPU virtualization and multi-tenant HPC architecture. Exposure to machine learning frameworks
Watford, England, United Kingdom Hybrid / WFH Options
JR United Kingdom
high-priority incidents and escalations. About You Skills & Experience Proven experience supporting HPC and/or AI workloads in production environments. Strong expertise with Slurm workload manager, including tuning and troubleshooting. Proficiency with system-level debugging, including kernel modules and network interfaces. Experience with GPU compute platforms … paced, ambiguous, high-growth environments. Nice to have Experience with OpenStack and troubleshooting infrastructure in cloud environments. Kubernetes expertise, particularly in HPC or AI workload contexts. Familiarity with distributed file systems and advanced storage configurations. Understanding of GPU virtualization and multi-tenant HPC architecture. Exposure to machine learning frameworks
Crawley, England, United Kingdom Hybrid / WFH Options
JR United Kingdom
high-priority incidents and escalations. About You Skills & Experience Proven experience supporting HPC and/or AI workloads in production environments. Strong expertise with Slurm workload manager, including tuning and troubleshooting. Proficiency with system-level debugging, including kernel modules and network interfaces. Experience with GPU compute platforms … paced, ambiguous, high-growth environments. Nice to have Experience with OpenStack and troubleshooting infrastructure in cloud environments. Kubernetes expertise, particularly in HPC or AI workload contexts. Familiarity with distributed file systems and advanced storage configurations. Understanding of GPU virtualization and multi-tenant HPC architecture. Exposure to machine learning frameworks
Hounslow, England, United Kingdom Hybrid / WFH Options
JR United Kingdom
high-priority incidents and escalations. About You Skills & Experience Proven experience supporting HPC and/or AI workloads in production environments. Strong expertise with Slurm workload manager, including tuning and troubleshooting. Proficiency with system-level debugging, including kernel modules and network interfaces. Experience with GPU compute platforms … paced, ambiguous, high-growth environments. Nice to have Experience with OpenStack and troubleshooting infrastructure in cloud environments. Kubernetes expertise, particularly in HPC or AI workload contexts. Familiarity with distributed file systems and advanced storage configurations. Understanding of GPU virtualization and multi-tenant HPC architecture. Exposure to machine learning frameworks
Southampton, England, United Kingdom Hybrid / WFH Options
JR United Kingdom
high-priority incidents and escalations. About You Skills & Experience Proven experience supporting HPC and/or AI workloads in production environments. Strong expertise with Slurm workload manager, including tuning and troubleshooting. Proficiency with system-level debugging, including kernel modules and network interfaces. Experience with GPU compute platforms … paced, ambiguous, high-growth environments. Nice to have Experience with OpenStack and troubleshooting infrastructure in cloud environments. Kubernetes expertise, particularly in HPC or AI workload contexts. Familiarity with distributed file systems and advanced storage configurations. Understanding of GPU virtualization and multi-tenant HPC architecture. Exposure to machine learning frameworks
automation (ISO 27001, SOC 2). Collaborate with ML/AI Teams Package and deploy large‐language‐model (LLM) training jobs on distributed GPU clusters (Slurm, Ray, Kubeflow, or AWS SageMaker). Optimize model‐serving (Triton, vLLM, TorchServe) for low‐latency, high‐throughput inference. Cost & Performance Optimization Track cloud spend
London, England, United Kingdom Hybrid / WFH Options
PhysicsX Ltd
or scientific computing; software engineering concepts and best practices (e.g., versioning, testing, CI/CD, API design, MLOps); containerization and orchestration (Docker, Kubernetes, Slurm); writing pipelines and experiment environments, including running experiments in pipelines in a systematic way. What we offer Be part of something larger: Make an
Lightning. Familiarity with packages and technologies such as NumPy, Pandas, Scikit-learn, Scikit-image, OpenCV, Git, and Bash. Experience working with HPC clusters (e.g. SLURM) or with cloud technologies such as AWS, Azure, or GCP. Experience working with federated learning frameworks such as Flower, NVFlare and OpenFL. Evidence of
scale (compute, network, storage) Deep understanding of accelerated compute environments (GPU, ASIC, FPGA, etc.) Understanding of AI and HPC orchestration and management layer (Kubernetes, Slurm, Vertex, SageMaker) Deep understanding of the convergence of HPC and AI infrastructure requirements and challenges Understanding of classic unstructured data applications and next-generation
recovery management leveraging cloud-specific capabilities. Cloud Storage concepts (Block storage/Blob storage). Job scheduling tools such as Airflow, Prefect Scheduler and Slurm (or other HPC scheduler). Designing and maintaining CI/CD pipelines to ensure fast delivery and integration of the platform services. Contact If this sounds
back-end components to ensure best practices are followed across the development process. Manage high-performance computing (HPC) setups, such as AWS ParallelCluster or Slurm, to support large-scale data processing tasks. Promote the use of serverless principles and microservice patterns within the development team. Required Qualifications: Experience with
back-end components to ensure best practices are followed across the development process. Manage high-performance computing (HPC) setups, such as AWS ParallelCluster or Slurm, to support large-scale data processing tasks. Promote the use of serverless principles and microservice patterns within the development team. Required Qualifications Experience with
Gitlab, Artifactory, or Docker. Experience with infrastructure automation and configuration management, such as Ansible and Terraform. Experience with HPC and orchestration technologies, such as Slurm or Kubernetes. Experience with Databases and Observability systems, such as Elasticsearch, Datadog, Prometheus, PostgreSQL.
DevOps, SRE, or platform engineering roles. Experience with software development (Python, Git) Experience with system administration (Bash, Linux, Containerization) Deep knowledge of HPC (e.g. Slurm) or orchestration technologies (e.g. Kubernetes) Excellent written and verbal communication skills. Ability to work well in a fast-paced environment. Nice to have: Experience