Cambridge, Cambridgeshire, United Kingdom Hybrid / WFH Options
Proactive Appointments
environments, such as HPC, HTC or BC. Drive innovative computational solutions and exploit emerging technologies. Experience of administration of large-scale cluster and server computing and related software (e.g. Slurm, LSF, Grid Engine). Hands-on experience working in a DevOps team and using agile methodologies. Operating and consuming virtualised private cloud resources (e.g. OpenStack). Understanding of Linux system administration …
London (City of London), South East England, United Kingdom
Stanford Black Limited
Platform Engineering. Scripting in Python or Bash. BSc in Computer Science, Engineering, or related field. Experience with Git and Linux environments. Bonus: GitLab, Docker, Ansible, Terraform, HPC/Orchestration (Slurm/K8s), Prometheus, Grafana, Kibana, Datadog, Elasticsearch, Logstash. Linux Engineering/DevOps Team: Work across both Linux infrastructure management and DevOps engineering, supporting large-scale distributed compute environments and …
Hampshire, England, United Kingdom Hybrid / WFH Options
Hays Specialist Recruitment Limited
It's a great opportunity for someone who thrives in project-led infrastructure work and wants to help shape cutting-edge HPC solutions. What you'll need to succeed. Slurm: Proven experience managing and tuning HPC job schedulers. InfiniBand and RoCE: Deep knowledge of high-speed networking technologies. Ansible: Proficiency in using Ansible for automation and configuration management. Networking …
Edinburgh, Midlothian, United Kingdom Hybrid / WFH Options
Lenovo
transformer architectures, and related concepts. Experience with data processing tools and techniques (e.g., Pandas, NumPy). Experience working with Linux systems and/or HPC cluster job scheduling (e.g., Slurm, PBS). Excellent communication, collaboration, and problem-solving skills. Bonus Points: Ph.D. in Computer Science, Machine Learning, or a related field. Experience with distributed training frameworks (e.g., DeepSpeed, Megatron …
Dorset, England, United Kingdom Hybrid / WFH Options
Hays Specialist Recruitment Limited
infrastructure solutions across compute, storage, and networking. Producing detailed technical documentation: hardware specs, data centre layouts, cabling, power and cooling. Installing and tuning Linux-based operating systems and configuring Slurm job schedulers. Optimising high-speed networking technologies (InfiniBand, RoCE). Automating deployments and maintenance using Ansible, Terraform, Bash, and Python. Troubleshooting complex distributed systems and mentoring junior engineers. This is … building systems that scale, this role is for you. What you'll need to succeed: Proven experience designing and scaling large HPC clusters (hundreds to thousands of nodes). Strong Slurm configuration skills: partitions, priorities, resource management. Advanced Linux administration and performance tuning. Expertise in high-performance networking (InfiniBand, RoCE, RDMA). Experience with distributed file systems (Lustre, Ceph, WEKA, VAST …
multi-node HPC clusters. Oversee high-speed networking configurations, including InfiniBand (Mellanox), RDMA, and Ethernet fabric tuning for low-latency HPC workloads. Configure and fine-tune HPC schedulers (e.g., Slurm, OpenPBS, SGE) for optimal GPU workload distribution. Implement containerization strategies (Podman, Docker) and orchestration platforms (K3s, Kubernetes) for managing distributed AI/ML workloads. Networking and Infrastructure Support … clusters. Analyze logs, conduct performance profiling, and debug CUDA, MPI, and RDMA-related issues. Work closely with AI/ML research teams, cloud engineers, and enterprise clients to optimize workload performance. Collaboration and Process Improvement: Support the ongoing development of internal HPC test environments and customer POCs. Work cross-functionally with Service Desk, Operations, and Service Delivery Management to …
Oxford District, South East England, United Kingdom
Arcus Search
and engineering. You’ll gain exposure to technologies that power large-scale modeling, including FEA, and data-driven research, and develop your skills across Linux systems, compute clusters, and workload management tools. Responsibilities: Assist in the setup, monitoring, and maintenance of HPC clusters, storage, and interconnects. Support Linux system administration tasks (RHEL, Rocky), with a focus on stability and … uptime. Help configure and troubleshoot workload managers such as Slurm. Work with senior engineers to monitor performance of key applications and identify opportunities for improvement. Contribute to scripting and automation tasks (Bash, Python) to streamline system operations. Support end-users by responding to tickets, preparing documentation, and guiding researchers on best practices. Learn about parallel computing concepts (MPI, OpenMP … or Python). Exposure to bare metal environments (installing, configuring, and troubleshooting physical servers). Interest in high-performance computing, scientific computing, or distributed systems. Eagerness to learn about workload managers (Slurm or similar). Good problem-solving skills, with the ability to troubleshoot technical issues. Strong communication skills and a collaborative mindset. This role is ideal for …
customers, including the network infrastructure, security, server, storage, end user compute and device management. Role Overview: The UNIX Systems Specialist reports to the Unix Systems Group lead, Infrastructure Systems Manager (UNIX), and is responsible for design, management and support in the Linux System Administration team, managing the day-to-day running of the UKAEA Linux-based IT systems, HPC …/BPSS level minimum). Desirable: Experience of managing Linux systems at scale. Experience managing IT projects. Experience setting up and supporting batch queueing systems (i.e. Slurm). Experience setting up and supporting NVIDIA GPU systems. Ability to write well-documented code in a high-level language or script (Python/Perl). Experience in …
a related field. 2. Proven industry experience in building, deploying, and maintaining Linux servers (Red Hat/Rocky Linux). 3. A working knowledge of, and practical experience with, batch queuing systems (Slurm) and cloud computing, particularly AWS. Key Words: Linux Systems Administrator/Scientific Computing/Red Hat/Rocky Linux/Slurm/AWS/Oracle DBA/IT Security …
Watford, Hertfordshire, East Anglia, United Kingdom
Cognizant
research demands and IT infrastructure. Leverage any scientific computing experience to optimize system performance and manage specialized applications. Assist with management of high-performance compute resources, including experience with Slurm, clustering, and related HPC technologies. Work closely with other technical teams and stakeholders to align IT services with organizational needs. Build and maintain strong stakeholder relationships, communicating complex technical … 9. Proven experience with high-end workstation hardware setups and scientific application support. Demonstrated knowledge of scientific computing and experience in high performance compute environments, including experience with Slurm and clustering, is highly desirable. Strong troubleshooting skills for both hardware and software issues. Desirable Skills: Working knowledge of ServiceNow and its application in incident and service management. Familiarity with …