partners. This is a DV-cleared preferred position: a multi-year contract with guaranteed extensions and direct impact on national security outcomes.
Key Responsibilities
- Architect and harden secure HPC clusters (Slurm, PBS Pro, GPU-accelerated environments) for classified workloads.
- Perform cyber risk assessments and compliance mapping (NCSC CAF, JSP 440, NIST 800-53, DEF STAN).
- Implement zero-trust security …
role Remote, £550 Inside IR35, 6-month contract
Key skills needed:
- Design and implementation of Unix/Linux systems and services, open-source solutions, and performance tuning
- HPC technologies: Lustre, Slurm
- Configuration management systems such as Ansible and Terraform
- Unix/Linux scripting
- Networking: TCP/IP, DHCP, VLANs, Spanning Tree Protocol, link aggregation for performance (MTU settings) and reliability requirements
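As a rough sketch of the Slurm-facing scripting such roles involve, a small batch-script generator might look like the following. The job name, partition, time limit, and command are invented examples, not taken from any posting:

```python
# Illustrative sketch: rendering a minimal Slurm batch script from Python.
# All values passed in here are hypothetical examples.

def render_sbatch(job_name: str, partition: str, nodes: int, command: str) -> str:
    """Build the text of a simple Slurm batch script."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --partition={partition}",
        f"#SBATCH --nodes={nodes}",
        "#SBATCH --time=01:00:00",
        command,
    ]
    return "\n".join(lines)

script = render_sbatch("tuning-test", "compute", 2, "srun hostname")
print(script.splitlines()[1])  # → #SBATCH --job-name=tuning-test
```

The generated text would normally be written to a file and submitted with `sbatch`; real scripts add memory, GPU, and output directives appropriate to the cluster.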
Strong experience in ML platform/ML infra/MLOps roles, ideally at an AI or high-performance compute company. Deep familiarity with GPU orchestration (K8s + NVIDIA stack, Slurm, Ray, etc.). Comfort standing up cloud infrastructure from scratch (AWS preferred). Experience building data pipelines (Airflow/Dagster, Spark/Beam, Kafka, Parquet/S3). A …
Oxford District, South East England, United Kingdom
Ellison Institute of Technology
Computing Facility, the HPC Engineer will design, deploy, and optimise systems that enable large-scale data processing, AI-driven analytics, and simulation workloads, for example deploying Kubernetes and Slurm to enable real-time data analysis from instruments, MLOps, or scientific workflow managers. We will be hiring either at the regular or senior level, depending on the applicant's … computational research workloads. Evaluate and integrate advanced technologies including GPU/TPU acceleration, high-speed interconnects, and parallel file systems. Manage HPC environments, including Linux-based clusters, schedulers (e.g., Slurm), and high-performance storage systems (e.g., Lustre, BeeGFS, GPFS). Implement robust monitoring, fault tolerance, and capacity management for high availability and reliability. Develop automation scripts and tools (Python … or cloud computing) in scientific or research settings. Proficiency in Linux system administration, networking, and parallel computing (MPI, OpenMP, CUDA, or ROCm). Experience using HPC job schedulers (Slurm preferred) and parallel file systems (Lustre, BeeGFS, GPFS). At the senior level: Extensive experience designing, deploying, and managing HPC clusters (or cloud computing) in scientific or research settings.
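The monitoring and automation work described above often starts with parsing scheduler output. As a hedged sketch, the snippet below tallies node states from text shaped like `sinfo -h -o "%n %t"` output; the node names and states are made-up samples, and real `sinfo` output can vary by site and version:

```python
# Sketch: tallying node states from scheduler output for a monitoring check.
# The sample text is invented; it only mimics one common sinfo format.

def count_states(sinfo_text: str) -> dict:
    """Count occurrences of each node state in 'node state' lines."""
    counts = {}
    for line in sinfo_text.strip().splitlines():
        node, state = line.split()
        counts[state] = counts.get(state, 0) + 1
    return counts

sample = """node01 idle
node02 alloc
node03 down
node04 idle"""
print(count_states(sample))  # → {'idle': 2, 'alloc': 1, 'down': 1}
```

A real health check would run the command via `subprocess`, alert on `down`/`drain` counts, and feed the tallies into the monitoring stack.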
startup environment. Your application will be all the more interesting if you also have:
- Experience in an AI/ML environment
- Experience with high-performance computing (HPC) systems and workload managers (Slurm)
- Experience with modern AI-oriented solutions (Fluidstack, CoreWeave, Vast)
Location & Remote
This role is primarily based at one of our European offices (Paris, France and London …
pipelines
- Strong proficiency in Python and Bash; familiarity with C++
- Understanding of networking fundamentals
- Experience with workflow tools such as Airflow, Luigi, or Dagster
- Exposure to distributed computing tools (Slurm, Celery, HTCondor, etc.)
Bonus Skills:
- Experience with binary market data protocols (ITCH, MDP3, etc.)
- Understanding of high-performance filesystems and columnar storage formats …
Proficient in Python and Bash; familiarity with C++. Knowledge of computer networking fundamentals. Experience with ETL orchestration frameworks (e.g., Airflow, Luigi, Dagster). Experience with distributed computing environments (e.g., Slurm, Celery, HTCondor). Preferred: Experience with binary market data specifications (e.g., ITCH, MDP3). Understanding of high-performance filesystems and columnar storage formats. This is a high-impact role …
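The distributed-computing tools named in these listings (Slurm, Celery, HTCondor) all serve the same pattern: fan independent tasks out to workers and gather the results. The local stand-in below uses a thread pool purely to show the shape of that code; it is an illustration with toy data, not how any of those schedulers are actually invoked:

```python
# Local stand-in for the fan-out/gather pattern that distributed schedulers
# provide across machines. ThreadPoolExecutor runs it across local threads.
from concurrent.futures import ThreadPoolExecutor

def parse_chunk(chunk):
    """Toy 'parse' step: count records in one chunk of data."""
    return len(chunk)

def total_records(chunks):
    """Map parse_chunk over chunks in parallel and sum the results."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        return sum(pool.map(parse_chunk, chunks))

print(total_records([[1, 2, 3], [4, 5], [6]]))  # → 6
```

With a real backend, `parse_chunk` would become a Celery task, a Slurm array-job step, or an HTCondor job, but the map-then-reduce structure stays the same.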
City of London, London, United Kingdom Hybrid/Remote Options
Harnham
the following skills and experience:
- A track record of building developer-focused platforms or infrastructure products, ideally with some leadership experience
- Hands-on engineering background, preferably as an Engineering Manager or technical lead in cloud, DevOps, or compute environments
- Deep technical expertise in cloud platforms (GCP, AWS, or Azure), containerization, CI/CD, and infrastructure-as-code: Docker; Kubernetes (EKS, GKE, AKS); Jenkins, GitLab CI, or GitHub Actions; Terraform or CloudFormation; Prometheus, Grafana, Datadog, or New Relic; Slurm, Torque, LSF; MPI; Hadoop or Spark
- Experience with high-performance computing, distributed systems, and observability tools
- Strong communication and executive presence, with the ability to translate complex technical concepts for diverse audiences
- Familiarity with AI/ML …
multi-node HPC clusters.
• Oversee high-speed networking configurations, including InfiniBand (Mellanox), RDMA, and Ethernet fabric tuning for low-latency HPC workloads.
• Configure and fine-tune HPC schedulers (e.g., Slurm, OpenPBS, SGE) for optimal GPU workload distribution.
• Implement containerization strategies (Podman, Docker) and orchestration platforms (K3s, Kubernetes) for managing distributed AI/ML workloads.
Networking and Infrastructure Support … clusters.
• Analyze logs, conduct performance profiling, and debug CUDA, MPI, and RDMA-related issues.
• Work closely with AI/ML research teams, cloud engineers, and enterprise clients to optimize workload performance.
Collaboration and Process Improvement
• Support the ongoing development of internal HPC test environments and customer POCs.
• Work cross-functionally with Service Desk, Operations, and Service Delivery Management to …
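"Fine-tuning HPC schedulers for GPU workload distribution" ultimately means editing configuration such as slurm.conf node definitions that advertise GPU resources (Gres). The sketch below renders one such line; the hostname range, GPU count, CPU count, and memory figure are invented examples, and a real cluster needs site-specific, validated values:

```python
# Sketch: rendering a slurm.conf NodeName line that declares GPU GRES.
# Every value below is a hypothetical example, not a recommendation.

def gpu_node_def(names, gpus, cpus, mem_mb):
    """Render a slurm.conf node definition advertising GPU resources."""
    return (f"NodeName={names} Gres=gpu:{gpus} "
            f"CPUs={cpus} RealMemory={mem_mb} State=UNKNOWN")

print(gpu_node_def("gpu[01-08]", 8, 128, 1024000))
# → NodeName=gpu[01-08] Gres=gpu:8 CPUs=128 RealMemory=1024000 State=UNKNOWN
```

In practice the matching gres.conf entries and partition definitions, plus `scontrol reconfigure`, complete the change.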
infrastructures for HPC/AI clusters
- Lead the implementation of advanced networking solutions, including NVIDIA InfiniBand and Ethernet technologies
- Deploy and manage orchestration tools such as NVIDIA Base Command Manager for cluster management and monitoring
- Provide expert consulting on compute and network infrastructure strategy, planning, and execution
- Collaborate with clients to assess technical requirements and deliver customized solutions
- Troubleshoot … networking technologies: InfiniBand (Quantum), Ethernet (Spectrum-X), MLNX-OS, NVIDIA Cumulus OS, and Enterprise SONiC
- Proficient in Linux systems administration and scripting
- Extensive hands-on experience with Base Command Manager or equivalent orchestration tools
- Experience in consulting roles with strong communication and documentation abilities
- Capacity to manage multiple projects independently and deliver results within dynamic environments
Preferred Qualifications
- Certifications … in networking and Linux (e.g., CCNP, LFCS, NCP-AIN, NCP-AIO)
- Experience with NVIDIA DGX systems or similar GPU platforms
- Familiarity with container orchestration technologies (e.g., Kubernetes, Docker, Slurm)
- Knowledge of data centre operations and cloud integration methods
- Experience with GenAI frameworks and related tools
Selected Skills
- GenAI and HPC - Execute Technical Implementation
- GenAI and HPC - Technical Design and …
customers, including the network infrastructure, security, server, storage, end-user compute and device management. Role Overview: The UNIX Systems Specialist reports to the Unix Systems Group lead, Infrastructure Systems Manager (UNIX), and is responsible for design, management and support in the Linux System Administration team, managing the day-to-day running of the UKAEA Linux-based IT systems, HPC …/BPSS level minimum).
Desirable
o Experience of managing Linux systems at scale
o Experience managing IT projects
o Experience setting up and supporting batch queueing systems (e.g. Slurm)
o Experience setting up and supporting Nvidia GPU systems
o Ability to write well-documented code in a high-level language or script (Python/Perl)
o Experience in …
and engineering. You’ll gain exposure to technologies that power large-scale modeling, including FEA, and data-driven research, and develop your skills across Linux systems, compute clusters, and workload management tools.
Responsibilities:
- Assist in the setup, monitoring, and maintenance of HPC clusters, storage, and interconnects.
- Support Linux system administration tasks (RHEL, Rocky), with a focus on stability and … uptime.
- Help configure and troubleshoot workload managers such as Slurm.
- Work with senior engineers to monitor performance of key applications and identify opportunities for improvement.
- Contribute to scripting and automation tasks (Bash, Python) to streamline system operations.
- Support end users by responding to tickets, preparing documentation, and guiding researchers on best practices.
- Learn about parallel computing concepts (MPI, OpenMP … or Python).
- Exposure to bare-metal environments (installing, configuring, and troubleshooting physical servers).
- Interest in high-performance computing, scientific computing, or distributed systems.
- Eagerness to learn about workload managers (Slurm or similar).
- Good problem-solving skills, with the ability to troubleshoot technical issues.
- Strong communication skills and a collaborative mindset.
This role is ideal for …
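A typical scripting-and-automation task of the kind mentioned above is flagging filesystems that are nearly full. This sketch parses text shaped like `df -P` output; the sample is invented, and since field positions can differ across platforms a real script should verify the header before trusting the columns:

```python
# Sketch: flag filesystems at or above a usage threshold, from df-style text.
# The sample data below is made up for illustration.

def full_filesystems(df_text: str, threshold: int = 90) -> list:
    """Return mount points whose capacity percentage meets the threshold."""
    flagged = []
    for line in df_text.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        use_pct = int(fields[4].rstrip("%"))       # e.g. '95%' -> 95
        if use_pct >= threshold:
            flagged.append(fields[5])
    return flagged

sample = """Filesystem 1024-blocks Used Available Capacity Mounted
/dev/sda1 1000 950 50 95% /scratch
/dev/sdb1 1000 100 900 10% /home"""
print(full_filesystems(sample))  # → ['/scratch']
```

A production version would run `df -P` via `subprocess`, log results, and raise an alert or ticket rather than just printing.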
You’ll help shape the orchestration layer for one of the most advanced AI compute environments in the world. Your work will involve: Designing core platform services for cluster provisioning, workload orchestration, and resource management APIs. Building integrations with schedulers (Kubernetes, Slurm) and container runtimes for reliable, high-performance GPU workloads. Developing automation for deployment, imaging, and multi-tenant …
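At its core, the resource-management logic such platform services wrap is a placement decision: track free GPUs per node and put each request where it fits. The sketch below uses invented node names and capacities; real orchestration layers (Kubernetes, Slurm) add queueing, preemption, and topology awareness on top of this:

```python
# Minimal first-fit GPU placement sketch. Node names and capacities are
# hypothetical examples, not any real cluster's inventory.

def place(request_gpus, free):
    """Return the first node with enough free GPUs, updating the pool."""
    for node, avail in free.items():
        if avail >= request_gpus:
            free[node] = avail - request_gpus
            return node
    return None  # no capacity; a real service would queue the request

pool = {"node-a": 2, "node-b": 8}
print(place(4, pool))  # → node-b
print(pool)            # → {'node-a': 2, 'node-b': 4}
```

First-fit is deliberately naive: production schedulers also weigh fragmentation, interconnect locality, and tenant quotas when choosing a node.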
a hands-on role at a global systematic trading firm with $25 billion under management, earning significant bonuses. As a Senior Platform Engineer you'll develop and support scalable workload scheduling solutions for HPC environments using tools such as YellowDog within a large scale computing environment with both on-premise and cloud (AWS) based services. You'll collaborate with … with flexibility to work from home 1-2 days a week. About you: You have experience of engineering and supporting at least one HPC scheduler, such as YellowDog, Ray, Slurm or IBM Symphony You have a deep knowledge of Linux You have a good understanding of both loosely coupled and tightly coupled HPC workloads and experience of working on …