software engineering skills. Proficiency in Python and related ML frameworks such as JAX, PyTorch and XLA/MLIR. Experience with distributed training infrastructures (Kubernetes, Slurm) and associated frameworks (Ray). Experience using large-scale distributed training strategies. Hands-on experience training large models at scale. Hands-on experience …
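Several of the postings on this page ask for hands-on, large-scale distributed training with JAX or PyTorch. Purely as an illustration (none of this comes from any listing; the toy model, learning rate, and shapes are invented), a minimal data-parallel training step in JAX might replicate the update across local devices and average gradients like this:

```python
# Minimal sketch of a data-parallel SGD step: jax.pmap replicates the step
# across local devices, lax.pmean all-reduces gradients so replicas stay in sync.
import jax
import jax.numpy as jnp

def loss_fn(params, x, y):
    # Toy linear model with mean-squared-error loss (placeholder for a real model).
    pred = x @ params["w"] + params["b"]
    return jnp.mean((pred - y) ** 2)

def train_step(params, x, y):
    grads = jax.grad(loss_fn)(params, x, y)
    grads = jax.lax.pmean(grads, axis_name="devices")  # average grads across devices
    return jax.tree_util.tree_map(lambda p, g: p - 0.01 * g, params, grads)

p_train_step = jax.pmap(train_step, axis_name="devices")

n_dev = jax.local_device_count()
params = jax.device_put_replicated({"w": jnp.zeros((4, 1)), "b": jnp.zeros(1)},
                                   jax.local_devices())
x = jnp.ones((n_dev, 8, 4))  # one shard of the global batch per device
y = jnp.ones((n_dev, 8, 1))
params = p_train_step(params, x, y)
```

In practice, a launcher such as Slurm or Ray would start the processes on a multi-host cluster; this only shows the single-host, multi-device shape of the idea.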
a pivotal role in managing and optimising a large-scale infrastructure. Your expertise in Linux systems, along with experience in High-Performance Computing (HPC), Slurm workload management, and advanced storage solutions, will be essential to ensuring smooth and efficient operations. You'll be working alongside some of the …
and resource fencing Linux tuning with experience around high-throughput or high-performance computing Bash, Shell or Python Salt, Chef or Ansible HPC Architecture Slurm or Grid Engine or MOAB or PBS Containers and container orchestration You will be joining a progressive and exciting company committed to excellence. They offer …
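Scheduler experience (Slurm, Grid Engine, MOAB, PBS) recurs throughout these listings. As a rough Slurm-only sketch, with placeholder resource values and a hypothetical `train.py`, submitting a batch job from Python could look like this:

```python
# Illustrative only: submit a simple job to a Slurm scheduler from Python.
# Resource values are placeholders; train.py is a hypothetical script.
import subprocess

script = """#!/bin/bash
#SBATCH --job-name=example
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=8G
#SBATCH --time=00:30:00
srun python train.py
"""

# sbatch reads the job script from stdin when no file argument is given.
result = subprocess.run(["sbatch"], input=script, text=True,
                        capture_output=True, check=True)
print(result.stdout.strip())  # e.g. "Submitted batch job 12345"
```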
access and storage, ensuring efficient I/O capabilities for data science workflows. Utilize orchestration frameworks (e.g., Nextflow, Snakemake) and high-performance computing (e.g., SLURM, AWS Batch). Write efficient and optimized SQL queries for data manipulation and analysis. Build and maintain GUIs, dashboards, and website front-ends for data …
data, providing efficient I/O capabilities for data science workflows and products. Utilise orchestration frameworks (e.g., Nextflow, Snakemake) and high-performance computing (e.g., SLURM, AWS Batch). Write efficient and optimized SQL queries for data manipulation and analysis. Build and maintain GUIs, dashboards, and website front-ends for …
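Both snippets above ask for efficient SQL for data manipulation and analysis. Purely as an illustration (the table, columns, and values are invented), a small example using Python's standard-library sqlite3 module, pushing aggregation and filtering into the query rather than into Python code:

```python
# Illustrative sketch of "efficient SQL for data manipulation and analysis".
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE samples (sample_id TEXT PRIMARY KEY, batch TEXT, qc_score REAL);
    CREATE INDEX idx_samples_batch ON samples (batch);
""")
con.executemany(
    "INSERT INTO samples VALUES (?, ?, ?)",
    [("s1", "b1", 0.91), ("s2", "b1", 0.42), ("s3", "b2", 0.88)],
)
# Aggregate per batch and filter on the aggregate with HAVING, in the database.
rows = con.execute("""
    SELECT batch, COUNT(*) AS n, AVG(qc_score) AS mean_qc
    FROM samples
    GROUP BY batch
    HAVING AVG(qc_score) > 0.5
    ORDER BY mean_qc DESC
""").fetchall()
print(rows)
con.close()
```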
Cambridge, Cambridgeshire, United Kingdom Hybrid / WFH Options
Arm Limited
environments, particularly in performance-sensitive contexts General experience working in compute or storage-heavy environments Exposure to basic job scheduling systems (e.g., LSF, Jenkins, SLURM) Familiarity with monitoring tools like Prometheus, Grafana, or Linux-based telemetry Familiarity with profiling tools Ability to troubleshoot issues related to CPU, memory, I/O …
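For the Prometheus/Grafana monitoring mentioned above, a minimal sketch only (the metric name and value are invented, and it assumes the prometheus_client package is installed) of an exporter that a Prometheus server could scrape:

```python
# Illustrative only: expose one invented gauge metric for Prometheus to scrape.
import time
from prometheus_client import Gauge, start_http_server

queue_depth = Gauge("scheduler_queue_depth", "Jobs currently waiting in the queue")

if __name__ == "__main__":
    start_http_server(8000)   # metrics served at http://localhost:8000/metrics
    while True:
        queue_depth.set(42)   # placeholder value; a real exporter would query the scheduler
        time.sleep(15)
```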
Linux specifically around high-throughput or high-performance computing · Proficiency in programming languages for automation and tooling · Experience with HPC cluster schedulers, such as Slurm, Grid Engine, MOAB, PBS, etc. · Deep working knowledge of containers and container orchestration · Experience contributing to and collaborating on a shared code base · Experience …
data. Build and manage databases and data warehouses for biological data access and storage. Utilise orchestration tools (e.g., Nextflow, Snakemake) and HPC frameworks (e.g., SLURM, AWS Batch). Write efficient SQL queries for data manipulation and analysis. Build and maintain GUIs, dashboards, and web front-ends for data exploration and …
I/O capabilities to connect data and use efficiently in data science workflows and products. Utilise orchestration (e.g., Nextflow, Snakemake) and high-performance computing (e.g., SLURM, AWS Batch) frameworks. Write efficient and optimised SQL queries for data manipulation and analysis. Build and maintain GUIs, dashboards, and website front-ends to …
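Several of these listings reference workflow orchestration with Nextflow or Snakemake. The sketch below is not either tool's syntax, just a toy illustration of the dependency-ordered execution such engines provide, using Python's standard-library graphlib with an invented DAG:

```python
# Toy illustration of dependency-ordered pipeline execution (invented step names).
from graphlib import TopologicalSorter

# Each step maps to the set of steps it depends on.
pipeline = {
    "fetch_reads": set(),
    "align": {"fetch_reads"},
    "call_variants": {"align"},
    "qc_report": {"align"},
}

def run(step: str) -> None:
    print(f"running {step}")  # a real engine would dispatch this to SLURM or AWS Batch

for step in TopologicalSorter(pipeline).static_order():
    run(step)
```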
Cambridge, England, United Kingdom Hybrid / WFH Options
Arm Limited
large-scale cloud-native compute environments Experience tuning network-attached storage (NAS) and managing storage performance Hands-on knowledge of job schedulers like LSF, SLURM, or cloud-native batch systems Familiarity with tools for tracing, profiling, and monitoring production clusters Comfortable tuning kernel/system parameters and working with …
performance of models on accelerated computing (GPU, TPU, AI ASICs) clusters with high-speed networking. - Experience scaling model training and inference using technologies like Slurm, ParallelCluster, Amazon SageMaker. - Experience in developing and deploying large-scale machine learning or deep learning models and/or systems into production, including batch …
Manchester, England, United Kingdom Hybrid / WFH Options
Canonical
environments is a strong advantage. Familiarity with HPC hardware and software is also a strong advantage - delivering great experiences with InfiniBand, RDMA, CUDA, MPI, Slurm, Lustre, Singularity and related technologies will be central to this team's work. It will also be advantageous to have experience with Docker image … a year for internal events Additional Skills That You Might Also Bring Experience operating HPC clusters in production Experience with InfiniBand, RDMA, CUDA, MPI, Slurm, Lustre, and/or Singularity What we offer you We consider geographical location, experience, and performance in shaping compensation worldwide. We revisit compensation annually …
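For the MPI experience this listing mentions, a minimal mpi4py sketch (assuming mpi4py and an MPI runtime are installed; the script name in the launch command is hypothetical) that sums one value across all ranks:

```python
# Illustrative only: sum each rank's value across the communicator.
# Launch with something like: mpirun -n 4 python allreduce_demo.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Every rank contributes its own rank number; allreduce returns the sum on all ranks.
total = comm.allreduce(rank, op=MPI.SUM)
print(f"rank {rank}: sum of ranks = {total}")
```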
Bachelor's or Master's degree in Computer Science or a related field. 5+ years of experience administering HPC clusters and systems. Experience with SLURM and Grid Engine scheduling software. 5+ years of professional experience in Solution Architecture or Cloud Infrastructure Deployment and support. 7+ years professional experience developing … Public Cloud skills being a plus) and sciences skillsets. Experience with Python, R, or other related data science programming. Experience with Posit products (Package Manager, Connect, Workbench) either in an end-user or administrator capacity. Experience working with databases and/or supporting. Experience managing large amounts of data … effectively. Experience working with AI/ML technologies. Experience with containerizing compute workloads via Docker or Singularity. Experience with NVIDIA DGX systems. Additional information Great talent should benefit from a great work environment. If you join our team, you'll have access to: A competitive salary and bonus package …
Engineering, or related field. - 10+ years of experience in cloud infrastructure design, DevOps, or system architecture. - Proven expertise in GCP infrastructure, GKE, and HPC workload architecture. - Experience in optimizing HPC environments including batch scheduling, job queuing (e.g., Slurm), and shared/distributed storage. - Strong understanding of Kubernetes internals … pod scheduling, autoscaling, and node management. - Proficient in Infrastructure as Code (Terraform, Deployment Manager). - Hands-on experience with Docker, Helm, Istio, and container security scanning tools (e.g., Trivy, Aqua). - Experience integrating observability and monitoring tools for GKE. - Strong proficiency in Terraform, Linux administration, and container orchestration tools. … Fluent in English (written and verbal). - Certification: Google Professional Cloud Architect (mandatory). Preferred Qualifications - Hands-on experience with GPU/TPU workloads, Slurm, or Intel MPI/OpenMPI in cloud HPC environments. - Experience deploying hybrid and multi-cloud solutions using Anthos or GCVE. - Familiarity with CI/CD …
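For the Kubernetes side of this listing, a small read-only sketch using the official Python kubernetes client (it assumes a reachable cluster and a local kubeconfig, and only lists pods; nothing here is specific to this role's actual stack):

```python
# Illustrative only: list pods across all namespaces with the Python kubernetes client.
from kubernetes import client, config

config.load_kube_config()  # reads ~/.kube/config; use load_incluster_config() inside a pod
v1 = client.CoreV1Api()
for pod in v1.list_pod_for_all_namespaces(watch=False).items:
    print(pod.metadata.namespace, pod.metadata.name, pod.status.phase)
```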
or product development, or equivalent experience. Experience managing cross-functional projects with strong communication skills, including technical documentation and reporting. Expertise with Linux and workload managers like SLURM or PBS. Deep knowledge of HPC hardware and software, including Linux, compute, schedulers, storage, interconnects, and HPC/AI applications.