Storage systems, Linux operating system, networking, parallel file systems. Demonstrable understanding of data storage concepts: RAID/erasure coding, block storage, parallel file systems, etc. Understanding of networking concepts: InfiniBand, Ethernet, IP, TCP, routing, etc. Preferred technical requirements include: other parallel file systems such as Spectrum Scale or StorNext; experience developing and debugging shell (Bash), Python, or Perl scripts; networking (InfiniBand … and Ethernet a definite plus); experience working with NAS protocols: NFS, CIFS, SMB, S3, FTP, SFTP; experience working with authentication protocols (Active Directory, LDAP); extensive experience debugging and troubleshooting InfiniBand and Ethernet networks; experience working with file transfer (e.g., SFTP, rsync, FTP) and NAS protocols (NFS, CIFS); experience working with authentication protocols (LDAP, OAuth2/OIDC, Kerberos, SAML); experience working with …
networking, virtualization, cloud, etc.). Strong technical troubleshooting skills in multi-platform, distributed environments. Strong understanding of distributed storage systems. Expertise in Linux/Unix administration. Deep understanding of networking (InfiniBand, Ethernet, DPDK, UCX), cloud computing, and distributed storage. Proficiency in Python and Bash, and experience with automation scripting for system monitoring and troubleshooting. Knowledge of POSIX, NFS, and S3 protocols, log management …
and/or high-speed interconnects. Experience in e/Specman and/or SystemVerilog/UVM. Background in scripting (Python/Perl/shell). Knowledge of Ethernet and InfiniBand protocols. Are you creative and autonomous? Do you love the challenge of crafting the highest-performance, lowest-power silicon possible? If so, we want to hear from you. Come join …
Experience with Linux virtualization, networking, or graphics stacks. Experience with Docker/OCI containers/K8s. Experience with one or more of the following technologies: confidential computing, RDMA, InfiniBand, and high-performance computing. Performance engineering, benchmarking, and profiling. What We Offer You: We consider geographical location, experience, and performance in shaping compensation worldwide. We revisit compensation annually (and more …
PyTorch, or Hugging Face Transformers. Good understanding of programming/scripting languages (e.g., Python, Go) for customizing solutions, creating scripts, or automating tasks. Experience with AI-relevant infrastructure, including networking (InfiniBand and RoCE), storage (FC, IP, and scale-out), and AI accelerators (GPUs, etc.). Excellent presentation skills: the ability to value-sell and deliver engaging workshops to both technical and non …
Altair Grid Engine. Proficiency in developing workflows for application builds and testing. Experience with setting up CUDA, OpenMPI, TensorFlow, and PyTorch. Familiarity with cloud services and technologies. Knowledge of InfiniBand or other fast interconnect technologies. A plus: understanding of drug development processes and workflows commonly encountered in biotech/pharma R&D environments. Strong communication skills, both verbal and written.
Staff Software Engineer, AI Reliability Engineering
London, UK
About Anthropic
Anthropic's mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our …
CUTLASS, CUB, Thrust, cuDNN, and cuBLAS. Intuition about the latency and throughput characteristics of CUDA graph launch, tensor core arithmetic, warp-level synchronization, and asynchronous memory loads. Background in InfiniBand, RoCE, GPUDirect, PXN, rail optimisation, and NVLink, and how to use these networking technologies to link up GPU clusters. An understanding of the collective algorithms supporting distributed GPU training in …