GPU Systems Engineer | Algorithmic Trading Strategy Leader
[Up to c. $425k Comp Package (or equivalent) | Hybrid Working]
We’re hiring on behalf of a top-tier technology-driven trading firm known for its world-class infrastructure and scientific approach to real-time systems. As part of a specialist engineering team, you’ll help scale and optimise massive distributed GPU environments powering AI, research, and quantitative strategies. This is a rare chance to take ownership of petabyte-scale infrastructure across global data centres, shaping how data-intensive workloads are run and accelerated at scale...
Key Responsibilities
- Design, deploy, and tune large-scale GPU-based compute environments used for AI and quant research workloads
- Benchmark, analyse, and eliminate performance bottlenecks across compute, storage, and network layers
- Automate system configuration, monitoring, and diagnostics across thousands of high-density nodes
- Partner with researchers and engineers to align infrastructure improvements with evolving model and data demands
- Manage end-to-end rollout of new hardware and software solutions, including hands-on testing and vendor coordination
- Troubleshoot complex distributed systems across the full stack: hardware, OS, drivers, and container orchestration
- Own critical projects that enhance performance, reliability, and observability at the fleet level
What You Bring...
- 4-8 years' experience managing large-scale Linux infrastructure in high-performance, distributed, or AI-centric environments
- Deep technical fluency with GPU architecture, deployment, and tuning (e.g. memory management, driver compatibility, hardware diagnostics)
- Strong scripting and automation skills, especially in Python, with an infrastructure-as-code mindset
- Hands-on experience resolving GPU workload issues across compute clusters and supporting technologies
- Familiarity with performance tooling and debugging in live production environments
- Practical experience with CUDA or systems-level programming in C/C++
- Experience with configuration management frameworks such as Salt, Ansible, or Puppet
- (Preferred) Experience with GPU communication and interconnect technologies (e.g. collective communication libraries such as NCCL, low-latency solutions like GPUDirect RDMA, or high-speed GPU interconnects including NVLink)
...
Company: Techfellow Limited
Location: City of London, Greater London, UK
Hybrid / WFH Options - Posted