GPU Systems Engineer | Algorithmic Trading Strategy Leader

[Up to c. $425k Comp Package (or equivalent) | Hybrid Working]

We’re hiring on behalf of a top-tier technology-driven trading firm known for its world-class infrastructure and scientific approach to real-time systems. As part of a specialist engineering team, you’ll help scale and optimise massive distributed GPU environments powering AI, research, and quantitative strategies. This is a rare chance to take ownership of petabyte-scale infrastructure across global data centres, shaping how data-intensive workloads are run and accelerated at scale.

Key Responsibilities

  • Design, deploy, and tune large-scale GPU-based compute environments used for AI and quant research workloads
  • Benchmark, analyse, and eliminate performance bottlenecks across compute, storage, and network layers
  • Automate system configuration, monitoring, and diagnostics across thousands of high-density nodes
  • Partner with researchers and engineers to align infrastructure improvements with evolving model and data demands
  • Manage end-to-end rollout of new hardware and software solutions, including hands-on testing and vendor coordination
  • Troubleshoot complex distributed systems across the full stack: hardware, OS, drivers, and container orchestration
  • Own critical projects that enhance performance, reliability, and observability at the fleet level

What You Bring...

  • 4-8 years' experience managing large-scale Linux infrastructure in high-performance, distributed, or AI-centric environments
  • Deep technical fluency with GPU architecture, deployment, and tuning (e.g. memory management, driver compatibility, hardware diagnostics)
  • Strong scripting and automation skills, especially in Python, with an infrastructure-as-code mindset
  • Hands-on experience resolving GPU workload issues across compute clusters and supporting technologies
  • Familiarity with performance tooling and debugging in live production environments
  • Practical experience with CUDA or systems-level programming in C/C++
  • Experience with config management frameworks like Salt, Ansible, or Puppet
  • (Preferred) Experience with GPU communication and interconnect technologies (e.g. collective communication libraries such as NCCL, low-latency solutions like GPUDirect RDMA, or high-speed GPU interconnects including NVLink)

...

Company
Techfellow Limited
Location
City of London, Greater London, UK
Hybrid / WFH Options
Posted