Site Reliability & Infrastructure Engineer | Digital Asset Trading Innovator

[Up to c. £225k Comp Package | Hybrid Working - 3 Days in Office]

Role Overview

We’re representing a global trading and digital assets firm at the forefront of high-performance technology and infrastructure innovation. The business is seeking a Site Reliability & Infrastructure Engineer to help design, automate, and scale the systems that underpin its global trading platforms.

The role sits within a high-performing 11-person infrastructure team that combines Site Reliability and Core Infrastructure responsibilities, owning everything from AWS cloud systems to on-prem deployments. The team is expanding to meet new strategic demands, including increased automation, enhanced observability, and the rollout of new colocation environments to support lower-latency trading. It’s a hands-on technical position that blends architecture, build, and operational ownership, suited to an engineer with curiosity, precision, and a drive to keep improving how infrastructure is built and run.

Key Responsibilities

  • Design, build, and maintain highly available infrastructure across both cloud (AWS) and on-prem environments
  • Implement automation across the stack using Infrastructure-as-Code principles (Terraform, Ansible, or similar)
  • Administer and optimise Kubernetes clusters across multiple regions, improving resilience, performance, and visibility
  • Develop tools and scripts in Python or Go to automate monitoring, configuration, and incident response workflows
  • Contribute to on-prem colocation expansion projects, introducing low-latency engineering practices into the infrastructure
  • Optimise Linux systems for performance and reliability, including kernel tuning and networking configuration
  • Partner with development and platform teams to embed SRE best practices, reducing manual toil through automation and observability
  • Drive improvements in monitoring, alerting, and log collection pipelines to enhance system insight and uptime
  • Participate in architecture and design reviews, guiding platform evolution with reliability and scale in mind
  • Collaborate across disciplines to ensure seamless integration between infrastructure, applications, and security teams

What You’ll Bring

  • 4+ years’ experience in Site Reliability, Infrastructure, or Platform Engineering within production environments
  • Solid experience working with AWS and hybrid infrastructure
  • Proven ability to manage Kubernetes clusters at scale (on-prem or EKS), including configuration and performance tuning
  • Proficiency in Python, Go, or another programming language, with a willingness to code daily
  • Strong Linux engineering skills - comfortable with system internals, troubleshooting, and performance optimisation
  • Knowledge of network fundamentals (TCP/IP, routing, DNS, firewalls) and how they apply in high-performance environments
  • Familiarity with automation tooling such as Terraform or Ansible
  • Experience building or maintaining CI/CD pipelines and GitOps workflows
  • A proactive, analytical mindset - eager to explore, ask the right questions, and challenge the status quo
  • (Preferred) Exposure to low-latency systems, colocation deployments, or real-time trading platforms

...

Company
Techfellow Limited
Location
London, UK
Hybrid / WFH Options
Posted