Site Reliability & Infrastructure Engineer | Digital Asset Trading Innovator

[Up to c. £225k Comp Package | Hybrid Working - 3 Days in Office]

Role Overview

We’re representing a global trading and digital assets firm at the forefront of high-performance technology and infrastructure innovation. The business is seeking a Site Reliability & Infrastructure Engineer to help design, automate, and scale the systems that underpin its global trading platforms.

This role sits within a high-performing 11-person infrastructure team that combines Site Reliability and Core Infrastructure responsibilities - owning everything from AWS cloud systems to on-prem deployments. The team is expanding to meet new strategic demands, including increased automation, enhanced observability, and the rollout of new colocation environments to support lower-latency trading.

It’s a technically hands-on position that blends architecture, build, and operational ownership, suited to an engineer with curiosity, precision, and a drive to constantly improve how infrastructure is built and run...

Key Responsibilities

Design, build, and maintain highly available infrastructure across both cloud (AWS) and on-prem environments
Implement automation across the stack using Infrastructure-as-Code principles (Terraform, Ansible, or similar)
Administer and optimise Kubernetes clusters across multiple regions, improving resilience, performance, and visibility
Develop tools and scripts in Python or Go to automate monitoring, configuration, and incident response workflows
Contribute to on-prem colocation expansion projects, introducing low-latency engineering practices into the infrastructure
Optimise Linux systems for performance and reliability, including kernel tuning and networking configuration
Partner with development and platform teams to embed SRE best practices, reducing manual toil through automation and observability
Drive improvements in monitoring, alerting, and log collection pipelines to enhance system insight and uptime
Participate in architecture and design reviews, guiding platform evolution with reliability and scale in mind
Collaborate across disciplines to ensure seamless integration between infrastructure, applications, and security teams

What You’ll Bring...

4+ years’ experience in Site Reliability, Infrastructure, or Platform Engineering within production environments
Solid experience working with AWS and hybrid infrastructure
Proven ability to manage Kubernetes clusters at scale (on-prem or EKS), including configuration and performance tuning
Proficiency in Python, Go, or another programming language, with a willingness to code daily
Strong Linux engineering skills - comfortable with system internals, troubleshooting, and performance optimisation
Knowledge of network fundamentals (TCP/IP, routing, DNS, firewalls) and how they apply in high-performance environments
Familiarity with automation tooling such as Terraform or Ansible
Experience building or maintaining CI/CD pipelines and GitOps workflows
A proactive, analytical mindset - eager to explore, ask the right questions, and challenge the status quo
(Preferred) Exposure to low-latency systems, colocation deployments, or real-time trading platforms

...

Company: Techfellow Limited
Location: City of London, Greater London, UK
Hybrid / WFH Options
Posted: Today
