Site Reliability & Infrastructure Engineer | Digital Asset Trading Innovator
[Up to c. £225k Comp Package | Hybrid Working - 3 Days in Office]
Role Overview
We’re representing a global trading and digital assets firm at the forefront of high-performance technology and infrastructure innovation. The business is seeking a Site Reliability & Infrastructure Engineer to help design, automate, and scale the systems that underpin its global trading platforms.
This role sits within a high-performing 11-person infrastructure team that combines Site Reliability and Core Infrastructure responsibilities - owning everything from AWS cloud systems to on-prem deployments. The team is expanding to meet new strategic demands, including increased automation, enhanced observability, and the rollout of new colocation environments to support lower-latency trading.
It’s a technically hands-on position that blends architecture, build, and operational ownership, suited to an engineer with curiosity, precision, and a drive to constantly improve how infrastructure is built and run...
Key Responsibilities
- Design, build, and maintain highly available infrastructure across both cloud (AWS) and on-prem environments
- Implement automation across the stack using Infrastructure-as-Code principles (Terraform, Ansible, or similar)
- Administer and optimise Kubernetes clusters across multiple regions, improving resilience, performance, and visibility
- Develop tools and scripts in Python or Go to automate monitoring, configuration, and incident response workflows
- Contribute to on-prem colocation expansion projects, introducing low-latency engineering practices into the infrastructure
- Optimise Linux systems for performance and reliability, including kernel tuning and networking configuration
- Partner with development and platform teams to embed SRE best practices, reducing manual toil through automation and observability
- Drive improvements in monitoring, alerting, and log collection pipelines to enhance system insight and uptime
- Participate in architecture and design reviews, guiding platform evolution with reliability and scale in mind
- Collaborate across disciplines to ensure seamless integration between infrastructure, applications, and security teams
What You’ll Bring...
- 4+ years’ experience in Site Reliability, Infrastructure, or Platform Engineering within production environments
- Solid experience working with AWS and hybrid infrastructure
- Proven ability to manage Kubernetes clusters at scale (on-prem or EKS), including configuration and performance tuning
- Proficiency in Python, Go, or another programming language, with a willingness to code daily
- Strong Linux engineering skills - comfortable with system internals, troubleshooting, and performance optimisation
- Knowledge of network fundamentals (TCP/IP, routing, DNS, firewalls) and how they apply in high-performance environments
- Familiarity with automation tooling such as Terraform or Ansible
- Experience building or maintaining CI/CD pipelines and GitOps workflows
- A proactive, analytical mindset - eager to explore, ask the right questions, and challenge the status quo
- (Preferred) Exposure to low-latency systems, colocation deployments, or real-time trading platforms
...
Company: Techfellow Limited
Location: City of London, Greater London, UK
Hybrid / WFH Options