Site Reliability Engineer (Mid / Senior)

South West London (Hybrid – 1–2 days onsite)
Salary: Competitive + Benefits

We are looking for a Site Reliability Engineer to join a small, well-established infrastructure team supporting a highly available production environment. You will work across a modern, self-hosted platform spanning Kubernetes, physical infrastructure and automation, with a strong focus on Ubuntu-based systems.

The Role

As an SRE, you will play a key role in ensuring the availability, performance, security and resilience of production systems. Working in a small, collaborative team, you’ll take ownership of day-to-day platform operations, incident response and continuous improvement, while partnering closely with development teams to deliver reliable and scalable services.

Key Responsibilities

  • Administer and maintain Linux (Ubuntu) server environments
  • Manage self-hosted Kubernetes clusters and supporting infrastructure
  • Support on-premises infrastructure including physical servers and virtualisation platforms
  • Administer storage solutions including NFS, iSCSI and object storage
  • Build and maintain automation using Ansible or similar IaC tools
  • Develop operational tooling using Bash and Python
  • Monitor system health using tools such as Prometheus, Grafana, Zabbix or Nagios
  • Investigate and resolve production incidents (on-call rota involved)
  • Implement security hardening and infrastructure best practices
  • Manage backup and disaster recovery processes, including regular testing
  • Support and improve CI/CD pipelines and deployment processes
  • Collaborate with engineering teams to improve reliability and performance

Essential Skills

  • Strong Linux systems administration (Ubuntu preferred)
  • Experience running production Kubernetes environments
  • Solid understanding of networking (TCP/IP, DNS, routing, firewalls)
  • Experience with physical servers and virtualisation platforms
  • Hands-on experience with Ansible or other IaC tools
  • Scripting skills in Bash and Python
  • Experience with monitoring and alerting platforms
  • Knowledge of Linux storage technologies (NFS, iSCSI)
  • Experience with backup & disaster recovery
  • Exposure to Active Directory / Entra ID / endpoint management
  • Strong troubleshooting and problem-solving skills

Desirable Experience

  • Object storage, MariaDB or database administration
  • CI/CD tools such as Jenkins
  • AWS (S3, Lambda, CloudFront) exposure
  • Terraform or additional IaC tooling
  • Experience with Harvester or similar platforms
  • Knowledge of security, compliance or GDPR

Why Apply?

  • Work on complex, real-world infrastructure (not just cloud-native)
  • High ownership in a small, collaborative team
  • Exposure to a broad modern tech stack across infra, Kubernetes and automation
  • Hybrid working with a competitive salary package

Job Details

Company: Reed
Location: South West London, London, England, United Kingdom (Hybrid / Remote Options)
Employment Type: Full-Time
Salary: Negotiable