Senior Site Reliability Engineer
Senior Linux SRE
Outside IR35 - initial 12-month contract
Fully remote role across the UK / Europe
Our client is a consumer-facing tech business looking for a Senior SRE with a strong background in Linux infrastructure and third-party system operations. You’ll be responsible for running and optimising large-scale production environments (5,000+ hosts) built on technologies such as Kafka, Redis, Kubernetes and MySQL. This is a hands-on, systems-level position focused on reliability, scalability, performance and troubleshooting. You’ll work alongside experienced engineers, operating with a high degree of autonomy to keep critical systems healthy, resilient and observable.
Key Responsibilities
- Manage, configure and maintain Linux systems (CentOS, Rocky, RHEL or similar distributions) in production environments
- Install, upgrade and troubleshoot third-party systems including Kafka, Redis, Kubernetes and MySQL
- Support day-to-day operations in data centre / large-scale infrastructure environments (5,000+ hosts)
- Contribute to system reliability, scalability and performance improvements across the platform
- Participate in an on-call rotation (one week every 4–5 weeks) to ensure 24x7 availability of critical systems
- Collaborate with internal teams to improve observability, monitoring and alerting across services
- Identify and implement operational improvements to existing monitoring, logging and incident response processes
- Use scripting and automation (primarily Bash and Python) to reduce toil and streamline recurring tasks
- Contribute to Infrastructure-as-Code practices using tools such as Ansible or Puppet
Required Experience & Skills
- 5+ years’ experience in Linux system administration, SRE, Infrastructure or Platform Engineering roles
- Proven experience operating large-scale infrastructure (thousands of hosts / distributed systems)
- Strong troubleshooting and performance tuning skills at the infrastructure and OS level
- Solid understanding of MySQL operations, including replication concepts
- Hands-on experience with Kafka and/or other distributed messaging systems
- Experience with Kubernetes or similar container orchestration platforms
- Practical scripting skills in Bash and/or Python for automation and tooling
- Familiarity with IaC tools such as Ansible or Puppet
- Good understanding of monitoring, alerting, logging and observability best practices
- Excellent communication skills and the ability to own incidents end-to-end, including post-incident reviews
- Company: TechNET IT Recruitment Ltd
- Location: United Kingdom