Senior Site Reliability Engineer

Senior Linux SRE

Outside IR35, initial 12-month contract

Fully remote role across the UK / Europe

Our client, a consumer-facing tech business, is looking for a Senior SRE with a strong background in Linux infrastructure and third-party system operations. You’ll be responsible for running and optimising large-scale production environments (5,000+ hosts) built on technologies such as Kafka, Redis, Kubernetes and MySQL. This is a hands-on, systems-level position focused on reliability, scalability, performance and troubleshooting. You’ll work alongside experienced engineers, operating with a high degree of autonomy to keep critical systems healthy, resilient and observable.

Key Responsibilities

Manage, configure and maintain Linux systems (CentOS, Rocky, RHEL or similar distributions) in production environments
Install, upgrade and troubleshoot third-party systems including Kafka, Redis, Kubernetes and MySQL
Support day-to-day operations in data centre / large-scale infrastructure environments (5,000+ hosts)
Contribute to system reliability, scalability and performance improvements across the platform
Participate in an on-call rotation (one week every 4–5 weeks) to ensure 24x7 availability of critical systems
Collaborate with internal teams to improve observability, monitoring and alerting across services
Identify and implement operational improvements to existing monitoring, logging and incident response processes
Use scripting and automation (primarily Bash and Python) to reduce toil and streamline recurring tasks
Contribute to Infrastructure-as-Code practices using tools such as Ansible or Puppet

Required Experience & Skills

5+ years’ experience in Linux system administration, SRE, Infrastructure or Platform Engineering roles
Proven experience operating large-scale infrastructure (thousands of hosts / distributed systems)
Strong troubleshooting and performance tuning skills at the infrastructure and OS level
Solid understanding of MySQL operations, including replication concepts
Hands-on experience with Kafka and/or other distributed messaging systems
Experience with Kubernetes or similar container orchestration platforms
Practical scripting skills in Bash and/or Python for automation and tooling
Familiarity with IaC tools such as Ansible or Puppet
Good understanding of monitoring, alerting, logging and observability best practices
Excellent communication skills and the ability to own incidents end-to-end, including post-incident reviews

Company: TechNET IT Recruitment Ltd
Location: United Kingdom, UK
Hybrid/Remote Options
Posted: 2 days ago
