Senior Site Reliability Engineer

Senior Linux SRE

Outside IR35 - 12-month contract initially

Fully remote role across UK / Europe

Our client is a consumer-facing tech business looking for a Senior SRE with a strong background in Linux infrastructure and third-party system operations. You’ll be responsible for running and optimising large-scale production environments (5,000+ hosts) built on technologies such as Kafka, Redis, Kubernetes and MySQL. This is a hands-on, systems-level position focused on reliability, scalability, performance and troubleshooting. You’ll work alongside experienced engineers, operating with a high degree of autonomy to keep critical systems healthy, resilient and observable.

Key Responsibilities

  • Manage, configure and maintain Linux systems (CentOS, Rocky, RHEL or similar distributions) in production environments
  • Install, upgrade and troubleshoot third-party systems including Kafka, Redis, Kubernetes and MySQL
  • Support day-to-day operations in data centre / large-scale infrastructure environments (5,000+ hosts)
  • Contribute to system reliability, scalability and performance improvements across the platform
  • Participate in an on-call rotation (one week every 4–5 weeks) to ensure 24x7 availability of critical systems
  • Collaborate with internal teams to improve observability, monitoring and alerting across services
  • Identify and implement operational improvements to existing monitoring, logging and incident response processes
  • Use scripting and automation (primarily Bash and Python) to reduce toil and streamline recurring tasks
  • Contribute to Infrastructure-as-Code practices using tools such as Ansible or Puppet

Required Experience & Skills

  • 5+ years’ experience in Linux system administration, SRE, Infrastructure or Platform Engineering roles
  • Proven experience operating large-scale infrastructure (thousands of hosts / distributed systems)
  • Strong troubleshooting and performance tuning skills at the infrastructure and OS level
  • Solid understanding of MySQL operations, including replication concepts
  • Hands-on experience with Kafka and/or other distributed messaging systems
  • Experience with Kubernetes or similar container orchestration platforms
  • Practical scripting skills in Bash and/or Python for automation and tooling
  • Familiarity with IaC tools such as Ansible or Puppet
  • Good understanding of monitoring, alerting, logging and observability best practices
  • Excellent communication skills and the ability to own incidents end-to-end, including post-incident reviews
Company
TechNET IT Recruitment Ltd
Location
United Kingdom, UK
Hybrid/Remote Options
Posted