Linux Site Reliability Engineer (SRE) – Contract
📍 Glasgow
💷 £550 per day
📅 3 days per week onsite
📄 Inside IR35
We are looking for an experienced Linux Site Reliability Engineer (SRE) to join a high-performing infrastructure support team that maintains and improves the reliability of critical platforms in a large-scale enterprise environment.
This position focuses on resolving hardware- and platform-related incidents escalated from the L3 support team. The successful candidate will have strong Linux systems expertise, hands-on server troubleshooting experience, and a proactive approach to operational improvement, automation, and incident reduction.
Key Responsibilities
- Investigate and resolve Linux infrastructure and hardware-related incidents
- Perform advanced Linux systems administration and troubleshooting
- Support remote server recovery and diagnostics using out-of-band management technologies
- Manage incidents end-to-end, including triage, mitigation, escalation, communication, and resolution
- Create and maintain operational runbooks and technical documentation
- Identify recurring issues and implement improvements to reduce MTTD and MTTR
- Work closely with engineering and operations teams to improve system reliability and resilience
- Participate in post-incident reviews and root cause analysis
Essential Skills & Experience
- Strong Linux administration and troubleshooting experience
- Knowledge of server hardware including disks, RAID/HBA, NICs, and firmware
- Experience with iDRAC, iLO, IPMI, Redfish, or similar remote management tools
- Proven experience supporting production infrastructure environments
- Understanding of SRE principles including SLOs, SLIs, MTTD, and MTTR
- Strong communication and stakeholder management skills
- Excellent documentation skills and experience of process improvement
Desirable Skills
- Scripting and automation experience with Bash or Python
- Familiarity with VMware, KVM, Docker, or Kubernetes
- Experience with monitoring, observability, and alerting platforms