Site Reliability Engineer - Data Centers
- Hiring Organisation: TGS International Group
- Location: London Area, United Kingdom
Site Reliability Engineer (SRE) – GPU Infrastructure Data Centres

Fully Remote Role - Work from home

The Site Reliability Engineer (SRE) is responsible for the end-to-end validation, testing, and readiness of GPU compute clusters prior to production release. The role ensures that all hardware, networking, and system …

Preferred/Desirable
- Experience working with GPU-based or high-performance compute environments
- Familiarity with workload schedulers (e.g. Slurm or similar tools)
- Understanding of data centre hardware lifecycle and server validation processes
- Exposure to high-speed networking technologies
- Experience working with distributed or remote infrastructure teams

Performance & Success ...