1 of 1 Remote/Hybrid Site Engineer Jobs in Portsmouth

Site Reliability Engineer - Data Centers

Hiring Organisation
TGS International Group
Location
Portsmouth, England, United Kingdom
Site Reliability Engineer (SRE) – GPU Infrastructure Data Centres
Fully Remote Role - Work from home

The Site Reliability Engineer (SRE) is responsible for the end-to-end validation, testing, and readiness of GPU compute clusters prior to production release. The role ensures that all hardware, networking …

Improve test reliability, coverage, and execution efficiency

Remediation & System Integrity
Diagnose and remediate unhealthy nodes through configuration changes or software fixes
Coordinate with on-site support teams for hardware replacements when required
Ensure all issues are resolved and documented prior to handover to production operations

Documentation & Handover
Produce clear ...