1 of 1 Permanent Site Reliability Engineer Jobs in Portsmouth

Site Reliability Engineer - Data Centers

Hiring Organisation: TGS International Group
Location: Portsmouth, England, United Kingdom

Site Reliability Engineer (SRE) – GPU Infrastructure Data Centres

Fully Remote Role - Work from home

The Site Reliability Engineer (SRE) is responsible for the end-to-end validation, testing, and readiness of GPU compute clusters prior to production release. The role ensures that all hardware … networking, and system components meet operational and reliability standards before customer workloads are deployed. Working closely with global infrastructure and engineering teams, the SRE plays a critical role in maintaining the quality, stability, and integrity of high-performance compute environments.

Key Responsibilities

Cluster Validation & Testing

Validate GPU clusters ...