1 of 1 Remote/Hybrid Test Automation Jobs in Portsmouth

Site Reliability Engineer - Data Centers

Hiring Organisation
TGS International Group
Location
Portsmouth, England, United Kingdom
clusters using automated workflows
Execute and analyse performance and stability benchmarks orchestrated via a workload scheduler
Validate results against expected performance and reliability thresholds

Test Framework & Automation
Maintain and extend the automated validation framework built using Python and Ansible
Integrate new test cases to support additional hardware … platforms and GPU generations
Improve test reliability, coverage, and execution efficiency

Remediation & System Integrity
Diagnose and remediate unhealthy nodes through configuration changes or software fixes
Coordinate with on-site support teams for hardware replacements when required
Ensure all issues are resolved and documented prior to handover to production operations
...