Site Reliability Engineer - Data Centers
- Hiring Organisation: TGS International Group
- Location: Portsmouth, England, United Kingdom

- … workload scheduler
- Validate results against expected performance and reliability thresholds

Test Framework & Automation
- Maintain and extend the automated validation framework built using Python and Ansible
- Integrate new test cases to support additional hardware platforms and GPU generations
- Improve test reliability, coverage, and execution efficiency

Remediation & System Integrity
- Diagnose and remediate …

- … troubleshooting Linux systems
- Confident use of CLI tools for diagnostics, including analysis of kernel logs, drivers, and system services
- Proven experience writing and maintaining Ansible playbooks
- Proficiency in Python for automation, test execution, and parsing results
- Strong analytical and problem-solving skills with attention to detail
- Excellent written and verbal ...