Lead Site Reliability Engineer
Sunderland, United Kingdom
Tombola
… reliability of critical systems and services, meeting all uptime SLAs (Service Level Agreements).
Incident management: Respond quickly to incidents, investigate root causes, and ensure effective postmortems and continuous improvement processes are in place.
Failure detection and response: Proactively identify potential failures or performance bottlenecks before they impact users, and respond effectively to failures and outages.
… in place to protect production environments.
Documentation: Create and maintain detailed documentation for all infrastructure components, incident response procedures, and runbooks to ensure efficient operations.
Continuous improvement: Continuously evaluate and improve system reliability, performance, and efficiency, seeking new technologies or approaches to enhance operational effectiveness.
Employment Type: Permanent
Salary: GBP Annual