Lead Site Reliability Engineer Sunderland, UK
Sunderland, United Kingdom
Tombola
rock-solid system stability.

Key Accountabilities and Responsibilities:

Team Leadership and Development
Providing leadership, management, and development for direct reports through effective 1-to-1s, objective setting (OKRs), and performance management.
Making team goals clear and ensuring they align with our broader business objectives.
Collaborating with other teams and departments to achieve shared success.
Partnering with our People Partner …

Incident management: Quickly respond to incidents, investigate root causes, and ensure effective postmortems and continuous improvement processes are in place.
Failure detection and response: Proactively identify potential failures or performance bottlenecks before they impact users, and respond to failures and outages effectively.

Monitoring and Alerting
Implement monitoring systems: Set up and maintain robust monitoring systems (e.g., Dynatrace) for application … performance, infrastructure health, and system metrics.
Alerting: Create and manage alerting systems to notify us about issues or potential risks in a timely manner, minimizing impact on our players.
Metrics collection: Define and track key metrics (e.g., uptime, latency, request rates) to measure system health and performance.

Incident Response
Incident resolution: Work quickly to resolve incidents, minimize downtime, and …
Employment Type: Permanent
Salary: GBP Annual
Posted: