Lead Site Reliability Engineer
Sunderland, United Kingdom
Tombola
our critical systems and services are always reliable, available, and performing at their best.

What will you be doing?
As an SRE, you'll be instrumental in implementing automation, monitoring, and incident response strategies to minimize downtime and optimize our operations. You'll collaborate closely with our development, infrastructure, and security teams, balancing exciting new feature delivery with … fast as possible.
Post-incident analysis: After resolving incidents, perform root cause analysis (RCA), including a post-incident review, and document findings to prevent similar issues in the future.

Automation and Efficiency
Automate manual tasks: Automate repetitive operational tasks to boost efficiency, reduce human errors, and accelerate delivery.
Infrastructure automation: Utilise Terraform, Git, and TeamCity to automate infrastructure … provisioning and configuration management.
Deployment pipelines: Help develop and maintain automated deployment pipelines (e.g., CI/CD) to streamline releases and reduce manual intervention.

Capacity Planning and Scaling
Plan for scalability: Ensure our systems can scale efficiently to meet demand, both horizontally (adding more servers) and vertically (increasing server resources).
Optimize resource usage: Monitor and optimize resource …
Employment Type: Permanent
Salary: GBP Annual