Senior Site Reliability Engineer (SRE)

Remote

12-month contract (high chance of extension)

Job Description
Join a global pioneer in the video game industry and own the reliability of high-traffic, revenue-critical platforms used by millions worldwide. As a Senior SRE, you'll shape the architecture, improve platform-wide resiliency, and ensure services stay performant, scalable, and secure. This isn't just about maintaining a single system: you'll influence reliability across multiple services, driving improvements that touch the entire ecosystem.

Key Responsibilities

  • Lead incident response and troubleshooting for production systems, resolving high-severity issues and driving post-incident improvements.
  • Influence architecture to improve platform-wide reliability, resiliency, and operational efficiency, ensuring services remain available under heavy load.
  • Drive containerisation best practices and manage Kubernetes-based workloads at scale.
  • Build and maintain event-driven architectures that scale globally while ensuring fault-tolerance and high availability.
  • Automate infrastructure provisioning, deployment, and monitoring using Infrastructure as Code (Terraform, CloudFormation, Ansible, CDK).
  • Collaborate with engineering, product, and security teams to define SLOs, SLIs, and error budgets across services.
  • Provide mentorship, advocate SRE best practices, and ensure teams are empowered to deliver resilient, reliable systems.

Experience / Must-Have Skills

  • Extensive experience in AWS and AWS-managed services (EC2, Lambda, S3, VPC, CloudWatch, CloudTrail, IAM, EKS, Service Catalog, multi-account environments).
  • Strong Kubernetes / container orchestration experience, including EKS, OpenShift, Docker, and service mesh.
  • Deep understanding of networking fundamentals: DNS, VPCs, routing, load balancing, TCP/IP, firewall policies.
  • Proven track record in incident response and troubleshooting at scale.
  • Hands-on experience with infrastructure automation and CI/CD pipelines.
  • Experience designing event-driven architectures and resilient systems.
  • A high level of autonomy, with the ability to influence platform-wide decisions and architect for reliability across services.
  • Ability and desire to mentor junior staff.
  • Bonus: experience in gaming, interactive entertainment, or other high-traffic, global-scale platforms.

If you are interested in this role, please feel free to submit your CV.

Job Details

Company: CBSbutler Holdings Limited trading as CBSbutler
Location: London, United Kingdom
Employment Type: Contract
Posted