Senior SRE GCP
Senior Site Reliability Engineer
Location: London
Salary: £80,000 - £90,000 per annum
Hours: Full-time
Working Pattern: Hybrid (minimum two days per week in London)
About the Role
As a Senior Site Reliability Engineer, you will be responsible for supporting high-throughput systems that serve millions of customers and billions of requests each month. You'll work on complex hybrid-cloud architectures, with a focus on Kubernetes-based workloads, networking, and monitoring solutions.
You'll also have the opportunity to drive improvements across cloud deployments, CI/CD pipelines, and cost optimisation while exploring new technologies and automation opportunities. Acting as a subject matter expert in site reliability engineering, you'll help foster a culture of continuous learning within the team.
Key Responsibilities
- Ensure critical systems are highly available, scalable, and resilient.
- Develop and implement SLAs/SLOs/SLIs to enhance system reliability.
- Build tools to improve incident management processes, including alerting mechanisms, runbooks, and auto-resolving solutions.
- Drive innovation by exploring AI tooling and automation to improve SRE capabilities.
- Collaborate with teams to optimise cloud deployments and monitoring solutions.
- Actively participate in postmortems and support rotas to ensure operational excellence.
We're seeking candidates with:
- Proven experience in software development, testing, monitoring, and operational stability at scale.
- Expertise in Kubernetes (ideally with microservice architectures using the Istio service mesh).
- Strong knowledge of cloud-native solutions (preferably Google Cloud), including storage, networking, and resource provisioning.
- Hands-on experience with monitoring tools such as Datadog or Dynatrace.
- Proficiency in coding/scripting languages such as Python or Bash.
- A solid understanding of automation best practices and CI/CD pipelines.
- Experience designing APIs and working with database operations, including streaming and batch processing.
Robert Walters Operations Limited is an employment business and employment agency and welcomes applications from all candidates.