Site Reliability Engineer

Lead Site Reliability Engineer

Job Type: Contract

Location: London, UK

Key Responsibilities:

  • Act as the Technical Authority (SME) for both Azure and Terraform, guide teams on SRE practices, and approve production changes.
  • This role is platform-focused, not application-specific, and requires deep expertise in SRE principles, Azure Landing Zones (Hub-and-Spoke), Terraform, DevOps enablement, monitoring/observability, and incident management.
  • Address long-term reliability and operational risks while building and mentoring SRE teams.
  • Design, implement, and operate Azure Hub-and-Spoke Landing Zone architectures.
  • Reduce operational toil through automation and platform improvements.
  • Own and evangelize SRE principles including availability, reliability, scalability, resilience, and operational maturity.
  • Define Terraform best practices, state management, drift detection, and CI/CD integration.
  • Build and maintain CI/CD foundations using GitHub Actions.
  • Design and standardize monitoring and observability across the Azure platform.
  • Lead and participate in major incident management following ITIL processes.
  • Partner with security teams to implement least-privilege access and secure-by-default architectures.
  • Enforce governance using Azure Policy and standardized platform controls.
  • Lead, mentor, and grow a high-performing SRE/platform engineering team.
  • Drive SRE culture across the organization and set technical direction, standards, and operational maturity goals.
  • Clearly explain and apply SRE concepts (SLIs, SLOs, error budgets, toil reduction, blameless postmortems).
  • Define and track platform-level SLIs/SLOs and ensure alignment with business objectives.
  • Strong hands-on knowledge of RBAC and IAM in Azure, Managed Identities, and Azure Key Vault for secrets, keys, and certificates.

Job Details

Company
Damco Solutions
Location
London Area, United Kingdom
Posted