Site Reliability Engineer (SRE)

Site Reliability Engineer – (SRE, Site Reliability Engineer, Terraform, AKS, Azure, Kubernetes, PowerShell, Python, Bash, Datadog, Monitoring Tools) – Permanent – Remote

Charles Simon Associates are currently recruiting for a Site Reliability Engineer (SRE) on a permanent basis. This role is for a global business with an HQ in the City of London.

Candidates will need to be British Citizens due to Security Clearance requirements.

Location: Remote, with some travel to London

Salary: Up to £125,000 per annum

Skills/Requirements for the Site Reliability Engineer:

  • Extensive SRE experience within previous roles
  • Strong Terraform skills
  • Proven Kubernetes and AKS experience
  • Experience in creating and modifying Terraform deployments in live environments
  • Experience with monitoring solutions, ideally Datadog, although Azure Application Insights, Log Analytics or Grafana would also be considered
  • Scripting skills for automation in PowerShell, Python or Bash
  • Experience with web-based applications

Desirable Skills:

  • Knowledge or commercial experience of Microservices Architecture
  • Kanban
  • Any prior experience of working with Puppet and Chef would be advantageous

Start date is ASAP for the Site Reliability Engineer

The Site Reliability Engineer will be responsible for:

  • Designing and enforcing service-level objectives (SLOs), SLIs, and SLAs to ensure reliability targets are measurable and aligned with business expectations
  • Implementing incident response frameworks, including runbooks, postmortems, and blameless RCA processes to drive continuous improvement
  • Integrating observability tooling (e.g. Prometheus, Grafana, Datadog, OpenTelemetry) to enable proactive detection and resolution of system anomalies
  • Managing infrastructure as code (IaC) using tools like Terraform, Pulumi, or CloudFormation to ensure repeatable, auditable deployments
  • Optimizing cost and resource utilization across cloud environments through rightsizing, autoscaling, and lifecycle policies
  • Driving chaos engineering initiatives to test system resilience under failure conditions and validate recovery strategies
  • Championing security best practices within infrastructure, e.g. secrets management, IAM policies, and vulnerability scanning
  • Collaborating with DevOps and platform teams to build paved-road deployment patterns and internal developer portals
  • Leading capacity planning and load testing efforts to anticipate scaling needs and prevent bottlenecks
  • Contributing to architectural decisions that impact reliability, latency, and fault domains across distributed systems

Please send an up-to-date copy of your CV to be considered for the Site Reliability Engineer role.

Job Details

Company: Charles Simon Associates Ltd
Location: London, South East, England, United Kingdom
Employment Type: Full-Time
Salary: £100,000 - £125,000 per annum
Posted