AWS Site Reliability Engineer

We’re seeking an AWS Site Reliability Engineer (SRE) with strong incident operations experience to support and improve the reliability of cloud and data platform services across AWS and Snowflake.

This role is hands-on and operationally focused: proactive monitoring, rapid incident response, service restoration, root cause analysis, and automation to improve resilience and reduce mean time to recovery (MTTR).

What you’ll do

  • Lead incident triage, coordination, and resolution for AWS and Snowflake services in production
  • Monitor and respond to alerts, dashboards, and service health indicators
  • Perform root cause analysis (RCA) and drive post-incident remediation and continuous improvement
  • Create, maintain, and improve runbooks, operational procedures, and on-call readiness
  • Participate in and strengthen on-call rotations (including operational handovers)
  • Automate repetitive operational tasks to reduce toil, improve reliability, and reduce MTTR

What you’ll bring (required)

  • Strong knowledge of AWS, including EC2, S3, IAM, VPC, Lambda, and CloudWatch
  • Experience with Snowflake administration and troubleshooting
  • Familiarity with observability tooling such as CloudWatch, Datadog, Grafana, and/or Splunk
  • Solid understanding of SRE principles: SLIs, SLOs, error budgets, incident management
  • Scripting and infrastructure-as-code automation skills in Python, Bash, and/or Terraform

Job Details

Company
Marks Sattin
Location
Glasgow, Scotland, United Kingdom
Posted