Site Reliability Engineer

Role Overview

We are seeking highly skilled Site Reliability Engineers (SREs) to join a fast-paced infrastructure team supporting enterprise-scale platforms. The role sits at the intersection of development and operations and focuses on building scalable, resilient, and automated infrastructure.

The ideal candidate will be automation-first, comfortable working in production environments, and experienced in container orchestration, CI/CD pipelines, and Infrastructure as Code.

Key Responsibilities

Design, implement, and maintain scalable, highly available production systems
Automate operational tasks using Shell scripting (Bash/Zsh)
Contribute to and support Python-based application components
Manage and optimise Kubernetes clusters and containerised deployments
Build and maintain CI/CD pipelines using Spinnaker and GitHub Actions
Implement Infrastructure as Code (IaC) using Pulumi
Perform system monitoring, troubleshooting, and root cause analysis
Participate in on-call rotation and incident response
Improve system reliability, performance, and observability
Collaborate with development teams to enhance deployment and release processes

Required Skills & Experience

Programming & Scripting

Strong experience with Shell scripting (Bash/Zsh)
Solid Python programming experience
Automation mindset with experience eliminating manual processes

Containerisation & Orchestration

Strong hands-on experience with Kubernetes (K8s)
Docker containerisation expertise
Experience managing production-grade clusters

CI/CD & Deployment

Experience with Spinnaker
Hands-on experience with GitHub Actions
Strong understanding of modern DevOps practices

Infrastructure & Cloud

Infrastructure as Code using Pulumi
Strong understanding of cloud-native architecture principles
Experience managing scalable distributed systems

Version Control

Git
GitHub workflows and branching strategies

Preferred Experience

Experience working in large-scale enterprise or high-availability environments
Strong troubleshooting and production support experience
Familiarity with monitoring and observability tooling
Experience with high-traffic, performance-sensitive systems
