Site Reliability Engineer
Site Reliability Engineer, Azure, GCP, Automation
A key customer of ours is seeking several Site Reliability Engineers (SREs) to support a large-scale build-out and implementation across its GCP and Azure platforms. The SREs will help improve the reliability, performance and observability of the customer's Azure and GCP environments. You'll work within a multidisciplinary engineering squad, supporting the delivery, operation and continuous improvement of its cloud-hosted services.
- Support the reliability and performance of the cloud platforms your squad owns.
- Use observability tools, metrics, logs and traces to detect and prevent issues.
- Contribute to incident response, post-incident reviews and problem management activities.
- Build automation that removes toil and improves operational efficiency.
- Work collaboratively with engineers, Product Owners and platform teams to balance delivery with operational health.
- Improve SLOs, error budgets and other product health measures.
- Take part in engineering ceremonies, knowledge sharing and squad-wide improvement initiatives.
- Experience with Azure and/or GCP public cloud platforms.
- Understanding of observability (metrics, logs, traces) and its impact on system health.
- Experience with GitHub pipelines and Terraform modules.
- Exposure to SRE principles such as SLOs, SLIs and error budgets.
- Ability to contribute to automation using Python, PowerShell, Terraform, CI/CD, or similar tools.
- Solid knowledge of modern engineering practices including DevOps, Infrastructure as Code and automation.
McGregor Boyall is an equal opportunity employer and does not discriminate on any grounds.