Lead Cloud Infrastructure & Site Reliability Engineer

The Opportunity

We're partnering with a leading global organisation that is investing significantly in its cloud and data platforms. As part of a high-performing engineering team, you'll play a key role in operating, improving, and automating a large-scale Azure-based platform that supports critical cybersecurity and analytics capabilities.

This is an excellent opportunity for a hands-on Site Reliability Engineer or Cloud Infrastructure Engineer who enjoys solving complex platform challenges, driving automation, improving reliability, and reducing operational overhead across a modern Azure estate.

You'll work alongside senior engineers and platform specialists in an environment focused on continuous improvement, cloud engineering best practice, and platform resilience.

What You'll Be Doing

Engineering and supporting cloud infrastructure within Microsoft Azure
Building and managing Infrastructure-as-Code solutions using Terraform
Improving platform reliability, availability, scalability, and performance
Automating operational processes using PowerShell, Azure CLI, and other scripting tools
Supporting CI/CD pipelines and deployment automation
Managing and troubleshooting Azure networking, connectivity, and security services
Supporting Kubernetes and containerised workloads
Monitoring platform health and driving proactive improvements
Working closely with development, data, and platform engineering teams
Reducing technical debt and improving operational efficiency
Providing production support and incident resolution across critical services
Maintaining engineering standards, documentation, and change controls

Required Experience

We're particularly interested in candidates with:

Strong Site Reliability Engineering (SRE) or Cloud Infrastructure Engineering experience
Deep Azure platform knowledge
Proven Terraform and Infrastructure-as-Code expertise
Experience with Azure DevOps and CI/CD practices
Strong PowerShell scripting skills
Experience operating and supporting production cloud environments
Azure networking knowledge, including security controls, routing, DNS, and connectivity
Experience with monitoring and observability tools
Troubleshooting expertise across infrastructure, platform, and application layers
Strong automation mindset and passion for continuous improvement

Desirable Skills

Experience with any of the following would be advantageous:

Kubernetes
Azure Data Factory
Databricks
Synapse Analytics
Azure Data Lake Storage
Python development
Linux administration and scripting
Grafana, Prometheus, Elasticsearch
Kafka or Event Hubs
Cybersecurity-focused environments
Financial services or other highly regulated sectors

What We're Looking For

A genuine SRE mindset with a focus on reliability and automation
Strong problem-solving and troubleshooting abilities
Excellent stakeholder engagement skills
A proactive approach to identifying and implementing improvements
Someone who enjoys working in a complex, enterprise-scale cloud environment

