Lead Cloud Infrastructure & Site Reliability Engineer
The Opportunity
We're partnering with a leading global organisation undergoing significant investment in its cloud and data platforms. As part of a high-performing engineering team, you'll play a key role in operating, improving, and automating a large-scale Azure-based platform that supports critical cybersecurity and analytics capabilities.
This is an excellent opportunity for a hands-on Site Reliability Engineer or Cloud Infrastructure Engineer who enjoys solving complex platform challenges, driving automation, improving reliability, and reducing operational overhead across a modern Azure estate.
You'll work alongside senior engineers and platform specialists in an environment focused on continuous improvement, cloud engineering best practice, and platform resilience.
What You'll Be Doing
- Engineering and supporting cloud infrastructure within Microsoft Azure
- Building and managing Infrastructure-as-Code solutions using Terraform
- Improving platform reliability, availability, scalability, and performance
- Automating operational processes through PowerShell, Azure CLI, and other scripting tools
- Supporting CI/CD pipelines and deployment automation
- Managing and troubleshooting Azure networking, connectivity, and security services
- Supporting Kubernetes and containerised workloads
- Monitoring platform health and driving proactive improvements
- Working closely with development, data, and platform engineering teams
- Reducing technical debt and improving operational efficiency
- Providing production support and incident resolution across critical services
- Maintaining engineering standards, documentation, and change controls
Required Experience
We're particularly interested in candidates with:
- Strong Site Reliability Engineering (SRE) or Cloud Infrastructure Engineering experience
- Deep Azure platform knowledge
- Proven Terraform and Infrastructure-as-Code expertise
- Experience with Azure DevOps and CI/CD practices
- Strong PowerShell scripting skills
- Experience operating and supporting production cloud environments
- Azure networking knowledge, including security controls, routing, DNS, and connectivity
- Experience with monitoring and observability tools
- Troubleshooting expertise across infrastructure, platform, and application layers
- Strong automation mindset and passion for continuous improvement
Desirable Skills
Experience with any of the following would be advantageous:
- Kubernetes
- Azure Data Factory
- Databricks
- Synapse Analytics
- Azure Data Lake Storage
- Python development
- Linux administration and scripting
- Grafana, Prometheus, Elasticsearch
- Kafka or Event Hubs
- Cybersecurity-focused environments
- Financial services or other highly regulated sectors
What We're Looking For
- A genuine SRE mindset with a focus on reliability and automation
- Strong problem-solving and troubleshooting abilities
- Excellent stakeholder engagement skills
- A proactive approach to identifying and implementing improvements
- Someone who enjoys working in a complex, enterprise-scale cloud environment