Azure platforms
Designing and implementing infrastructure using Terraform
Managing and optimising production Kubernetes clusters
Building automation, tooling, and internal services using Python and Go
Enhancing observability using Prometheus, Grafana, and related monitoring stacks
Implementing and running chaos engineering practices using tools such as Gremlin, Litmus, or similar
Improving incident response …

… Azure environments
Strong background in Site Reliability Engineering principles
Proven expertise with Kubernetes in production
Advanced Terraform skills
Programming experience with Python and Go
Hands-on experience with Prometheus, Grafana, and modern observability tooling
Exposure to chaos engineering tools and methodologies
A proactive mindset focused on reliability, performance, and continuous ...