Role Overview: We are seeking a Lead Technical Subject Matter Expert (SME) with strong systems thinking and a solid grasp of SRE principles to drive the technical uplift of capacity and observability controls across our technology estate. This role blends hands-on engineering depth with architectural oversight and focuses on enhancing performance, resilience, and control effectiveness across services and … both operational sensibility and the ability to drive scalable solutions, aligning technical capabilities with internal control frameworks and regulatory expectations.
Key Responsibilities:
• Lead the design and technical evaluation of capacity management, utilisation monitoring, and observability controls across platforms.
• Apply SRE-aligned practices to identify control gaps, performance risks, and areas for automation.
• Assess existing tooling, data flows and … documentation for governance and operational readiness.
Required Skills & Experience:
• 10+ years in engineering, infrastructure, or technical architecture roles in complex technology environments.
• Solid understanding of compute, storage, and network capacity planning across mixed deployment models.
• Familiarity with SRE disciplines such as observability, service-level indicators/objectives (SLIs/SLOs), and automation of operational tasks.
• Demonstrated ability to interpret
and commercial teams to ensure operational excellence. Ensure that the maturity of the NOC is fit for purpose as the business grows rapidly. Increase the maturity of the centre. Strengthen capacity management and implement a risk management framework. Experience: 15 years in senior leadership roles within telecoms or network operations. Proven success in transformation, integration, and large-scale
Site Reliability Engineering (SRE). With a strong foundation in Golang development, you will bring valuable expertise and contribute to innovative solutions for complex monitoring, automation, and capacity management challenges. As a Site Reliability Engineer, you can shape the way this company manages intricate automation and monitoring solutions. At the forefront of a new era in
Manchester, England, United Kingdom Hybrid / WFH Options
JR United Kingdom
as Golang, Python or TypeScript, so knowledge of one of these languages is required; you can learn the others! You will be developing solutions to complex monitoring, automation, and capacity management problems, so prior experience of approaching engineering problems methodically is key. In addition to your programming skills, knowledge of metrics, monitoring and observability
the design for integration and database projects, ensuring that new services are provisioned according to departmental standards. Create, maintain and support all integrations used by the organisation. Manage service capacity, monitoring services regularly to ensure proactive capacity management. Architect and maintain diagrams and documentation for all databases and service integrations, providing senior management with monthly reports related
a big impact, you will need strong collaboration skills and the ability to jump into projects and influence how teams approach reliability. Our SRE team is responsible for effective incident management, capacity management and availability. You will focus on establishing and maintaining SRE best practices and removing any blockers to enable the reliability of Wise. How we work … . Strong understanding of Java and relevant application frameworks (e.g. Spring Boot). Hands-on experience troubleshooting application, database and data-related issues. Strong prioritisation skills and effective time management. Interested? Find out more: The Wise Tech Stack, 2025 Edition. Scaling our Infrastructure: how we make it work. Wise Engineering: https://medium.com/wise-engineering What