Site Reliability Engineer

Location - Glasgow, Scotland (Hybrid - 2-3 days per week onsite)

3-month employment contract, with possible extension

Government Sector

We are looking for an L7 Senior Site Reliability Engineer (6-9 years of overall experience) who can assess existing monitoring, reliability, and operational practices and define a comprehensive observability model across the application landscape. The ideal candidate will drive improvements that enhance availability, performance, resilience, and operational readiness across critical systems. This role requires a strong advisory mindset, deep production engineering expertise, and the ability to influence development, platform, and operations teams to deliver measurable reliability and operational outcomes.

Technical Skills:

Strong experience as a Senior Site Reliability Engineer, Reliability Engineer, or Platform Engineer operating at L7 level.
Deep expertise in application monitoring, observability, alerting, incident management, and production reliability.
Hands-on experience assessing, selecting, and implementing monitoring and observability tools, frameworks, and integration approaches.
Strong understanding of SRE principles including SLIs, SLOs, error budgets, and resilience engineering.
Experience designing and operating highly available, fault-tolerant, multi-region systems.
Advanced skills in capacity planning, load modeling, and traffic forecasting.
Deep expertise in metrics, logs, traces, and event-based telemetry.

Process Skills:

Assess current monitoring, alerting, and incident management mechanisms to identify gaps and improvement opportunities.
Define and implement an end-to-end application monitoring and observability model aligned across the SDLC.
Identify risks related to reliability, performance, availability, and operational readiness and recommend mitigation strategies.
Establish SRE best practices including proactive alerting, error budgets, operational runbooks, and reliability metrics.
Articulate expected operational benefits such as improved system stability, faster incident resolution, reduced operational risk, and improved customer experience.
