Site Reliability Engineer

Location: Glasgow, Scotland (hybrid, 2-3 days per week onsite)

Contract: 3 months, with possible extension

Sector: Government

We are looking for an L7 Senior Site Reliability Engineer (6-9 years of overall experience) to assess existing monitoring, reliability, and operational practices and to define a comprehensive observability model across the application landscape. The ideal candidate will drive improvements in availability, performance, resilience, and operational readiness across critical systems. The role requires a strong advisory mindset, deep production engineering expertise, and the ability to influence development, platform, and operations teams to deliver measurable reliability and operational outcomes.

Technical Skills:

  • Strong experience as a Senior Site Reliability Engineer, Reliability Engineer, or Platform Engineer operating at the L7 level.
  • Deep expertise in application monitoring, observability, alerting, incident management, and production reliability.
  • Hands-on experience assessing, selecting, and implementing monitoring and observability tools, frameworks, and integration approaches.
  • Strong understanding of SRE principles including SLIs, SLOs, error budgets, and resilience engineering.
  • Experience designing and operating highly available, fault-tolerant, multi-region systems.
  • Advanced skills in capacity planning, load modeling, and traffic forecasting.
  • Deep expertise in metrics, logs, traces, and event-based telemetry.

Process Skills:

  • Assess current monitoring, alerting, and incident management mechanisms to identify gaps and improvement opportunities.
  • Define and implement an end-to-end application monitoring and observability model aligned across the software development lifecycle (SDLC).
  • Identify risks related to reliability, performance, availability, and operational readiness and recommend mitigation strategies.
  • Establish SRE best practices including proactive alerting, error budgets, operational runbooks, and reliability metrics.
  • Articulate expected operational benefits such as improved system stability, faster incident resolution, reduced operational risk, and improved customer experience.

Job Details

Company: Ubique Systems
Location: Glasgow, Scotland, United Kingdom