System Monitoring & Observability Engineer (Prometheus / Grafana)

Hiring Organisation
SRT Marine Systems PLC
Location
Cardiff, South Glamorgan, United Kingdom
Employment Type
Permanent
Salary
£40,000 - £65,000 per annum

maintain Prometheus-based monitoring solutions
Develop and manage metric exporters for application and system-level data
Optimise Prometheus scraping configurations and retention policies

Alerting & Incident Response
Define and maintain alert rules based on SLIs/SLOs and performance baselines
Ensure alerts are actionable, with minimal false positives
Participate … necessarily lead) in on-call rotations and incident postmortems

Observability Dashboards
Design and maintain Grafana dashboards for real-time operational insights
Collaborate with engineering and product teams to create tailored visualisations
Provide self-service dashboard capabilities for end users

System Performance & Reliability
Monitor infrastructure (servers, containers, databases, services ...