Senior Site Reliability Engineer
- Hiring Organisation
- 17918
- Location
- United Kingdom
tools like Prometheus, Grafana, ELK, and AWS CloudWatch. Build real-time dashboards to visualize system health and reliability metrics. Configure intelligent alerting based on anomaly detection and thresholds. Combine metrics, logs, and traces to enable root cause analysis and reduce Mean Time to Resolution (MTTR). Knowledge … AIOps or ML-based anomaly detection for proactive reliability management.

Collaboration
Work closely with development teams to integrate reliability into application design and deployment. Promote a culture of shared responsibility for uptime and performance across engineering teams.

Qualifications
Deep expertise with various AWS services. Advanced knowledge ...