Site Reliability Engineer
- Hiring Organisation: Cognizant
- Location: London Area, United Kingdom
Azure Container Instances (ACI), including cluster lifecycle, node pools, autoscaling, ingress, service mesh, secrets, and backup/restore.
Observability: Instrument services and infrastructure with New Relic, Grafana (including Loki/Tempo where applicable) and cloud-native telemetry. Define SLIs/SLOs, build actionable dashboards, alerts, and runbooks that …
… knowledge (controllers, scheduling, ingress, autoscaling, troubleshooting) and experience with EKS/AKS/GKE and/or ECS/ACI.
Observability: Practical use of New Relic and Grafana to define metrics/traces/logs, tune alerts, and drive SLOs.
Scripting & automation: Proficiency in Python and Bash; experience ...