Site Reliability Engineer
- Hiring Organisation: SS&C Technologies
- Location: Gloucester, Gloucestershire, UK
- Employment Type: Full-time
recurrences through high-quality post-incident actions.
- Observability as a first-class practice: Use Grafana, Datadog, and Splunk (and related tools such as Prometheus/OpenTelemetry) to detect anomalies, perform root-cause analysis, and create actionable alerts and dashboards.
- Run Kubernetes at scale: Operate and harden Kubernetes (EKS preferred); manage deployments, autoscaling …
Nice-to-Have
- EKS internals, cluster autoscaler, managed node groups/Fargate; service mesh (Istio/Linkerd), ingress controllers (Nginx/ALB).
- Prometheus, OpenTelemetry, Loki/Tempo, alert tuning and SLO burn-rate alerts.
- Argo CD/FluxCD, Helm chart authoring, Kustomize.
- CD patterns (blue/green, canary, feature flags ...