Telemetry SRE Engineer

Technical Skills

Must‐Have

Observability & Reliability Engineering

· Strong hands‐on experience across core observability pillars including metrics, traces, service health and distributed systems visibility

· Practical experience implementing OpenTelemetry across application, platform and infrastructure layers

· Ability to design, deploy and operate end‐to‐end observability pipelines (collector‐to‐backend, agent management, data flows, routing and filtering)

· Strong understanding of SLI/SLO frameworks, error budgets and reliability‐focused operating models

· Experience defining alerting strategy, tuning thresholds and reducing operational noise through effective signal engineering

Observability Platforms & Tooling

· Hands‐on expertise in one or more enterprise‐grade observability platforms (Dynatrace, Splunk Observability, Datadog or equivalent)

· Proficiency with the Prometheus ecosystem, including Alertmanager

· Experience designing clear, insightful dashboards and visualisations using Grafana

· Strong troubleshooting capability using metrics, traces and dependency insights to diagnose performance and availability issues

Cloud & Platform Monitoring

· Strong technical experience with at least one major public cloud (AWS, Azure or GCP)

· Understanding of monitoring fundamentals across cloud‐native services including compute, storage, networking, load balancers and managed services

· Solid understanding of cloud networking constructs (VPC/VNet, subnets, routing, NAT, firewalls and security groups)

Containers & Kubernetes

· Working knowledge of Kubernetes objects (pods, services, deployments) and their operational lifecycle

· Experience monitoring containerised and app‐modernisation workloads

· Basic experience with Helm or Kustomize for packaging, configuration and deployment

· Ability to troubleshoot application behaviour and platform-level issues within container environments

Programming & Automation

· Proficiency in one or more languages (Python, Go, Java) to support automation and tooling

· Experience writing automation scripts and utilities supporting observability and SRE practices

· Awareness of how to integrate observability checks into CI/CD pipelines

· Comfort with shell scripting for diagnostics and operational tasks

Data & Analytics

· Strong understanding of time‐series data and telemetry characteristics

· Hands‐on experience with PromQL, SignalFlow or equivalent query languages, and with query tools such as Metrics Explorer

· Ability to analyse latency percentiles (p95/p99), error rates and throughput metrics

· Working knowledge of SQL for querying telemetry backends or data stores
