OpenShift Telemetry Engineer
The Role
We are seeking a skilled OpenShift Telemetry Engineer to join our team. In this role, you will be responsible for implementing, managing, and optimizing the observability stack within a Red Hat OpenShift Container Platform environment to ensure system health, performance, and security.
You will act as a bridge between application monitoring and infrastructure observability, leveraging modern telemetry and data streaming tools.
Key Responsibilities
- Design, implement, and maintain data pipelines to ingest and process OpenShift telemetry data (metrics, logs, and traces) at scale.
- Stream OpenShift telemetry through Kafka (producers, topics, schemas) and build resilient consumer services for transformation and enrichment.
- Engineer data models and routing mechanisms for multi-tenant observability while ensuring data lineage, quality, and SLA adherence across streaming layers.
- Integrate processed telemetry into Splunk for dashboards, visualization, alerting, and analytics to achieve Observability Level 4 (proactive insights).
- Implement schema management, governance, and versioning using Avro or Protobuf for telemetry events.
- Build automated validation, replay, and backfill mechanisms to ensure data reliability and recovery.
- Instrument services with OpenTelemetry, standardizing tracing, metrics, and structured logging across platforms.
- Utilize LLM-based capabilities to enhance observability (e.g., query assistance, anomaly summarization, runbook generation).
- Collaborate with Platform, SRE, and Application teams to integrate telemetry, alerts, and SLOs.
- Ensure security, compliance, and best practices for telemetry data pipelines and observability platforms.
- Document data flows, schemas, dashboards, and operational runbooks.
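To give a concrete flavor of the pipeline work described above, here is a minimal, illustrative sketch of the enrichment step a Kafka consumer service might apply to an OpenShift telemetry record before forwarding it to Splunk HEC. The cluster-to-tenant mapping, field names, and index naming scheme are all hypothetical placeholders, not a prescribed design.

```python
import json

# Assumed mapping of OpenShift clusters to tenants (illustrative only).
CLUSTER_TENANTS = {"ocp-prod-1": "payments", "ocp-prod-2": "retail"}

def enrich_event(raw: bytes) -> dict:
    """Parse one telemetry record (JSON bytes from a Kafka topic),
    attach multi-tenant routing metadata, and wrap it in a Splunk
    HEC-style event envelope."""
    event = json.loads(raw)
    tenant = CLUSTER_TENANTS.get(event.get("cluster"), "unassigned")
    return {
        "time": event["timestamp"],
        "sourcetype": "openshift:metrics",  # hypothetical Splunk source type
        "index": f"telemetry_{tenant}",     # hypothetical per-tenant index
        "event": {**event, "tenant": tenant},
    }

record = b'{"cluster": "ocp-prod-1", "timestamp": 1700000000, "metric": "pod_cpu", "value": 0.42}'
payload = enrich_event(record)
print(payload["index"])  # telemetry_payments
```

In a real deployment this function would sit inside a Kafka consumer loop, with the resulting payload posted to the Splunk HTTP Event Collector endpoint.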
Required Skills and Experience
- Hands-on experience building streaming data pipelines with Kafka (producers/consumers, Schema Registry, Kafka Connect, ksqlDB, Kafka Streams).
- Strong experience with OpenShift / Kubernetes telemetry, including OpenTelemetry and Prometheus.
- Experience integrating telemetry into Splunk (HEC, Universal Forwarder, source types, CIM) and building dashboards and alerts.
- Strong data engineering skills using Python (or similar languages) for ETL/ELT, enrichment, and validation.
- Experience with event schemas (Avro, Protobuf, JSON) and schema compatibility strategies.
- Familiarity with observability frameworks and maturity models, driving toward Level 4 observability (proactive monitoring and automated insights).
- Understanding of hybrid cloud and multi-cluster telemetry architectures.
- Security and compliance practices for data pipelines, including:
  - Secret management
  - RBAC
  - Encryption in transit and at rest
- Strong problem-solving and analytical skills.
- Ability to work effectively in cross-functional teams.
- Excellent communication and documentation skills.
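The schema compatibility strategies mentioned above can be illustrated with a small sketch. This is not Avro or Protobuf itself; it models event schemas as plain dicts and checks one common registry rule (backward compatibility: data written under the old schema must remain readable under the new one, so a new version may add optional fields but not new required ones). All names are illustrative.

```python
def is_backward_compatible(old: dict, new: dict) -> bool:
    """Return True if events written under `old` can be read under `new`.

    Schemas are modeled as: field name -> {"type": ..., "required": bool}.
    A new required field breaks compatibility, because old events
    cannot supply it; new optional fields are safe.
    """
    for name, spec in new.items():
        if spec.get("required") and name not in old:
            return False
    return True

v1 = {"cluster": {"type": "string", "required": True}}
v2 = {"cluster": {"type": "string", "required": True},
      "tenant": {"type": "string", "required": False}}   # optional addition: compatible
v3 = {"cluster": {"type": "string", "required": True},
      "trace_id": {"type": "string", "required": True}}  # required addition: breaks old events

print(is_backward_compatible(v1, v2))  # True
print(is_backward_compatible(v1, v3))  # False
```

Real schema registries (e.g. for Avro) apply richer rules covering type promotion and defaults, but the add-optional/never-add-required principle shown here is the core of backward compatibility.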