components
Implement safe deployment strategies (e.g. blue/green, canary releases)
Partner with engineers to improve build speed, test reliability, and deployment confidence

Observability, Reliability & SRE
Build and operate observability stacks (metrics, logging, tracing)
Define and monitor SLOs/SLAs for latency, availability, and reliability
Create runbooks, playbooks, and incident … infrastructure

Exposure to AI/ML platforms, inference systems, or data pipelines
Familiarity with modern CI/CD tooling and GitOps approaches
Experience with observability tooling (metrics, logs, tracing)
Background in cloud platforms, AI infrastructure, or high-scale SaaS environments

Why Join
Work on core infrastructure powering cutting-edge ...