open to support with relocation efforts.

Responsibilities
Design and implement scalable, reliable, and fault-tolerant systems across cloud environments.
Develop and maintain observability tools, including monitoring, logging, and alerting (e.g., Prometheus, Grafana, Datadog, ELK).
Automate infrastructure provisioning, deployment, and incident response using Infrastructure as Code (IaC) tools like Terraform …
Site Reliability Engineering (SRE), DevOps, or System Engineering.
Strong knowledge of cloud platforms (AWS, Azure, or GCP) and cloud-native architectures.
Experience with observability and monitoring tools (Prometheus, Grafana, ELK, Datadog, New Relic).
Proficiency in Infrastructure as Code (IaC) tools such as Terraform, CloudFormation, or Pulumi.
Hands ...