About the Role
We're looking for a Senior Site Reliability Engineer to join our SRE team. This is a hybrid role that blends deep platform engineering with application-level troubleshooting. You'll be responsible for the stability, performance, and resilience of our cloud-native infrastructure while also being on the front line when issues … strategies for microservices and core platforms
Continuously monitor and improve system performance, cost-efficiency, and observability (LGTM stack/Datadog)
Partner with security teams on compliance and vulnerability remediation
Chaos Engineering & Resilience
Design and execute chaos engineering experiments
Develop and track SLOs, SLIs, and error budgets for critical systems
Conduct resilience reviews and game days to … to backend service disruptions
Investigate issues across infrastructure, Kubernetes, logs, traces, and service code
Resolve incidents and support root-cause analysis (Java and GoLang services)
Contribute to postmortems and reliability engineering initiatives
Who You Are
Essential Experience
5+ years in an SRE, DevOps, or infrastructure role
Deep hands-on experience with AWS, EKS/Kubernetes, and Terraform
Working knowledge of …
… and manage reliability, feature flags, and cloud costs. The Harness Software Delivery Platform includes modules for CI, CD, Cloud Cost Management, Feature Flags, Service Reliability Management, Security Testing Orchestration, Chaos Engineering, and Software Engineering Insights, and continues to expand at an incredibly fast pace. Harness is led by technologist and entrepreneur Jyoti Bansal, who founded AppDynamics and sold …
Staff Software Engineer, AI Reliability Engineering
London, UK
About Anthropic
Anthropic's mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build … maintaining SLO/SLA frameworks for business-critical services
Are comfortable working with both traditional metrics (latency, availability) and AI-specific metrics (model performance, training convergence)
Have experience with chaos engineering and systematic resilience testing
Can effectively bridge the gap between ML engineers and infrastructure teams
Have excellent communication skills
Strong candidates may also:
Have experience operating large …
… and manage reliability, feature flags, and cloud costs. The Harness Software Delivery Platform includes modules for CI, CD, Cloud Cost Management, Feature Flags, Service Reliability Management, Security Testing Orchestration, Chaos Engineering, and Software Engineering Insights, and continues to expand at an incredibly fast pace. Harness is led by technologist and entrepreneur Jyoti Bansal, who founded AppDynamics and sold … afraid of being data-driven, including using Salesforce and other tools to track your progress
Managing the full sales cycle from prospect to close
Collaborating with other teams, including sales engineering and sales development
About You
A proven track record of driving and closing enterprise deals
Account planning and execution skills
Ability to sell to the C-level and across both IT …