let's talk. About the Role We're looking for a Senior Site Reliability Engineer to join our SRE team. This is a hybrid role that blends deep platform engineering with application-level troubleshooting. You'll be responsible for the stability, performance, and resilience of our cloud-native infrastructure while also being on the front line when issues … strategies for microservices and core platforms Continuously monitor and improve system performance, cost-efficiency, and observability (LGTM stack/Datadog) Partner with security teams on compliance and vulnerability remediation Chaos Engineering & Resilience Design and execute Chaos Engineering experiments. Develop and track SLOs, SLIs, and error budgets for critical systems Conduct resilience reviews and game days to … to backend service disruptions Investigate issues across infrastructure, Kubernetes, logs, traces, and service code Resolve incidents and support root causes (Java and GoLang services) Contribute to postmortems and reliability engineering initiatives Who You Are Essential Experience 5+ years in an SRE, DevOps, or infrastructure role Deep hands-on experience with AWS, EKS/Kubernetes, and Terraform Working knowledge of …
ll focus on security, compliance, and continuous improvement to deliver resilient and high-performing systems. What you'll do The team focuses on Site Reliability, Platform, DevOps, and Systems engineering to build and run large-scale, distributed, fault-tolerant systems on public cloud. This is a hybrid role, first leading a team, mentoring, coaching, and developing peers across the … debugging all services that run within the K8s ecosystem, including Istio service mesh SRE mentality (SLI, SLO & SLA) using Observability, Logging, Monitoring & Alerting (Dynatrace) Ideally coming from a software engineering or exceptional scripting skill background and have moved into SRE/DevOps while gaining a wider understanding of application ecosystems. Experience programming in at least two (but not all … following languages: Java, Groovy, Scala, Python, Go, C++, JavaScript, .NET, PowerShell or Bash/Shell. Knowledge of GCP and Azure cloud platforms. Strong expertise in DevOps tools Experience with Chaos Engineering, Day-2 Ops, Resiliency and Disaster Recovery Planning and execution Technical architecture and Microservice design principles. About working for us Our ambition is to be the leading …
About the Opportunity Are you a seasoned technology leader with a passion for building cutting-edge enterprise products and a hands-on approach to engineering? Join Citi's Cloud Technology Services (CTS) team and be part of our commitment to transform Citi technology leveraging game-changing Cloud capabilities to drive agility, efficiency, and innovation. We're providing our businesses … with a competitive edge by leveraging public cloud scale and enabling new infrastructure economics. As the Public Cloud Engineering Practices Lead, you will play a pivotal role in shaping and executing our public cloud strategy. You will be part of a team that continues to deliver big! From building cloud-based High Performance Compute (HPC) platform to run huge … GenAI at scale, all the way to enabling payments solutions, this team is at the forefront of innovation. What You'll Do: Lead the Charge: Own the public cloud engineering practices strategy and its execution, enabling Citi's secure and enterprise-scale adoption of public cloud. You will provide technical authority for all engineering practices across all public …
Staff Software Engineer, AI Reliability Engineering London, UK About Anthropic Anthropic's mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build … maintaining SLO/SLA frameworks for business-critical services Are comfortable working with both traditional metrics (latency, availability) and AI-specific metrics (model performance, training convergence) Have experience with chaos engineering and systematic resilience testing Can effectively bridge the gap between ML engineers and infrastructure teams Have excellent communication skills Strong candidates may also: Have experience operating large …
and manage reliability, feature flags and cloud costs. The Harness Software Delivery Platform includes modules for CI, CD, Cloud Cost Management, Feature Flags, Service Reliability Management, Security Testing Orchestration, Chaos Engineering, Software Engineering Insights and continues to expand at an incredibly fast pace. Harness is led by technologist and entrepreneur Jyoti Bansal, who founded AppDynamics and sold … afraid of being data driven - including using Salesforce and other tools to track your progress Managing full sales cycle from prospect to close Collaborating with other teams, including sales engineering and sales development About You A proven track record of driving and closing enterprise deals Account planning and execution skills Ability to sell C-Level and across both IT …