over time.

Key Responsibilities

- Define and implement evaluation frameworks covering correctness, safety, reliability, and regression impact for AI-integrated services
- Develop and maintain automated test pipelines for agentic workflows, including tool orchestration and multi-step execution paths
- Identify, evaluate, and mitigate AI system failure modes such as hallucinations, invalid …
- … development experience, particularly for automation and test frameworks
- Experience with LLM and RAG evaluation tooling, frameworks, or custom evaluation pipelines
- Expertise in automated testing across unit, integration, and regression testing environments
- Good understanding of agentic AI systems, associated risks, and operational failure modes
- Ability to assess technical solutions ...