services across our business.

What You'll Do
- Design and implement a comprehensive AI testing and evaluation framework for all AI solutions, including LLM-based tools, RAG systems, and third-party platforms.
- Define and document quality standards for semantic accuracy, factual consistency, bias, tone, and relevance.
- Develop reusable testing templates … user trust.

What You'll Bring
- Strong hands-on experience testing and evaluating AI or software systems, ideally NLP or LLM-based applications.
- Understanding of prompt evaluation, semantic search, and LLM behaviour (accuracy, hallucination, bias, tone, etc.).
- Familiarity with tools such as TruLens, Humanloop, PromptLayer, or similar ...