Lead AI Red Teaming & QA Engineer
We are seeking a Lead AI Red Teaming & QA Engineer to design and execute automated adversarial testing for our enterprise Agentic AI platforms. You will move beyond traditional software QA to build continuous safety pipelines, ensuring our non-deterministic LLM agents, RAG systems, and tool integrations are secure, resilient, and compliant before production release.
Key Responsibilities
Automated Adversarial Testing: Build and integrate automated red teaming suites into CI/CD pipelines using frameworks such as Garak, PyRIT, and AgentDojo to enforce strict safety release gates.
AI Evaluation Frameworks: Develop metrics and continuous testing for core AI risks, including hallucinations, memorisation, algorithmic bias, uncertainty, and model drift.
Regulatory Compliance Evidence: Map threat models (OWASP LLM Top 10, Agentic threats) to automated test cases. Produce the technical testing evidence required by EU AI Act Article 15, DORA, and FCA Operational Resilience guidelines.
Centralised AI-BOM Platform: Own the enterprise AI Bill of Materials (AI-BOM), tracking model lineages, dataset versions, and signed artifacts, and operate it as a centralised evaluation service.
Required Technical Skills
Regulated Finance: Proven experience testing software within FCA, DORA, or EU AI Act frameworks.
AWS Bedrock Ecosystem: Hands-on experience configuring, testing, and bypassing Bedrock Guardrails, Agents, and Knowledge Bases (RAG).
AI Security & Fundamentals: Solid understanding of Foundation Models, tool use (function calling), OWASP LLM Top 10, and NIST AI RMF.
Automation Stack: Strong Python development skills, experience with AI eval tools (Garak, PyRIT, Ragas), and experience building complex CI/CD test pipelines.
Randstad Technologies is acting as an Employment Business in relation to this vacancy.