Lead AI Tester / QA Lead (LLM & GenAI)
We’re supporting a consulting partner on the search for a Lead AI Tester / QA Lead to define and own the testing strategy for several AI / LLM use cases at a major financial services client. This role is primarily about strategy, governance and delegation, with some scope to remain hands‐on.
The opportunity
You will:
- Create and maintain the test strategy and playbook for AI solutions – covering functional and non‐functional testing, regression, safety, bias, hallucinations, performance and cost.
- Translate that strategy into clear documentation and workflows (e.g. Confluence playbooks, Jira ticket structures, test plans and checklists).
- Delegate and coordinate execution across QA engineers and developers, ensuring test activities are properly defined, prioritised and completed.
- Define how AI testing fits into the CI/CD pipeline and SDLC: where tests run, what they block, and how results are surfaced.
- Establish and track evaluation metrics and monitoring for AI systems (e.g. quality, drift, safety, latency, cost), and report these to stakeholders.
- Guide and engage both internal and client stakeholders on all aspects of testing – test strategy definition, standards, metrics, monitoring and release readiness.
What we’re looking for
Must have:
- Proven experience as a Test Lead / QA Lead / SDET owning test strategy and automation in complex environments (ideally financial services or similar).
- Ability to create a test strategy document or playbook for AI solutions and keep it current.
- Experience delegating execution of a test strategy to other team members – for example via Jira ticket definition and structured test packs.
- Strong understanding of how tests and environments fit into CI/CD pipelines and wider SDLC activities.
Should have:
- Practical exposure to testing AI / ML / LLM‐based systems (e.g. chatbots, RAG applications, AI‐assisted workflows) and an interest in deepening this.
- Familiarity with approaches to evaluating LLM outputs (for example, semantic‐similarity‐based methods or LLM‐as‐a‐judge), and with concepts such as hallucinations, safety, and drift.
- Experience working in consulting or client‐facing roles and presenting testing approaches and results to senior stakeholders.
Details
- Location: UK‐based, hybrid working with client and consulting teams.
- Contract: Initially 3–6 months, with strong potential to extend.
- Day rate: Around £600–650 per day, depending on experience.
If you’re a QA leader who wants to shape how AI and LLM solutions are tested in a regulated environment, and who can turn that thinking into a clear, usable playbook, please get in touch.