Lead AI Tester / QA Lead (LLM & GenAI)

We’re supporting a consulting partner on the search for a Lead AI Tester / QA Lead to define and own the testing strategy for several AI / LLM use cases at a major financial services client. This role is primarily about strategy, governance and delegation, with some scope to remain hands‐on.

The opportunity

You will:

  • Create and maintain the test strategy and playbook for AI solutions, covering functional and non‐functional testing, regression, safety, bias, hallucinations, performance and cost.
  • Translate that strategy into clear documentation and workflows (e.g. Confluence playbooks, Jira ticket structures, test plans and checklists).
  • Delegate and coordinate execution across QA engineers and developers, ensuring test activities are properly defined, prioritised and completed.
  • Define how AI testing fits into the CI/CD pipeline and SDLC: where tests run, what they block, and how results are surfaced.
  • Establish and track evaluation metrics and monitoring for AI systems (e.g. quality, drift, safety, latency, cost), and report these to stakeholders.
  • Guide and engage both internal and client stakeholders on all aspects of testing – test strategy definition, standards, metrics, monitoring and release readiness.

What we’re looking for

Must have:

  • Proven experience as a Test Lead / QA Lead / SDET owning test strategy and automation in complex environments (ideally financial services or similar).
  • Ability to create a test strategy document or playbook for AI solutions and keep it current.
  • Experience delegating execution of a test strategy to other team members – for example via Jira ticket definition and structured test packs.
  • Strong understanding of how tests and environments fit into CI/CD pipelines and wider SDLC activities.

Should have:

  • Practical exposure to testing AI / ML / LLM‐based systems (e.g. chatbots, RAG applications, AI‐assisted workflows) and an interest in deepening that experience.
  • Familiarity with approaches to evaluating LLM outputs (for example, semantic‐similarity‐based methods or LLM‐as‐a‐judge evaluation), and with concepts such as hallucinations, safety and drift.
  • Experience working in consulting or client‐facing roles and presenting testing approaches and results to senior stakeholders.

Details

  • Location: UK‐based, hybrid working with client and consulting teams.
  • Contract: Initially 3–6 months, with strong potential to extend.
  • Day rate: £600–650 per day, depending on experience.

If you’re a QA leader who wants to shape how AI and LLM solutions are tested in a regulated environment and can turn that thinking into a clear, usable playbook, please get in touch.

Job Details

  • Company: RAW Search
  • Location: London Area, United Kingdom (hybrid / remote options)