AI Safety & Adversarial Testing Contract - 3 months

Company Description

T3 partners with organizations deploying production AI systems in high-risk environments where failures can have significant regulatory, operational, or safety implications. With a team instrumental in shaping global AI standards and governance frameworks, T3 provides AI assurance services to Big Tech companies and complex enterprises.

This is a three-month contract with the opportunity to extend.

Candidates must have direct experience working with frontier labs or large technology companies on safety evaluation, red teaming, or adversarial testing. Experience designing and operationalising testing frameworks for production-grade generative models is essential.

Role Description

Support the design and execution of a structured adversarial testing framework across LLM, image, and video generation models. The role is responsible for developing the standard operating procedure (SOP), adversarial methodology, and prompt-expansion strategy, and for delivering formal testing reports aligned to client policy.

The role requires deep safety domain expertise combined with hands-on testing capability, and will report to a strategic lead.

Core Responsibilities

1. Adversarial Testing Framework Design

  • Define what constitutes a truly adversarial prompt
  • Develop a taxonomy of attack types and failure modes
  • Translate real user behaviour into structured attack vectors
  • Define evaluation methodology across LLM, image, and video models
  • Create severity and risk classification frameworks

2. Test Set Development & Expansion

  • Augment existing prompt libraries with adversarial variants
  • Design systematic prompt mutation strategies
  • Develop model-specific adversarial patterns
  • Define coverage metrics for risk domains
  • Identify blind spots in current test libraries

3. Evaluation & Execution

  • Run adversarial tests against selected model versions
  • Classify and analyse failure types
  • Map outputs against internal policy requirements
  • Produce structured evaluation findings
  • Support reuse of the client's internal evaluation platform

4. Reporting & Stakeholder Communication

  • Produce formal testing reports
  • Present findings to technical and policy audiences
  • Clearly distinguish methodology from execution
  • Define remediation pathways and improvement loops

Required Skills & Experience

Technical

  • Strong background in AI safety, red teaming, or adversarial ML
  • Experience testing LLMs and generative models
  • Familiarity with prompt injection, jailbreaks, and boundary attacks
  • Understanding of multimodal models (text, image, video)
  • Experience defining structured evaluation frameworks
  • Knowledge of benchmark design and failure taxonomy creation

Domain Knowledge

  • Responsible AI principles
  • Risk classification frameworks
  • Safety policy interpretation
  • Alignment evaluation concepts
  • Model evaluation lifecycle

Analytical

  • Ability to design rigorous testing methodology
  • Strong failure analysis capability
  • Quantitative and qualitative evaluation skills
  • Ability to convert abstract risks into concrete test cases

Soft Skills

  • Comfortable operating in environments with ambiguous scope
  • Clear communicator across technical and policy stakeholders
  • Able to work without creating single-point dependency risk
  • Structured thinker

Ideal Background

  • AI safety researcher
  • Red teaming lead for generative AI systems
  • AI evaluation specialist
  • Experience in frontier or production generative systems
  • Experience with model benchmarking and structured evaluation

Job Details

Company: T3
Location: City of London, London, United Kingdom