Quality Analyst

Job Title: LLM - AI Quality Analyst (Personalization) – English

Contract duration: 3 months (possible extension)

Location: 100% Remote (UK, Ireland, Malta)

Shift/schedule: 40 hours per week with 4 hours of overlap with PST.

Experience: 1+ years

Special requirements:

  • Ability to read and write in English with a high degree of comprehension, as English is the focus language for this project.
  • Personal Account Usage: Willingness to use your primary personal Google account (not a testing account) and enable personal data sources for a genuine assessment.
  • Schedule Flexibility: Full-time availability in your local time zone is required. We are staffing a global, 24-hour operations team.
  • Exceptional Analytical Thinking: Demonstrate ability to evaluate nuanced and ambiguous AI responses, specifically assessing personalization quality.

Role Overview:

  • As an AI Quality Analyst, you will evaluate a new personalization feature for Gemini.
  • You will assess how well the model uses information from your past Gemini conversations, Gmail, Google Search, and YouTube activity to make responses more relevant and helpful.
  • This role requires a unique blend of creativity and analytical rigor.
  • You will actively design prompts from the perspective of your own personal experiences.
  • You will then use your analytical skills to assess the quality of the model's personalized responses, evaluating dimensions like Grounding, Integration, and Helpfulness.

Key Qualifications:

  • Creative Prompt Engineering: Experience in designing creative, multi-turn starting prompts based on personal context to thoroughly test the model's capabilities.
  • Strong Evaluation Acumen: Understanding of personalization concepts, including the ability to identify incorrect personalization, poor inferences, and forced connections.
  • Meticulous Attention to Detail: The ability to review Side-by-Side (SxS) model responses and spot subtle differences in naturalness and over-narrating.
  • Excellent Written Communication: Superior ability to write clear, concise, and structured rationales for model rankings, explicitly referencing specific turn numbers.
  • Feedback: Ability to provide constructive feedback and detailed annotations.
  • Communication: Excellent communication and collaboration skills.
  • Independence: Self-motivated and able to work independently in a remote setting.
  • Technical Setup: Desktop/Laptop set up with a good internet connection.

Description:

In this role, you will be part of a dynamic team focused on evaluating the quality of personalized AI interactions. Your day-to-day work will involve:

  • Designing and executing multi-turn conversational prompts (typically 1-5 turns) that require the AI to utilize your personal information and experiences.
  • Evaluating model responses based on your intent from the starting prompt, checking if the personalization was appropriately applied.
  • Analysing responses for Grounding issues, ensuring claims about you are supported by evidence and not flawed inferences or hallucinations.
  • Assessing Integration quality to ensure personal data is woven naturally into the response without robotic "over-narrating".
  • Rigorously evaluating and stack-ranking two model responses side-by-side (SxS) to determine which is overall more helpful, easy to use, and enjoyable.
  • Writing clear, defensible rationales for your comparisons, explicitly referencing where issues or positive aspects occurred in the conversation.
  • Extracting and verifying "Debug Info" from the model to confirm that chat summaries and data sources were properly utilized.
  • Maintaining strict data hygiene by deleting evaluation conversations to prevent them from polluting your future chat history.

Education & Experience:

  • BS/BA degree or equivalent experience in a relevant field (e.g., Policy, Law, Ethics, Linguistics, Journalism, Computer Science, or a related analytical field).
  • Experience in data annotation, AI quality evaluation, content moderation, or a related role is strongly preferred.

Evaluation Process:

  • Shortlisted candidates will be sent a Job Interest Form.
  • After the profile review, an assessment will be shared, which must be completed within 24 hours.
  • Based on the assessment outcomes, shortlisted candidates will be contacted to discuss the pre-onboarding requirements.

Job Details

Company: Oak Tree Software
Location: United Kingdom
Hybrid / Remote Options