Prompt Engineer (Model Behavior & Evaluation)

Job Description

Yuna’s mission is to radically transform how mental health support is accessed and delivered. We provide immediate, private, 24/7 support through empathetic conversational AI—closing gaps created by long wait times, high costs, and limited access to care. Every role at Yuna directly shapes the experience users have in their most vulnerable moments.

Our first model behavior engineer will be responsible for how our AI conversations behave in the real world. This role goes beyond writing prompts. You will define what “good” looks like, design and run evaluations, diagnose failures across multi-agent systems, and continuously improve the warmth, safety, usefulness, and alignment of Yuna’s conversations at scale.

You will be at the intersection of product, clinical psychology, and engineering. You will own conversational behavior across prompts, context, routing, memory, model choice, and evaluation.

What you’ll do

  • Own conversational behavior across a multi-agent, multi-model conversational system
  • Collaborate with clinicians to design and operate evaluation frameworks for conversational quality, empathy, usefulness, and alignment
  • Diagnose failures by analyzing real conversation traces, agent routing, memory usage, and system context
  • Use product data and qualitative review to prioritize high-impact improvements
  • Evaluate and select models across providers and model families (e.g., OpenAI, Claude, Gemini, Vertex AI) based on task fit, cost, latency, and output quality

What you’ll bring

Required

  • Demonstrated experience owning model or agent behavior in production
  • Hands-on experience designing or operating evaluation systems (qualitative + quantitative)
  • Ability to define what “good” means and defend it with evidence, not vibes
  • Comfort working hands-on with tools like LangGraph, LangSmith, or comparable tracing and evaluation frameworks
  • High data literacy: able to reason from logs, traces, metrics, and analytics to isolate root causes
  • Clear communicator who can collaborate across product, clinical, and engineering teams in ambiguous problem spaces

Nice to have

  • Working knowledge of Python; ability to read, debug, and collaborate within agent logic and evaluation code
  • Familiarity with alignment concepts (e.g., human values, safety tradeoffs, refusal behavior)
  • Proficiency with SQL, Amplitude, and Excel
  • Background in psychology, linguistics, neuroscience, education, or adjacent fields

Location: Remote (work from anywhere, but your working hours must overlap with 8am-12pm PST at a minimum)

Employment Type: Full-time

What We Offer

  • Competitive salary (based on experience) + equity options
  • Remote-first culture with flexibility
  • A fast-growing, talented, and empathetic team dedicated to transforming mental health care
  • An opportunity to use AI for good
  • Real ownership and the chance to make a measurable impact by building cutting-edge AI systems that improve lives every day

Job Details

Company: Yuna Health
Location: United Kingdom