How AI Interviewers Work: A Step-by-Step Walkthrough

A clear, no-hype explanation of how an AI interviewer actually works under the hood — from the brief that drives it, to how it decides what to ask next, to the quality score it generates at the end. Includes how Koji's AI interviewer is built.

The Bottom Line

An AI interviewer is a conversational AI system that runs one-on-one customer interviews from start to finish — opening the conversation, asking questions in a coherent order, listening to each answer, deciding when to probe deeper, recognising when a topic is exhausted, moving on, and gracefully closing. It works by combining four moving parts: (1) a research brief that lists priorities and forbidden moves, (2) a conversation state machine that tracks what's been covered, (3) a large language model (LLM) that drafts each next question, and (4) a post-interview analyser that scores the transcript and extracts structured data.

Unlike a survey form, the AI interviewer adapts in real time. If a participant says "I tried three workarounds before I gave up," the AI doesn't move to the next question — it asks which workarounds, in what order, and what made each fail. That single behaviour (adaptive follow-up probing) is what makes an AI interviewer worth using over a Typeform survey.

This article walks through the full lifecycle of an AI-moderated interview, step by step, using Koji's implementation as the reference. By the end, you'll understand exactly what happens between the moment a participant clicks the interview link and the moment a report appears in your dashboard.

Step 1: The Brief Becomes Runtime Instructions

Before the AI ever talks to a participant, it loads a structured research brief. In Koji, the brief is generated by an AI consultant (a separate agent that interviews you about your goal) and contains:

  • A problemStatement and decisionToInform — what the research must help you decide.
  • A targetParticipant profile — who should answer.
  • A methodology framework — Customer Discovery, Jobs-to-be-Done, Mom Test, Concept Testing, or Custom. Each comes with embedded rules (e.g., Mom Test forbids hypothetical pricing questions; JTBD requires past-behaviour anchoring).
  • An ordered list of StudyQuestion objects, each with a type (open_ended, scale, single_choice, multiple_choice, ranking, or yes_no), a priority (required vs optional), and a probing depth (default 1 follow-up, up to 3).
  • Optional context documents — your product one-pager, a competitor matrix, or previous research — that the AI can reference to ask informed questions.

The brief is loaded as the system prompt for the interview session. Importantly, methodology principles are loaded as runtime rules, not just a label. If the brief is "Mom Test", the AI is instructed: Never ask whether the participant would pay $X. Always anchor on what they've done before. Never lead with a feature description. These rules constrain every LLM call during the conversation.
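To make the idea of a brief turning into runtime instructions concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not Koji's actual schema or prompt format: the snake_case field names mirror the brief fields described above, while the METHODOLOGY_RULES table, the build_system_prompt function, and the prompt layout are hypothetical. Only the three Mom Test rules are taken from the text.

```python
from dataclasses import dataclass, field

# Hypothetical rule table. The Mom Test rules are quoted from the article;
# the JTBD entry paraphrases its "past-behaviour anchoring" requirement.
METHODOLOGY_RULES = {
    "mom_test": [
        "Never ask whether the participant would pay $X.",
        "Always anchor on what they've done before.",
        "Never lead with a feature description.",
    ],
    "jtbd": ["Anchor every question on past behaviour."],
}


@dataclass
class StudyQuestion:
    text: str
    type: str = "open_ended"  # or scale, single_choice, multiple_choice, ranking, yes_no
    required: bool = True     # priority: required vs optional
    probing_depth: int = 1    # default 1 follow-up, up to 3


@dataclass
class ResearchBrief:
    problem_statement: str
    decision_to_inform: str
    target_participant: str
    methodology: str
    questions: list = field(default_factory=list)


def build_system_prompt(brief: ResearchBrief) -> str:
    """Render the brief as one prompt string (illustrative layout only)."""
    rules = METHODOLOGY_RULES.get(brief.methodology, [])
    lines = [
        f"Problem: {brief.problem_statement}",
        f"Decision to inform: {brief.decision_to_inform}",
        f"Participant: {brief.target_participant}",
        "Rules that apply to every turn:",
        *[f"- {rule}" for rule in rules],
        "Questions, in order:",
    ]
    for number, question in enumerate(brief.questions, start=1):
        priority = "required" if question.required else "optional"
        lines.append(
            f"{number}. [{question.type}, {priority}, "
            f"max {question.probing_depth} follow-ups] {question.text}"
        )
    return "\n".join(lines)
```

The point of the sketch is that the methodology contributes concrete rule lines to the prompt rather than just a label, so every later model call sees the same constraints.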

Step 2: Mode Selection — Voice or Text

When the participant clicks the interview link, they're presented with a choice (configurable per study): voice or text.

  • Voice mode uses a real-time speech-to-text model (Deepgram or similar) to transcribe the participant's speech, and a text-to-speech engine to play the AI's replies. Voice interviews typically run 8–15 minutes and yield answers 3 to 4 times longer than text answers in qualitative research.
  • Text mode is a chat interface — the participant types, the AI replies. Text interviews tend to run 4–8 minutes and have higher completion rates among participants who are at a computer.

Crucially, the AI moderator's behaviour is identical across modes: the same brief, the same methodology rules, the same follow-up logic. The only difference is the I/O channel.

In Koji, voice costs 3 credits per interview and text costs 1 credit. (Voice is more expensive because of speech-to-text and text-to-speech compute costs.)
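As a quick worked example of the credit arithmetic, the sketch below estimates the cost of a mixed study. The per-interview prices come from the paragraph above; the function name and the lookup table are hypothetical.

```python
# Credits per completed interview, as stated in the article.
CREDITS_PER_INTERVIEW = {"voice": 3, "text": 1}


def estimate_credits(voice_interviews: int, text_interviews: int) -> int:
    """Total credits for a study with the given number of interviews per mode."""
    return (
        voice_interviews * CREDITS_PER_INTERVIEW["voice"]
        + text_interviews * CREDITS_PER_INTERVIEW["text"]
    )


# 10 voice interviews (30 credits) plus 20 text interviews (20 credits).
print(estimate_credits(10, 20))  # 50
```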

Step 3: The Opening — Establishing Context and Consent

The AI opens with a short framing message:

"Hi, thanks for taking 10 minutes to chat. I'm an AI interviewer working with [your company]. Your responses help us understand [the research topic]. We'll cover about [N] topics. There are no right or wrong answers — just curious to hear your experience. Ready when you are."

This is not boilerplate. The opening pulls from your study config: your brand name, the topic from the brief, the question count, and any consent language you've added in the intake form. Koji also collects optional intake fields (name, email, role, custom screener questions) before the conversation starts so the AI can personalise from the first turn.
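One way to picture how the opening is assembled from study config is a simple template function. This is a sketch only: the parameter names and the position of the consent text are assumptions, and the wording follows the sample opening quoted above.

```python
def build_opening(
    brand: str,
    topic: str,
    question_count: int,
    minutes: int = 10,
    consent_text: str = "",
) -> str:
    """Fill the opening message from (hypothetical) study config values."""
    message = (
        f"Hi, thanks for taking {minutes} minutes to chat. "
        f"I'm an AI interviewer working with {brand}. "
        f"Your responses help us understand {topic}. "
        f"We'll cover about {question_count} topics. "
        "There are no right or wrong answers, just curious to hear your experience."
    )
    if consent_text:
        # Assumption: custom consent language is appended before the final line.
        message += " " + consent_text
    return message + " Ready when you are."
```

Note that the moderator identifies itself as an AI in this first turn, which matches the disclosure behaviour described in the FAQ.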

Step 4: The Main Loop — Ask, Listen, Decide, Probe or Move On

This is the core of the system. For each question in the brief:

  1. Ask the question conversationally. Even for structured types (scale, ranking), the AI phrases it naturally. A 1–5 scale becomes "On a scale of 1 to 5, where 5 is excellent — how would you rate the onboarding experience?" A ranking question becomes "You mentioned three tools. If you had to pick one to keep, which would it be? And the one you'd drop first?" No "click a radio button" prompts.

  2. Listen. Capture the response (full text in chat mode; transcribed text in voice mode).

  3. Evaluate the answer against several criteria:

    • Did the participant actually answer the question? (Sometimes they pivot.)
    • Did they reveal something the brief flagged as a probing opportunity? (e.g., a workaround, an "I wish…" statement, a number worth quantifying)
    • Has the configured probing depth been reached?
    • Is the participant clearly running out of energy?
  4. Decide:

    • Probe — generate a follow-up that digs into the specific detail. Mom Test follow-ups anchor on past behaviour ("When was the last time that happened?"); JTBD follow-ups dig into the trigger event ("What made you start looking for a solution?").
    • Confirm + move on — paraphrase the answer back ("So the main blocker is the data import flow — got it") and ask the next question.
    • Skip — if a participant signals they have nothing to add, the AI moves on rather than forcing it.
  5. Update conversation state. Mark the question as covered, optionally with a confidence score. Note which themes have already emerged so subsequent probes don't repeat them.

This loop runs until either (a) all required questions are covered AND any optional questions worth asking have been asked, or (b) the participant signals they want to stop.
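The decision step of this loop can be sketched as a small pure function. This is a simplified illustration, not Koji's implementation: in a real system the evaluation would come from a model call, whereas here it is passed in as plain booleans. The names Action, Evaluation, and decide, and the exact precedence of the checks, are assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    PROBE = "probe"
    CONFIRM_AND_MOVE_ON = "confirm_and_move_on"
    SKIP = "skip"


@dataclass
class Evaluation:
    answered: bool           # did the participant actually answer the question?
    probe_opportunity: bool  # workaround, "I wish" statement, number worth quantifying
    low_energy: bool         # participant is clearly running out of energy
    nothing_to_add: bool     # participant signalled they have nothing more to say


def decide(evaluation: Evaluation, probes_used: int, probing_depth: int) -> Action:
    """Choose the next move for the current question."""
    if evaluation.nothing_to_add:
        return Action.SKIP
    depth_left = probes_used < probing_depth
    worth_probing = evaluation.probe_opportunity or not evaluation.answered
    if depth_left and worth_probing and not evaluation.low_energy:
        return Action.PROBE
    return Action.CONFIRM_AND_MOVE_ON


# A workaround was mentioned and one follow-up is still available: probe.
print(decide(Evaluation(True, True, False, False), probes_used=0, probing_depth=1))
```

Keeping the decision separate from question drafting means the probing-depth limit is enforced by ordinary code, whatever the model proposes.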

Step 5: Handling Edge Cases

Real interviews have messy moments. A good AI interviewer handles them:

  • Multi-language switching. If the participant replies in Spanish while the brief is in English, the AI matches their language for the rest of the conversation. Koji supports 30+ languages natively.
  • Repetition. If the participant says the same thing in answer to question 4 as they did to question 2, the AI acknowledges it ("You mentioned earlier the import issue — anything else?") rather than asking blindly.
  • Refusal. If the participant declines to answer ("I can't share that"), the AI accepts and moves on without pressing.
  • Tangents. In exploratory mode, the AI follows interesting tangents. In structured mode, it gently redirects. In hybrid mode (Koji's default), required questions are covered first, then tangents are explored.
  • Time pressure. If the participant says they only have 5 minutes left, the AI prioritises required questions and shortens its responses.
  • Misunderstandings. If the AI's question was ambiguous, it rephrases the question rather than repeating the same words.

These behaviours come from the methodology framework + a set of generic conversation-handling rules. You don't configure them per question; they're part of the platform.
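The tangent rule is the easiest of these behaviours to express in code. The sketch below assumes the three conversation modes named above and a count of required questions still uncovered; the function name and return values are hypothetical.

```python
def handle_tangent(mode: str, required_remaining: int) -> str:
    """Return "follow" or "redirect" for an interesting tangent."""
    if mode == "exploratory":
        return "follow"
    if mode == "structured":
        return "redirect"
    if mode == "hybrid":
        # Hybrid: cover required questions first, explore tangents afterwards.
        return "redirect" if required_remaining > 0 else "follow"
    raise ValueError(f"unknown interview mode: {mode!r}")
```

For example, handle_tangent("hybrid", 2) redirects because two required questions are still open, while handle_tangent("hybrid", 0) follows the tangent.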

Step 6: Closing the Interview

When all required (and worthy optional) questions are covered, the AI closes:

"That's everything I wanted to cover. Before we wrap up — is there anything I didn't ask that I should have? Anything you want the team to hear?"

This "open ending" turn often yields the single most valuable answer of the interview, because participants surface concerns they didn't know fit any earlier question. Koji always includes it unless you explicitly disable it.

The AI then thanks the participant, optionally redirects to a thank-you page, and the conversation ends.

Step 7: Post-Interview Analysis

The moment the interview completes, an AI analyst agent runs (in Koji this happens automatically — no manual step):

  1. Quality scoring. Every transcript gets a 1–5 composite score across five dimensions: relevance (did the answers match the questions?), depth (were answers substantive?), coverage (were all required questions actually covered?), completion (did the participant finish?), and structured-answer quality (did scales/rankings parse cleanly?). Conversations scoring 1 or 2 are flagged as low-quality and don't consume credits in Koji.

  2. Structured answer extraction. For each StudyQuestion, the analyser produces a StructuredAnswer containing the structuredValue (the parsed numeric/categorical answer), the qualitativeAnswer (the participant's verbatim quote), and a confidence score. So a scale question yields a number from 1–5; a multiple_choice question yields one or more option IDs.

  3. Theme tagging. Common patterns across the transcript are tagged. As more interviews complete, themes are aggregated and ranked by frequency.

  4. Quote extraction. Memorable, on-point quotes are pulled and attached to the relevant theme.

All of this happens within seconds of the interview ending. You don't wait for a batch job.
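As an illustration of the quality-scoring step, the sketch below combines five dimension scores into a 1–5 composite and applies the low-quality rule. The article does not say how the composite is computed, so the unweighted mean used here is an assumption, as are the function names and dictionary keys. The low-quality threshold (1 or 2) and the credit prices come from the text.

```python
from statistics import mean

DIMENSIONS = ("relevance", "depth", "coverage", "completion", "structured_quality")
CREDITS_PER_INTERVIEW = {"voice": 3, "text": 1}


def composite_score(scores: dict) -> int:
    """Combine five 1-5 dimension scores into one 1-5 composite.

    Assumption: an unweighted mean, rounded and clamped to the 1-5 range.
    """
    missing = [name for name in DIMENSIONS if name not in scores]
    if missing:
        raise ValueError(f"missing dimensions: {missing}")
    average = mean(scores[name] for name in DIMENSIONS)
    return max(1, min(5, round(average)))


def is_low_quality(composite: int) -> bool:
    """Conversations scoring 1 or 2 are flagged as low quality."""
    return composite <= 2


def credits_charged(mode: str, composite: int) -> int:
    """Low-quality conversations do not consume credits."""
    return 0 if is_low_quality(composite) else CREDITS_PER_INTERVIEW[mode]
```

A real scorer would likely weight the dimensions differently; the sketch only shows how a composite score, a flag, and billing can hang together.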

Step 8: Real-Time Aggregation Across Interviews

As each interview completes, Koji updates the dashboard and insights report in real time. After 5 interviews, you can already see emerging themes. After 30, the dashboard shows distribution charts for every structured question, theme frequency, sentiment, and a stack of customer quotes per theme.
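The aggregation behind such a dashboard can be approximated with simple counting. The sketch below is a hypothetical illustration, not Koji's pipeline: it ranks themes by the number of interviews that mention them and builds a 1–5 distribution for a scale question.

```python
from collections import Counter


def rank_themes(theme_lists):
    """Rank themes by how many interviews mention them, most frequent first."""
    counts = Counter()
    for themes in theme_lists:
        counts.update(set(themes))  # count each theme once per interview
    return counts.most_common()


def scale_distribution(values):
    """Count answers per point on a 1-5 scale, including points nobody chose."""
    counts = Counter(values)
    return {point: counts.get(point, 0) for point in range(1, 6)}


interviews = [
    ["import", "pricing"],
    ["import"],
    ["import", "onboarding", "pricing"],
]
print(rank_themes(interviews))
# [('import', 3), ('pricing', 2), ('onboarding', 1)]
print(scale_distribution([5, 4, 4, 2]))
# {1: 0, 2: 1, 3: 0, 4: 2, 5: 1}
```

Because both functions only need the themes and values collected so far, they can be re-run each time an interview completes, which is what makes incremental reporting cheap.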

This matters because traditional research has a long synthesis tail — you finish interviews on Monday, get insights on Friday. With AI interviewers and real-time analysis, you see patterns while the study is still running and can decide whether to keep recruiting or stop.

What Makes a Good AI Interviewer (vs. a Bad One)

Not all AI interviewers are built the same. A good one:

  • Asks follow-ups based on what the participant said, not on a pre-written script.
  • Knows when to stop probing — too many follow-ups feel like interrogation.
  • Mixes structured and open-ended questions without breaking conversational flow.
  • Avoids leading questions: "Would you say this is frustrating?" is a Mom Test violation.
  • Respects participant constraints — language, time, refusal to answer.
  • Produces machine-readable outputs so the rest of the stack can build on them.

A bad one:

  • Reads questions verbatim regardless of context.
  • Treats every answer the same way (no quality differentiation).
  • Asks "click a button" prompts in voice mode.
  • Has no concept of methodology — just generic chat.
  • Outputs only free text, with no structured extraction.

Koji is built around the "good" list. The probing-depth setting, methodology framework, conversation modes (structured/exploratory/hybrid), and post-interview structured extraction are all explicit design decisions.

Real-World Performance: What to Expect

In benchmark studies of AI vs human-moderated interviews:

  • Completion rate. AI interviews reach completion rates of roughly 70–85% (text) and 60–75% (voice), comparable to async human-moderated research.
  • Average length. Voice: 8–15 minutes. Text: 4–8 minutes.
  • Answer richness. Voice answers average 3–4x longer than text answers in qualitative research (academic studies of voice surveys).
  • Insight surfacing. With real-time analysis, teams typically see the first emerging theme within 5–10 completed interviews.
  • Time to publishable report. A 30-interview study can go from launch to shareable report in 3–7 days, vs. 3–6 weeks for traditional moderated research.

These numbers vary by study design, but the order-of-magnitude difference vs. traditional research is real.

Frequently Asked Questions

How does an AI interviewer know what to ask next? It evaluates the participant's last answer against the research brief: which question is next, has the current question's probing depth been used, did the participant reveal something worth following up on, and does the methodology suggest a specific probe (e.g., past-behaviour anchor in Mom Test). The LLM then drafts the next utterance under those constraints.

Can the AI interviewer ask structured questions like rankings or scales? Yes. Koji supports six question types (open_ended, scale, single_choice, multiple_choice, ranking, and yes_no), five of which are structured. Structured questions are asked conversationally but stored as chartable values, so a 1–5 scale question is asked naturally and the answer is parsed into a number you can chart.
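To show what parsing a conversational scale answer into a chartable number can look like, here is a deliberately simple Python sketch. It uses a regular expression as a stand-in for the analyser, which in practice would be model-based; the StructuredAnswer fields follow the names used in Step 7, while parse_scale_answer, the number-word table, and the confidence values are hypothetical placeholders.

```python
import re
from dataclasses import dataclass
from typing import Optional

NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}


@dataclass
class StructuredAnswer:
    structured_value: Optional[int]  # parsed 1-5 value, or None if unclear
    qualitative_answer: str          # the participant's verbatim reply
    confidence: float                # placeholder values, not a calibrated score


def parse_scale_answer(reply: str) -> StructuredAnswer:
    """Extract a single 1-5 rating from a free-text reply, if unambiguous."""
    text = reply.lower()
    # Drop the scale maximum in phrases such as "4 out of 5" or "4/5".
    text = re.sub(r"(?:out of|of|/)\s*(?:5|five)\b", " ", text)
    candidates = []
    for token in re.findall(r"[a-z]+|\d+", text):
        if token.isdigit() and 1 <= int(token) <= 5:
            candidates.append(int(token))
        elif token in NUMBER_WORDS:
            candidates.append(NUMBER_WORDS[token])
    if len(set(candidates)) == 1:
        return StructuredAnswer(candidates[0], reply, 0.9)
    # No rating, or several different ones: leave it for a follow-up question.
    return StructuredAnswer(None, reply, 0.0)
```

With this sketch, "I'd say a 4 out of 5, mostly fine" parses to 4, while "somewhere between 2 and 3" yields no value, which is the kind of reply where a moderator would ask a clarifying follow-up.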

What if the participant goes off-topic? The AI's behaviour depends on the interview mode. In structured mode it gently redirects back to required questions. In exploratory mode it follows interesting tangents. In hybrid mode (the default), required questions are covered first and tangents explored after. You can change the mode per study.

How does the AI handle a confused or disengaged participant? It detects short, terse, or disengaged answers (e.g., "idk", repeated "yes" to open questions) and adapts — sometimes rephrasing the question, sometimes shortening the interview, sometimes ending gracefully. Low-quality conversations are flagged in the quality score and don't consume credits in Koji.

Is the AI interviewer biased? Any LLM-based system carries some bias. Koji mitigates this by embedding methodology rules (Mom Test forbids leading questions; JTBD avoids feature questions), exposing every transcript transparently, and letting you flag transcripts for exclusion from the final report.

Can participants tell they're talking to an AI? Yes — Koji explicitly identifies the moderator as an AI in the opening turn. Studies show participants are more candid with AI moderators on sensitive topics (compensation, dissatisfaction with employers, dropped subscriptions) precisely because they know it's not a human.
