AI Interviewer Tuning: How to Get Research-Grade Voice Interviews

A complete playbook for tuning Koji's AI interviewer — company context, probing depth, structured questions, and interview mode — to deliver interviews indistinguishable from those run by a human researcher.

"AI interviews are shallow." That is the single most common objection from research buyers evaluating AI-native platforms in 2026. It is also a myth — but only if the AI moderator is configured correctly. Out of the box, every AI interviewer (Koji included) gives you a generic experience. Tuned, it produces interviews that pass blind quality reviews against human moderators.

This guide walks through the four levers that move the quality score, how each one works under the hood, and how to combine them for different research goals.

What "Interview Quality" Actually Means in Koji

Every Koji interview is automatically scored on a 1–5 scale (Understanding Quality Scores). The score is not vibes. It is a weighted sum of five dimensions:

  • Relevance — how on-topic the conversation stayed relative to the research goal in your brief.
  • Depth — how much insight was generated beyond surface-level answers.
  • Coverage — what proportion of the brief's key questions and topics were discussed.
  • Completion — whether the participant reached the natural end of the interview.
  • Structured quality — for typed questions (scale, single_choice, etc.), whether the AI extracted a high-confidence value.

The quality gate uses this score to decide what counts. Interviews scoring 1–2 are excluded from reports and do not consume credits (How the Quality Gate Works). So pushing your average from 3.2 to 4.1 is not a vanity metric — it directly increases the usable interview yield from the same number of completions, and it cuts wasted credits.

The four levers below are the entire toolbox for moving that score.

Lever 1: Company Context — Make the AI a Domain Expert

Before any study runs, set the account-level company context. This is a one-time configuration that the AI moderator carries into every interview (Company Context: Make Your AI Interviewer a Domain Expert).

Aim for 150–400 words of crisp, factual context covering:

  • What your company does (one sentence).
  • Who your customers are.
  • The key product surfaces and proprietary terminology participants will reference.
  • Anything sensitive the AI should treat carefully (active incidents, recent pricing changes).

A weak context: "We make project management software for teams." A tuned context names the product, the user types, three core features by their actual names, your top three competitors, and the rough plan structure. The difference shows up immediately in follow-up question relevance — the AI stops asking "what do you mean by that feature?" and starts asking "you mentioned X — how does it compare to your old workflow in Y?"
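
For illustration, here is the shape of a tuned context for a hypothetical company (every name and detail below is invented):

```text
Acme Projects is project management software for engineering teams of 20–200
people. Users are engineering managers and tech leads; buyers are VPs of
Engineering. Core surfaces: Boards (sprint planning), Timelines (roadmaps),
and Pulse (automated status reports). Participants often compare us to Jira,
Linear, and Asana. Plans: Free, Team (per-seat), and Enterprise. Note: Team
pricing changed last month; treat pricing questions carefully.
```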

Quality score impact: a strong company context is worth roughly 0.4–0.6 points on average. Studies that involve specialised vocabulary (medical, legal, dev tools) gain even more because the AI stops asking participants to define their own jargon.

Lever 2: Probing Depth — Tune Per Question

Probing depth is configured per question via the probing.maxFollowUps setting (1 by default; range 0–3). The AI follow-up logic itself is documented in How Koji's AI Follow-Up Probing Works.

Pick the depth based on the question's job, not a global default:

| maxFollowUps | When to use |
| --- | --- |
| 0 | Demographic or screening questions where the answer is the answer. ("How long have you been a customer?") |
| 1 | The default. Single-choice and scale questions with optional anchoring. ("You said 7 — what would have made it a 9?") |
| 2 | Open-ended questions about behavior or decisions. ("Tell me about the last time you used X.") |
| 3 | Generative discovery and JTBD interviews where depth is the entire point. |

The biggest tuning mistake is setting probing to 3 on every question. That produces marathon interviews, drops completion rates, and yields diminishing returns on insight. The reverse mistake — a global 0 to keep interviews short — strips out the qualitative texture that makes the data worth collecting at all.

For scale questions, also set probing.anchor: true. This triggers the "you said X — what would change that?" prompt that turns a numeric rating into a piece of qualitative evidence. Same data point, ten times more useful in a report.
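
Only the setting names (probing.maxFollowUps, probing.anchor) come from the docs above; the surrounding brief layout below is an assumption, sketched to show how the two settings sit on a question:

```json
{
  "questions": [
    {
      "type": "scale",
      "text": "How likely are you to recommend us? (0–10)",
      "probing": { "maxFollowUps": 1, "anchor": true }
    },
    {
      "type": "open_ended",
      "text": "Tell me about the last time you used X.",
      "probing": { "maxFollowUps": 2 }
    }
  ]
}
```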

Lever 3: Mix Question Types Deliberately

Koji supports six question types (Structured Questions in AI Interviews): open_ended, scale, single_choice, multiple_choice, ranking, and yes_no. Each one produces a different kind of signal in the report.

The mix that consistently produces the highest-quality interviews:

  • 20–30 percent open_ended — they generate the verbatim quotes that anchor stakeholder reports.
  • 30–40 percent scale + ranking — they produce the chartable distributions that survive in PRDs.
  • 20–30 percent single_choice + multiple_choice — they bucket answers for filter-and-segment work.
  • 0–10 percent yes_no — only when you genuinely need a binary gate (e.g., "Would you accept a 50 percent discount?").

A pure-open_ended interview produces compelling quotes but no chart. A pure-scale interview produces charts but no insight. The mix is the point. Reports that combine both feel quantitative and qualitative at once — which is exactly what stakeholders ask for and exactly what survey tools cannot deliver.
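
As a concrete sketch, a ten-question brief that follows the mix could look like this (same assumed layout as the Lever 2 example; question wording is placeholder):

```json
{
  "questions": [
    { "type": "open_ended",      "text": "Walk me through your current workflow." },
    { "type": "open_ended",      "text": "What almost stopped you from signing up?" },
    { "type": "open_ended",      "text": "What would you miss most if we disappeared?" },
    { "type": "scale",           "text": "How satisfied are you with onboarding? (1–10)" },
    { "type": "scale",           "text": "How likely are you to renew? (0–10)" },
    { "type": "scale",           "text": "How easy was setup? (1–10)" },
    { "type": "ranking",         "text": "Rank these four features by importance." },
    { "type": "single_choice",   "text": "Which plan are you on?" },
    { "type": "multiple_choice", "text": "Which of these tools do you also use?" },
    { "type": "yes_no",          "text": "Would you accept a 50 percent discount?" }
  ]
}
```

That works out to 30 percent open_ended, 40 percent scale plus ranking, 20 percent choice, and 10 percent yes_no, all inside the recommended bands.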

Lever 4: Interview Mode — Match Style to Goal

The interview mode in your research brief controls the AI's overall behavior (Structured, Exploratory, and Hybrid: Choosing the Right Interview Mode):

  • Structured — AI works through key questions systematically. Optimises for coverage. Best for validation, large-N studies, repeated tracking research.
  • Exploratory — AI follows participant threads. Optimises for depth. Best for generative discovery, new markets, sensitive topics.
  • Hybrid (default) — AI works through key questions while following valuable tangents. Balances both.

Mode affects which dimensions of the quality score are emphasised. Structured studies optimise for coverage; exploratory studies optimise for depth. Picking the wrong mode is the most common reason a well-designed study scores below 3.5 across the board — for example, running exploratory mode on a 200-person validation study leaves coverage gaps everywhere.

When in doubt, hybrid is the right default. The 80/20 rule applies: hybrid produces top-quartile coverage AND top-quartile depth on roughly 80 percent of studies, even if a pure-structured or pure-exploratory mode would have edged it out for a specific use case.

Putting It Together: Three Tuned Configurations

A few starting recipes that combine the four levers:

Generative discovery (e.g., new feature exploration)

  • Mode: exploratory
  • Company context: 300+ words including competitor terminology
  • 4 open_ended questions with maxFollowUps: 3
  • 1 ranking question on priorities (maxFollowUps: 1)
  • Target quality score: 4.2+ on the depth dimension
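
Expressed as a brief, this recipe might look like the sketch below (the JSON layout is assumed, as in the earlier examples; question wording is invented, and the company context is configured at the account level rather than in the brief):

```json
{
  "mode": "exploratory",
  "questions": [
    { "type": "open_ended", "text": "What problem were you trying to solve when you found us?", "probing": { "maxFollowUps": 3 } },
    { "type": "open_ended", "text": "Walk me through the last time that problem came up.", "probing": { "maxFollowUps": 3 } },
    { "type": "open_ended", "text": "What did you try before, and where did it fall short?", "probing": { "maxFollowUps": 3 } },
    { "type": "open_ended", "text": "What would the ideal solution do that nothing does today?", "probing": { "maxFollowUps": 3 } },
    { "type": "ranking",    "text": "Rank these priorities for the next quarter.", "probing": { "maxFollowUps": 1 } }
  ]
}
```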

Validation (e.g., pre-launch concept test)

  • Mode: structured
  • Company context: 200 words focused on the concept being tested
  • 2 open_ended (maxFollowUps: 2), 4 scale (anchor: true), 1 yes_no
  • Target quality score: 4.0+ on the coverage dimension
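
The same kind of sketch for the validation recipe (again an assumed layout with invented wording):

```json
{
  "mode": "structured",
  "questions": [
    { "type": "open_ended", "text": "What's your first reaction to this concept?", "probing": { "maxFollowUps": 2 } },
    { "type": "open_ended", "text": "What would stop you from using it?", "probing": { "maxFollowUps": 2 } },
    { "type": "scale",      "text": "How valuable is this to you? (1–10)", "probing": { "maxFollowUps": 1, "anchor": true } },
    { "type": "scale",      "text": "How well does it fit your current workflow? (1–10)", "probing": { "maxFollowUps": 1, "anchor": true } },
    { "type": "scale",      "text": "How likely are you to try it at launch? (0–10)", "probing": { "maxFollowUps": 1, "anchor": true } },
    { "type": "scale",      "text": "How clear is the concept as described? (1–10)", "probing": { "maxFollowUps": 1, "anchor": true } },
    { "type": "yes_no",     "text": "Would you join the beta?" }
  ]
}
```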

Tracking (e.g., quarterly NPS deep-dive)

  • Mode: hybrid
  • Company context: stable across quarters
  • 1 scale (anchor: true), 2 open_ended (maxFollowUps: 1), 1 single_choice
  • Keep configuration identical across waves so trends are comparable

Voice mode works equally well for all three, though setting up voice interviews is worth its own pass (How to Set Up AI Voice Interviews: A Researcher's Complete Guide) — voice tends to surface 20–40 percent more verbatim quotes than text mode at the same probing depth.

The Quality QA Loop

Tuning is not a one-shot exercise. Review interviews scoring below 3 once a week:

  1. Open the lowest-scoring interview from the past 7 days.
  2. Read the rationale field on the quality score — Koji explains why it scored low (see the sketch after this list).
  3. Identify the lever to adjust: was it weak company context, wrong probing depth, missing question types, or wrong mode?
  4. Update the brief and re-publish the study (existing interviews are not retroactively re-scored).
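
The guides above establish the 1–5 score, the five dimensions, and a rationale field; the payload shape below is an assumption, sketched to show what you are reading for in steps 2 and 3:

```json
{
  "qualityScore": 2,
  "dimensions": {
    "relevance": 4,
    "depth": 1,
    "coverage": 2,
    "completion": 5,
    "structuredQuality": 3
  },
  "rationale": "Participant gave one-sentence answers and follow-ups stayed at surface level; two key topics from the brief were never reached."
}
```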

Most teams converge on a stable configuration in 2–3 review cycles, after which the average quality climbs to 4.0+ and stays there. From that point on, the AI moderator is producing research-grade output that holds up against blind comparison with human-moderated transcripts — which is exactly the bar that makes "AI interviews" a serious research method instead of a toy.

Related Articles

Understanding Quality Scores

Learn how Koji evaluates interview quality on a 1–5 scale and why it matters for your research and billing.

How the Quality Gate Works

Understand Koji's quality gate — conversations scoring below 3/5 are completely free and don't consume credits, protecting your research budget.

How to Set Up AI Voice Interviews: A Researcher's Complete Guide

Step-by-step guide to configuring, testing, and optimizing voice interview studies in Koji — from research brief to launch.

AI-Moderated Interviews: How Automated Research Works (And Why It Works Better)

Understand how AI-moderated interviews work, when to use them over human-moderated sessions, and how to get the most from automated qualitative research.

How Koji's AI Follow-Up Probing Works: Going Deeper Than Any Survey

Understand how Koji's AI interviewer automatically asks follow-up questions to go deeper on every answer — and how to configure probing depth, custom instructions, and anchor behavior for scale questions.

Company Context: How to Make Your AI Interviewer a Domain Expert

Learn how to configure Koji's company context setting so your AI interviewer asks sharper, more relevant follow-up questions across every study you run.

Structured, Exploratory, and Hybrid: Choosing the Right Interview Mode in Koji

A complete guide to Koji's three interview modes — structured, exploratory, and hybrid — and when to use each for your research goals.

Structured Questions in AI Interviews

Mix quantitative data collection — scales, ratings, multiple choice, ranking — with AI-powered conversational follow-up in a single interview.