AI Interviewer Tuning: How to Get Research-Grade Voice Interviews

A complete playbook for tuning Koji's AI interviewer — company context, probing depth, structured questions, and interview mode — to deliver interviews indistinguishable from those run by a human researcher.

"AI interviews are shallow." That is the single most common objection from research buyers evaluating AI-native platforms in 2026. It is also a myth — but only if the AI moderator is configured correctly. Out of the box, every AI interviewer (Koji included) gives you a generic experience. Tuned, it produces interviews that pass blind quality reviews against human moderators.

This guide walks through the four levers that move the quality score, how each one works under the hood, and how to combine them for different research goals.

What "Interview Quality" Actually Means in Koji

Every Koji interview is automatically scored on a 1–5 scale (Understanding Quality Scores). The score is not vibes. It is a weighted sum of five dimensions:

  • Relevance — how on-topic the conversation stayed relative to the research goal in your brief.
  • Depth — how much insight was generated beyond surface-level answers.
  • Coverage — what proportion of the brief's key questions and topics were discussed.
  • Completion — whether the participant reached the natural end of the interview.
  • Structured quality — for typed questions (scale, single_choice, etc.), whether the AI extracted a high-confidence value.

The quality gate uses this score to decide what counts. Interviews scoring 1–2 are excluded from reports and do not consume credits (How the Quality Gate Works). So pushing your average from 3.2 to 4.1 is not a vanity metric — it directly increases the usable interview yield from the same number of completions, and it cuts wasted credits.

The four levers below are the entire toolbox for moving that score.

Lever 1: Company Context — Make the AI a Domain Expert

Before any study runs, set the account-level company context. This is a one-time configuration that the AI moderator carries into every interview (Company Context: Make Your AI Interviewer a Domain Expert).

Aim for 150–400 words of crisp, factual context covering:

  • What your company does (one sentence).
  • Who your customers are.
  • The key product surfaces and proprietary terminology participants will reference.
  • Anything sensitive the AI should treat carefully (active incidents, recent pricing changes).

A weak context: "We make project management software for teams." A tuned context names the product, the user types, three core features by their actual names, your top three competitors, and the rough plan structure. The difference shows up immediately in follow-up question relevance — the AI stops asking "what do you mean by that feature?" and starts asking "you mentioned X — how does it compare to your old workflow in Y?"
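
For illustration, here is the shape of a tuned context for a hypothetical company (every name and detail below is invented):

```text
Acme Projects is project management software for engineering teams of 20–200
people. Users are engineering managers and tech leads; buyers are VPs of
Engineering. Core surfaces: Boards (sprint planning), Timelines (roadmaps),
and Pulse (automated status reports). Participants often compare us to Jira,
Linear, and Asana. Plans: Free, Team (per-seat), and Enterprise. Note: Team
pricing changed last month; treat pricing questions carefully.
```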

Quality score impact: a strong company context is worth roughly 0.4–0.6 points on average. Studies that involve specialised vocabulary (medical, legal, dev tools) gain even more because the AI stops asking participants to define their own jargon.

Lever 2: Probing Depth — Tune Per Question

Probing depth is configured per question via the probing.maxFollowUps setting (1 by default; range 0–3). The AI follow-up logic itself is documented in How Koji's AI Follow-Up Probing Works.

Pick the depth based on the question's job, not a global default:

| maxFollowUps | When to use |
| --- | --- |
| 0 | Demographic or screening questions where the answer is the answer. ("How long have you been a customer?") |
| 1 | The default. Single-choice and scale questions with optional anchoring. ("You said 7 — what would have made it a 9?") |
| 2 | Open-ended questions about behavior or decisions. ("Tell me about the last time you used X.") |
| 3 | Generative discovery and JTBD interviews where depth is the entire point. |

The biggest tuning mistake is setting probing to 3 on every question. That produces marathon interviews, drops completion rates, and yields diminishing returns on insight. The reverse mistake — a global 0 to keep interviews short — strips out the qualitative texture that makes the data worth collecting at all.

For scale questions, also set probing.anchor: true. This triggers the "you said X — what would change that?" prompt that turns a numeric rating into a piece of qualitative evidence. Same data point, ten times more useful in a report.
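
Only the setting names (probing.maxFollowUps, probing.anchor) come from the docs above; the surrounding brief layout below is an assumption, sketched to show how the two settings sit on a question:

```json
{
  "questions": [
    {
      "type": "scale",
      "text": "How likely are you to recommend us? (0–10)",
      "probing": { "maxFollowUps": 1, "anchor": true }
    },
    {
      "type": "open_ended",
      "text": "Tell me about the last time you used X.",
      "probing": { "maxFollowUps": 2 }
    }
  ]
}
```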

Lever 3: Mix Question Types Deliberately

Koji supports six question types (Structured Questions in AI Interviews): open_ended, scale, single_choice, multiple_choice, ranking, and yes_no. Each one produces a different kind of signal in the report.

The mix that consistently produces the highest-quality interviews:

  • 20–30 percent open_ended — they generate the verbatim quotes that anchor stakeholder reports.
  • 30–40 percent scale + ranking — they produce the chartable distributions that survive in PRDs.
  • 20–30 percent single_choice + multiple_choice — they bucket answers for filter-and-segment work.
  • 0–10 percent yes_no — only when you genuinely need a binary gate (e.g., "Would you accept a 50 percent discount?").

A pure-open_ended interview produces compelling quotes but no chart. A pure-scale interview produces charts but no insight. The mix is the point. Reports that combine both feel quantitative and qualitative at once — which is exactly what stakeholders ask for and exactly what survey tools cannot deliver.
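
As a concrete sketch, a ten-question brief that follows the mix could look like this (same assumed layout as the Lever 2 example; question wording is placeholder):

```json
{
  "questions": [
    { "type": "open_ended",      "text": "Walk me through your current workflow." },
    { "type": "open_ended",      "text": "What almost stopped you from signing up?" },
    { "type": "open_ended",      "text": "What would you miss most if we disappeared?" },
    { "type": "scale",           "text": "How satisfied are you with onboarding? (1–10)" },
    { "type": "scale",           "text": "How likely are you to renew? (0–10)" },
    { "type": "scale",           "text": "How easy was setup? (1–10)" },
    { "type": "ranking",         "text": "Rank these four features by importance." },
    { "type": "single_choice",   "text": "Which plan are you on?" },
    { "type": "multiple_choice", "text": "Which of these tools do you also use?" },
    { "type": "yes_no",          "text": "Would you accept a 50 percent discount?" }
  ]
}
```

That works out to 30 percent open_ended, 40 percent scale plus ranking, 20 percent choice, and 10 percent yes_no, all inside the recommended bands.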

Lever 4: Interview Mode — Match Style to Goal

The interview mode in your research brief controls the AI's overall behavior (Structured, Exploratory, and Hybrid: Choosing the Right Interview Mode):

  • Structured — AI works through key questions systematically. Optimises for coverage. Best for validation, large-N studies, repeated tracking research.
  • Exploratory — AI follows participant threads. Optimises for depth. Best for generative discovery, new markets, sensitive topics.
  • Hybrid (default) — AI works through key questions while following valuable tangents. Balances both.

Mode affects which dimensions of the quality score are emphasised. Structured studies optimise for coverage; exploratory studies optimise for depth. Picking the wrong mode is the most common reason a well-designed study scores below 3.5 across the board — for example, running exploratory mode on a 200-person validation study leaves coverage gaps everywhere.

When in doubt, hybrid is the right default. The 80/20 rule applies: hybrid produces top-quartile coverage AND top-quartile depth on roughly 80 percent of studies, even if a pure-structured or pure-exploratory mode would have edged it out for a specific use case.

Putting It Together: Three Tuned Configurations

A few starting recipes that combine the four levers:

Generative discovery (e.g., new feature exploration)

  • Mode: exploratory
  • Company context: 300+ words including competitor terminology
  • 4 open_ended questions with maxFollowUps: 3
  • 1 ranking question on priorities (maxFollowUps: 1)
  • Target quality score: 4.2+ on the depth dimension
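
Expressed as a brief, this recipe might look like the sketch below (the JSON layout is assumed, as in the earlier examples; question wording is invented, and the company context is configured at the account level rather than in the brief):

```json
{
  "mode": "exploratory",
  "questions": [
    { "type": "open_ended", "text": "What problem were you trying to solve when you found us?", "probing": { "maxFollowUps": 3 } },
    { "type": "open_ended", "text": "Walk me through the last time that problem came up.", "probing": { "maxFollowUps": 3 } },
    { "type": "open_ended", "text": "What did you try before, and where did it fall short?", "probing": { "maxFollowUps": 3 } },
    { "type": "open_ended", "text": "What would the ideal solution do that nothing does today?", "probing": { "maxFollowUps": 3 } },
    { "type": "ranking",    "text": "Rank these priorities for the next quarter.", "probing": { "maxFollowUps": 1 } }
  ]
}
```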

Validation (e.g., pre-launch concept test)

  • Mode: structured
  • Company context: 200 words focused on the concept being tested
  • 2 open_ended (maxFollowUps: 2), 4 scale (anchor: true), 1 yes_no
  • Target quality score: 4.0+ on the coverage dimension
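
The same kind of sketch for the validation recipe (again an assumed layout with invented wording):

```json
{
  "mode": "structured",
  "questions": [
    { "type": "open_ended", "text": "What's your first reaction to this concept?", "probing": { "maxFollowUps": 2 } },
    { "type": "open_ended", "text": "What would stop you from using it?", "probing": { "maxFollowUps": 2 } },
    { "type": "scale",      "text": "How valuable is this to you? (1–10)", "probing": { "maxFollowUps": 1, "anchor": true } },
    { "type": "scale",      "text": "How well does it fit your current workflow? (1–10)", "probing": { "maxFollowUps": 1, "anchor": true } },
    { "type": "scale",      "text": "How likely are you to try it at launch? (0–10)", "probing": { "maxFollowUps": 1, "anchor": true } },
    { "type": "scale",      "text": "How clear is the concept as described? (1–10)", "probing": { "maxFollowUps": 1, "anchor": true } },
    { "type": "yes_no",     "text": "Would you join the beta?" }
  ]
}
```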

Tracking (e.g., quarterly NPS deep-dive)

  • Mode: hybrid
  • Company context: stable across quarters
  • 1 scale (anchor: true), 2 open_ended (maxFollowUps: 1), 1 single_choice
  • Keep configuration identical across waves so trends are comparable

Voice mode works equally well for all three, though setting up voice interviews is worth its own pass (How to Set Up AI Voice Interviews: A Researcher's Complete Guide) — voice tends to surface 20–40 percent more verbatim quotes than text mode at the same probing depth.

The Quality QA Loop

Tuning is not a one-shot exercise. Review interviews scoring below 3 once a week:

  1. Open the lowest-scoring interview from the past 7 days.
  2. Read the rationale field on the quality score — Koji explains why it scored low (see the sketch after this list).
  3. Identify the lever to adjust: was it weak company context, wrong probing depth, missing question types, or wrong mode?
  4. Update the brief and re-publish the study (existing interviews are not retroactively re-scored).
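
The guides above establish the 1–5 score, the five dimensions, and a rationale field; the payload shape below is an assumption, sketched to show what you are reading for in steps 2 and 3:

```json
{
  "qualityScore": 2,
  "dimensions": {
    "relevance": 4,
    "depth": 1,
    "coverage": 2,
    "completion": 5,
    "structuredQuality": 3
  },
  "rationale": "Participant gave one-sentence answers and follow-ups stayed at surface level; two key topics from the brief were never reached."
}
```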

Most teams converge on a stable configuration in 2–3 review cycles, after which the average quality climbs to 4.0+ and stays there. From that point on, the AI moderator is producing research-grade output that holds up against blind comparison with human-moderated transcripts — which is exactly the bar that makes "AI interviews" a serious research method instead of a toy.

Related Articles

Understanding Quality Scores

Learn how Koji evaluates interview quality on a 1–5 scale and why it matters for your research and billing.

How the Quality Gate Works

Understand Koji's quality gate — conversations scoring below 3/5 are completely free and don't consume credits, protecting your research budget.

How to Set Up AI Voice Interviews: A Researcher's Complete Guide

Step-by-step guide to configuring, testing, and optimizing voice interview studies in Koji — from research brief to launch.

AI-Moderated Interviews: How Automated Research Works (And Why It Works Better)

Understand how AI-moderated interviews work, when to use them over human-moderated sessions, and how to get the most from automated qualitative research.

How Koji's AI Follow-Up Probing Works: Going Deeper Than Any Survey

Understand how Koji's AI interviewer automatically asks follow-up questions to go deeper on every answer — and how to configure probing depth, custom instructions, and anchor behavior for scale questions.

Company Context: How to Make Your AI Interviewer a Domain Expert

Learn how to configure Koji's company context setting so your AI interviewer asks sharper, more relevant follow-up questions across every study you run.

Structured, Exploratory, and Hybrid: Choosing the Right Interview Mode in Koji

A complete guide to Koji's three interview modes — structured, exploratory, and hybrid — and when to use each for your research goals.

Structured Questions in AI Interviews

Mix quantitative data collection — scales, ratings, multiple choice, ranking — with AI-powered conversational follow-up in a single interview.