{"site":{"name":"Koji","description":"AI-native customer research platform that helps teams conduct, analyze, and synthesize customer interviews at scale.","url":"https://www.koji.so","contentTypes":["blog","documentation"],"lastUpdated":"2026-05-10T22:44:14.684Z"},"content":[{"type":"documentation","id":"dfb1d90e-f3f1-4543-9bc5-7790992f956f","slug":"ai-interviewer-tuning-guide","title":"AI Interviewer Tuning: How to Get Research-Grade Voice Interviews","url":"https://www.koji.so/docs/ai-interviewer-tuning-guide","summary":"Koji interview quality is a 1–5 score across five dimensions (relevance, depth, coverage, completion, structured quality) and is moved by exactly four levers: (1) Company Context — 150–400 words of factual product/customer/terminology context worth 0.4–0.6 points on average; (2) Per-question probing depth (probing.maxFollowUps, 0–3) chosen by question job, with anchor:true on scale questions; (3) A deliberate mix of the six structured question types — roughly 20–30% open_ended, 30–40% scale+ranking, 20–30% choice, 0–10% yes_no; (4) Interview mode (structured/exploratory/hybrid) matched to research goal. Three reference configurations are provided for generative discovery, validation, and tracking studies, plus a weekly quality-QA loop that converges most teams to 4.0+ averages within 2–3 review cycles.","content":"\n\"AI interviews are shallow.\" That is the single most common objection from research buyers evaluating AI-native platforms in 2026. It is also a myth — but only if the AI moderator is configured correctly. Out of the box, every AI interviewer (Koji included) gives you a generic experience. Tuned, it produces interviews that pass blind quality reviews against human moderators.\n\nThis guide walks through the four levers that move the quality score, how each one works under the hood, and how to combine them for different research goals.\n\n## What \"Interview Quality\" Actually Means in Koji\n\nEvery Koji interview is automatically scored on a 1–5 scale ([Understanding Quality Scores](/docs/understanding-quality-scores)). The score is not vibes. It is a weighted sum of five dimensions:\n\n- **Relevance** — how on-topic the conversation stayed relative to the research goal in your brief.\n- **Depth** — how much insight was generated beyond surface-level answers.\n- **Coverage** — what proportion of the brief's key questions and topics were discussed.\n- **Completion** — whether the participant reached the natural end of the interview.\n- **Structured quality** — for typed questions (`scale`, `single_choice`, etc.), whether the AI extracted a high-confidence value.\n\nThe quality gate uses this score to decide what counts. Interviews scoring 1–2 are excluded from reports and do not consume credits ([How the Quality Gate Works](/docs/how-the-quality-gate-works)). So pushing your average from 3.2 to 4.1 is not a vanity metric — it directly increases the usable interview yield from the same number of completions, and it cuts wasted credits.\n\nThe four levers below are the entire toolbox for moving that score.\n\n## Lever 1: Company Context — Make the AI a Domain Expert\n\nBefore any study runs, set the account-level company context. 
The context is a one-time configuration that the AI moderator carries into every interview ([Company Context: Make Your AI Interviewer a Domain Expert](/docs/company-context-guide)).\n\nAim for 150–400 words of crisp, factual context covering:\n\n- What your company does (one sentence).\n- Who your customers are.\n- The key product surfaces and proprietary terminology participants will reference.\n- Anything sensitive the AI should treat carefully (active incidents, recent pricing changes).\n\nA weak context: \"We make project management software for teams.\" A tuned context names the product, the user types, three core features by their actual names, your top three competitors, and the rough plan structure. The difference shows up immediately in follow-up question relevance — the AI stops asking \"what do you mean by that feature?\" and starts asking \"you mentioned X — how does it compare to your old workflow in Y?\"\n\nQuality score impact: a strong company context is worth roughly 0.4–0.6 points on average. Studies that involve specialised vocabulary (medical, legal, dev tools) gain even more because the AI stops asking participants to define their own jargon.\n\n## Lever 2: Probing Depth — Tune Per Question\n\nProbing depth is configured per question via the `probing.maxFollowUps` setting (1 by default; range 0–3). The AI follow-up logic itself is documented in [How Koji's AI Follow-Up Probing Works](/docs/ai-probing-guide).\n\nPick the depth based on the question's job, not a global default:\n\n| `maxFollowUps` | When to use |\n| --- | --- |\n| `0` | Demographic or screening questions where the answer is the answer. (\"How long have you been a customer?\") |\n| `1` | The default. Single-choice and scale questions with optional anchoring. (\"You said 7 — what would have made it a 9?\") |\n| `2` | Open-ended questions about behavior or decisions. (\"Tell me about the last time you used X.\") |\n| `3` | Generative discovery and JTBD interviews where depth is the entire point. |\n\nThe biggest tuning mistake is setting probing to 3 on every question. That produces marathon interviews, drops completion rates, and surfaces insights of rapidly diminishing value. The reverse mistake — global `0` to keep interviews short — strips out the qualitative texture that makes the data worth collecting at all.\n\nFor scale questions, also set `probing.anchor: true`. This triggers the \"you said X — what would change that?\" prompt that turns a numeric rating into a piece of qualitative evidence. Same data point, ten times more useful in a report.\n\n## Lever 3: Mix Question Types Deliberately\n\nKoji supports six question types ([Structured Questions in AI Interviews](/docs/structured-questions-guide)): `open_ended`, `scale`, `single_choice`, `multiple_choice`, `ranking`, and `yes_no`. Each one produces a different kind of signal in the report.\n\nThe mix that consistently produces the highest-quality interviews:\n\n- **20–30 percent open_ended** — they generate the verbatim quotes that anchor stakeholder reports.\n- **30–40 percent scale + ranking** — they produce the chartable distributions that survive in PRDs.\n- **20–30 percent single_choice + multiple_choice** — they bucket answers for filter-and-segment work.\n- **0–10 percent yes_no** — only when you genuinely need a binary gate (e.g., \"Would you renew at the current price?\").\n\nA pure-open_ended interview produces compelling quotes but no chart. A pure-scale interview produces charts but no insight. The mix is the point.
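\n\nHere is what a balanced key-question list can look like with both levers applied, the deliberate type mix and per-question probing depth. The six `type` values, `probing.maxFollowUps`, and `probing.anchor` are the documented settings; the surrounding field names and the sample questions are assumptions invented for illustration:\n\n```typescript\n// A ten-question list at roughly 30% open_ended, 30% scale+ranking,\n// 30% choice, 10% yes_no. Only the type values and probing settings\n// are documented; the rest of this shape is a placeholder.\ntype QuestionType =\n  | 'open_ended' | 'scale' | 'single_choice'\n  | 'multiple_choice' | 'ranking' | 'yes_no';\n\ninterface Question {\n  type: QuestionType;\n  text: string;\n  options?: string[]; // choice and ranking types only\n  probing?: { maxFollowUps: number; anchor?: boolean };\n}\n\nconst keyQuestions: Question[] = [\n  // Screener: the answer is the answer, so no probing.\n  { type: 'single_choice', text: 'How long have you been a customer?', options: ['Under 6 months', '6–24 months', 'Over 2 years'], probing: { maxFollowUps: 0 } },\n  { type: 'open_ended', text: 'Tell me about the last project you planned here.', probing: { maxFollowUps: 2 } },\n  // Anchored scales turn ratings into qualitative evidence.\n  { type: 'scale', text: 'How satisfied are you with planning, 1 to 10?', probing: { maxFollowUps: 1, anchor: true } },\n  { type: 'open_ended', text: 'Walk me through how you chose this tool.', probing: { maxFollowUps: 2 } },\n  { type: 'multiple_choice', text: 'Which features did you use this week?', options: ['Boards', 'Pulse', 'Retainers'], probing: { maxFollowUps: 1 } },\n  { type: 'scale', text: 'How likely are you to recommend us, 0 to 10?', probing: { maxFollowUps: 1, anchor: true } },\n  { type: 'ranking', text: 'Rank these priorities for the next quarter.', options: ['Speed', 'Integrations', 'Reporting'], probing: { maxFollowUps: 1 } },\n  { type: 'single_choice', text: 'Which plan are you on?', options: ['Starter', 'Team', 'Enterprise'], probing: { maxFollowUps: 0 } },\n  { type: 'open_ended', text: 'Where does the product slow you down most?', probing: { maxFollowUps: 2 } },\n  // The one binary gate in the list.\n  { type: 'yes_no', text: 'Would you renew at the current price?', probing: { maxFollowUps: 1 } },\n];\n```\n\nBefore publishing, go down the list and ask which entries will produce the charts and which will produce the quotes; a question that yields neither is dead weight.\n\n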
Reports that combine both feel quantitative and qualitative at once — which is exactly what stakeholders ask for and exactly what survey tools cannot deliver.\n\n## Lever 4: Interview Mode — Match Style to Goal\n\nThe interview mode in your research brief controls the AI's overall behavior ([Structured, Exploratory, and Hybrid: Choosing the Right Interview Mode](/docs/interview-mode-guide)):\n\n- **Structured** — AI works through key questions systematically. Optimises for coverage. Best for validation, large-N studies, repeated tracking research.\n- **Exploratory** — AI follows participant threads. Optimises for depth. Best for generative discovery, new markets, sensitive topics.\n- **Hybrid** (default) — AI works through key questions while following valuable tangents. Balances both.\n\nMode affects which dimensions of the quality score are emphasised. Structured studies optimise for coverage; exploratory studies optimise for depth. Picking the wrong mode is the most common reason a well-designed study scores below 3.5 across the board — for example, running an exploratory mode on a 200-person validation study leaves coverage gaps everywhere.\n\nWhen in doubt, hybrid is the right default. The 80/20 rule applies: hybrid produces top-quartile coverage AND top-quartile depth on roughly 80 percent of studies, even if a pure-structured or pure-exploratory mode would have edged it out for a specific use case.\n\n## Putting It Together: Three Tuned Configurations\n\nA few starting recipes that combine the four levers:\n\n**Generative discovery (e.g., new feature exploration)**\n\n- Mode: exploratory\n- Company context: 300+ words including competitor terminology\n- 4 open_ended questions with `maxFollowUps: 3`\n- 1 ranking question on priorities (`maxFollowUps: 1`)\n- Target quality score: 4.2+ on depth dimension\n\n**Validation (e.g., pre-launch concept test)**\n\n- Mode: structured\n- Company context: 200 words focused on the concept being tested\n- 2 open_ended (`maxFollowUps: 2`), 4 scale (`anchor: true`), 1 yes_no\n- Target quality score: 4.0+ on coverage dimension\n\n**Tracking (e.g., quarterly NPS deep-dive)**\n\n- Mode: hybrid\n- Company context: stable across quarters\n- 1 scale (`anchor: true`), 2 open_ended (`maxFollowUps: 1`), 1 single_choice\n- Keep configuration identical across waves so trends are comparable\n\nVoice mode works equally well for all three, though [setting up voice interviews](/docs/setting-up-voice-interviews) is worth its own pass — voice tends to surface 20–40 percent more verbatim quotes than text mode at the same probing depth.\n\n## The Quality QA Loop\n\nTuning is not a one-shot exercise. Review interviews scoring below 3 once a week:\n\n1. Open the lowest-scoring interview from the past 7 days.\n2. Read the rationale field on the quality score — Koji explains why it scored low.\n3. Identify the lever to adjust: was it weak company context, wrong probing depth, missing question types, or wrong mode?\n4. Update the brief and re-publish the study (existing interviews are not retroactively re-scored).\n\nMost teams converge on a stable configuration in 2–3 review cycles, after which the average quality climbs to 4.0+ and stays there. 
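\n\nSteps 1 and 2 can also be scripted into a weekly digest. Koji does not document a public HTTP API in this guide, so the endpoint, auth header, and response fields below are placeholder assumptions rather than a real interface:\n\n```typescript\n// Hypothetical weekly QA pull: fetch the past week of interviews, keep\n// the sub-3 scorers, and print each rationale for review. The endpoint\n// and field names are invented for illustration.\ninterface ScoredInterview {\n  id: string;\n  studyId: string;\n  qualityScore: number; // the 1–5 composite score\n  rationale: string;    // why the interview scored what it did\n}\n\nasync function weeklyQaReview(apiKey: string): Promise<void> {\n  const since = new Date(Date.now() - 7 * 24 * 60 * 60 * 1000).toISOString();\n  const res = await fetch(\n    `https://api.koji.so/v1/interviews?completedAfter=${since}`, // placeholder URL\n    { headers: { Authorization: `Bearer ${apiKey}` } },\n  );\n  const interviews: ScoredInterview[] = await res.json();\n\n  interviews\n    .filter((i) => i.qualityScore < 3)\n    .sort((a, b) => a.qualityScore - b.qualityScore)\n    .forEach((i) => console.log(`[${i.qualityScore}] study ${i.studyId}: ${i.rationale}`));\n}\n```\n\nScripted or not, the loop is the same: read the rationale, pick the lever, adjust the brief, re-publish. The weekly pass stays a five-minute habit until the averages converge.\n\n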
From that point on, the AI moderator is producing research-grade output that holds up against blind comparison with human-moderated transcripts — which is exactly the bar that makes \"AI interviews\" a serious research method instead of a toy.\n\n## Related Resources\n\n- [How Koji's AI Follow-Up Probing Works](/docs/ai-probing-guide)\n- [Company Context: Make Your AI Interviewer a Domain Expert](/docs/company-context-guide)\n- [Structured, Exploratory, and Hybrid: Choosing the Right Interview Mode](/docs/interview-mode-guide)\n- [Understanding Quality Scores](/docs/understanding-quality-scores)\n- [Structured Questions in AI Interviews](/docs/structured-questions-guide)\n- [How to Set Up AI Voice Interviews](/docs/setting-up-voice-interviews)\n- [AI-Moderated Interviews: How Automated Research Works](/docs/ai-moderated-interviews)\n","category":"Reports & Analysis","lastModified":"2026-05-10T03:28:00.583731+00:00","metaTitle":"AI Interviewer Tuning: Get Research-Grade Voice Interviews","metaDescription":"Tune the Koji AI interviewer with company context, probing depth, structured questions, and interview mode to deliver research-grade quality scores.","keywords":["ai interviewer tuning","ai moderator quality","voice interview best practices","ai interview probing depth","interview quality score","ai research interviewer prompt","ai moderator configuration"],"aiSummary":"Koji interview quality is a 1–5 score across five dimensions (relevance, depth, coverage, completion, structured quality) and is moved by exactly four levers: (1) Company Context — 150–400 words of factual product/customer/terminology context worth 0.4–0.6 points on average; (2) Per-question probing depth (probing.maxFollowUps, 0–3) chosen by question job, with anchor:true on scale questions; (3) A deliberate mix of the six structured question types — roughly 20–30% open_ended, 30–40% scale+ranking, 20–30% choice, 0–10% yes_no; (4) Interview mode (structured/exploratory/hybrid) matched to research goal. Three reference configurations are provided for generative discovery, validation, and tracking studies, plus a weekly quality-QA loop that converges most teams to 4.0+ averages within 2–3 review cycles.","aiPrerequisites":["Familiarity with the Koji research brief and study editor","Existing studies with at least 10 completed interviews to review","Access to your account-level company context settings"],"aiLearningOutcomes":["Understand exactly how the Koji quality score is computed","Tune company context for 0.4–0.6 quality points of upside","Pick probing.maxFollowUps per question based on its research job","Combine the 6 question types in the right ratios for stakeholder-ready reports","Choose between structured, exploratory, and hybrid modes","Run a weekly quality QA loop that converges to 4.0+ averages"],"aiDifficulty":"intermediate","aiEstimatedTime":"12 min read"}],"pagination":{"total":1,"returned":1,"offset":0}}