{"site":{"name":"Koji","description":"AI-native customer research platform that helps teams conduct, analyze, and synthesize customer interviews at scale.","url":"https://www.koji.so","contentTypes":["blog","documentation"],"lastUpdated":"2026-06-17T04:23:40.240Z"},"content":[{"type":"documentation","id":"525dbb9a-ac6a-4532-a94d-a1356d517f00","slug":"moderated-usability-testing-guide","title":"Moderated Usability Testing: How to Run Sessions That Surface Real Problems (2026 Guide)","url":"https://www.koji.so/docs/moderated-usability-testing-guide","summary":"A practical reference on moderated usability testing: what it is, when to choose it over unmoderated, how to write task scenarios, run think-aloud sessions, measure task success/SEQ/SUS, choose sample size, avoid common mistakes, and scale moderation with Koji's AI interviewer.","content":"## What is moderated usability testing? (Answer first)\n\n**Moderated usability testing is a research method in which a facilitator guides a participant through realistic tasks on a product — live — while observing where they hesitate, struggle, or fail.** The moderator can ask \"why did you do that?\" in the moment, probe confusion as it happens, and adapt the session to what the participant reveals. That real-time probing is exactly what distinguishes moderation from a static survey or a hands-off recording: you do not just see *that* someone failed a task, you learn *why*.\n\nThe trade-off has always been cost. A traditional moderated study means scheduling 8–15 calls, sitting through every one, taking notes, and then spending days synthesizing recordings. 
Platforms like Koji change those economics: an AI moderator runs the think-aloud session, asks adaptive follow-up questions, and clusters the findings automatically — so you get moderated-quality depth at unmoderated-style scale.\n\n> **Bottom line:** Use moderated usability testing when you need to understand the *reasoning* behind behavior — early-stage designs, complex flows, or any time a metric alone will not tell you what to fix. Use unmoderated testing when you only need to confirm a known hypothesis at volume.\n\n## Moderated vs. unmoderated: when to choose which\n\n| Dimension | Moderated | Unmoderated |\n|---|---|---|\n| Depth of insight | High — probe the \"why\" live | Lower — behavior only |\n| Best for | New flows, ambiguous problems, B2B/expert users | Validated flows, A/B comparisons, large samples |\n| Speed per session | Slower (live) | Faster (self-serve) |\n| Cost to scale | Traditionally high | Low |\n\nThe historical rule was \"moderate for discovery, go unmoderated for validation.\" Koji collapses that divide: its AI moderator conducts a guided, probing session *and* runs many of them in parallel, so you no longer have to trade depth for sample size. For a deeper comparison, see [Unmoderated vs Moderated User Research](/docs/unmoderated-vs-moderated-research).\n\n## How to run a moderated usability test (step by step)\n\n**1. Define the research question, not the feature.** Write down what decision the test will inform. \"Can a new user complete checkout without help?\" is testable; \"Is the design good?\" is not.\n\n**2. Write task scenarios, not instructions.** A good task gives context and a goal but never names the UI element. Bad: \"Click the blue Filter button.\" Good: \"You want to find a jacket under €100 in your size — show me how you would do that.\" Naming the button tells the participant the answer and destroys the test.\n\n**3. 
Recruit the right participants.** Five users will surface roughly 85% of the usability problems in a single design (Nielsen Norman Group), which is why 5–8 participants per distinct user segment is the workhorse sample size for formative moderated tests. Add a screener so you talk to real target users, not whoever is available.\n\n**4. Run a think-aloud session.** Ask the participant to narrate their thoughts continuously: \"Tell me what you are looking at, what you expect to happen, and what you are trying to do.\" Stay quiet while they work. Resist the urge to help — a silence that feels painful to you is data.\n\n**5. Probe at the right moments.** When someone hesitates, hovers, or backtracks, that is your cue to ask a non-leading follow-up: \"What did you expect to happen there?\" or \"What are you looking for right now?\" This adaptive probing is the entire value of moderation — and it is exactly what Koji's AI interviewer automates with configurable follow-up depth (1–3 probes per question).\n\n**6. Capture both behavior and metrics.** Note task success/failure, where errors cluster, and the verbatim quotes that explain them.\n\n## The metrics that make moderated tests defensible\n\nQualitative observation is the heart of moderated testing, but pairing it with a few standard metrics makes findings far easier to defend to stakeholders:\n\n- **Task success rate** — % of participants who complete each task. The single most important usability metric.\n- **Time on task** — how long completion takes; spikes flag friction.\n- **Single Ease Question (SEQ)** — a 7-point post-task rating of difficulty. See the [Single Ease Question (SEQ) guide](/docs/single-ease-question-seq-guide).\n- **System Usability Scale (SUS)** — a validated 0–100 score for the whole experience. 
See the [System Usability Scale (SUS) guide](/docs/system-usability-scale-guide).\n\nIn Koji, you capture these with **structured questions** — Koji supports six types (open_ended, scale, single_choice, multiple_choice, ranking, yes_no). You add SEQ as a `scale` question and the post-task \"what was confusing?\" as an `open_ended` question with AI probing. Because every scale answer is captured as a ground-truth structured value, Koji aggregates task-level difficulty automatically while still clustering the open-ended explanations into themed friction findings.\n\n## Common mistakes that ruin moderated sessions\n\n- **Leading the witness.** \"Was that easy?\" invites a polite yes. Ask \"How did that feel?\" instead. See [How to Avoid Leading Questions](/docs/avoiding-leading-questions).\n- **Helping too soon.** The moment you rescue a struggling user, you lose the finding.\n- **Testing the participant, not the product.** If someone fails, the design failed — never imply otherwise, or social-desirability bias will distort everything that follows.\n- **Skipping the pilot.** Always run one practice session to catch broken tasks before they cost you real participants.\n- **Synthesizing from memory.** Notes taken during a live call are lossy; a verbatim transcript with coded themes is not.\n\n## How Koji makes moderated usability testing faster\n\nTraditional moderation is bottlenecked by *you* — one researcher can only sit in so many calls. Koji removes that bottleneck without removing the depth:\n\n1. **AI moderator runs the think-aloud session** in voice or text, asking your tasks and probing hesitation with adaptive follow-ups — no scheduler, no calendar, available 24/7.\n2. **Voice mode** captures natural think-aloud narration; text mode renders interactive widgets for SEQ and choice questions.\n3. 
**Automatic analysis** transcribes every session, codes open-ended answers into themes, and aggregates task-level metrics into a **real-time report** you can share with one link.\n4. **Scale without losing nuance** — run 5 sessions or 50 in parallel; the per-question synthesis holds either way.\n\nA study that used to take two weeks of scheduling, moderating, and synthesizing becomes an afternoon. You bring the tasks and the judgment; Koji handles the moderation and the math.\n\n## Related Resources\n\n- [Structured Questions in AI Interviews](/docs/structured-questions-guide) — the six question types that power task metrics and probes\n- [Unmoderated vs Moderated User Research: How to Choose](/docs/unmoderated-vs-moderated-research)\n- [How to Conduct Usability Testing: The Complete Guide](/docs/usability-testing-guide)\n- [Usability Testing Script Template](/docs/usability-testing-script-template)\n- [Single Ease Question (SEQ): The 7-Point UX Metric](/docs/single-ease-question-seq-guide)\n- [System Usability Scale (SUS): Complete Guide](/docs/system-usability-scale-guide)\n","category":"Interview Techniques","lastModified":"2026-06-16T03:16:20.911398+00:00","metaTitle":"Moderated Usability Testing: Tasks, Think-Aloud & Metrics (2026 Guide)","metaDescription":"How to run moderated usability testing in 2026: write non-leading tasks, run think-aloud sessions, measure task success and SEQ, pick sample size, and scale moderation with AI on Koji.","keywords":["moderated usability testing","usability testing","think aloud testing","task success rate","moderated vs unmoderated","usability test tasks","user testing sessions","SEQ","remote usability testing"],"aiSummary":"A practical reference on moderated usability testing: what it is, when to choose it over unmoderated, how to write task scenarios, run think-aloud sessions, measure task success/SEQ/SUS, choose sample size, avoid common mistakes, and scale moderation with Koji's AI interviewer.","aiPrerequisites":["A 
product or prototype to test","Basic familiarity with usability concepts","Access to target users for recruiting"],"aiLearningOutcomes":["Decide when moderated testing beats unmoderated","Write task scenarios that do not lead participants","Run a think-aloud session and probe at the right moments","Capture task success, time on task, and SEQ correctly","Scale moderated sessions with an AI moderator on Koji"],"aiDifficulty":"beginner","aiEstimatedTime":"11 min read"}],"pagination":{"total":1,"returned":1,"offset":0}}