Moderated Usability Testing: How to Run Sessions That Surface Real Problems (2026 Guide)
A practical 2026 guide to moderated usability testing: how to write tasks, run think-aloud sessions, measure task success and SEQ, choose sample size, and scale moderation with AI on Koji.
What is moderated usability testing?
Moderated usability testing is a research method in which a facilitator guides a participant through realistic tasks on a product — live — while observing where they hesitate, struggle, or fail. The moderator can ask "why did you do that?" in the moment, probe confusion as it happens, and adapt the session to what the participant reveals. That real-time probing is exactly what distinguishes moderation from a static survey or a hands-off recording: you do not just see that someone failed a task, you learn why.
The trade-off has always been cost. A traditional moderated study means scheduling 8–15 calls, sitting through every one, taking notes, and then spending days synthesizing recordings. Platforms like Koji change those economics: an AI moderator runs the think-aloud session, asks adaptive follow-up questions, and clusters the findings automatically, so you get moderated-quality depth at unmoderated-style scale.
Bottom line: Use moderated usability testing when you need to understand the reasoning behind behavior — early-stage designs, complex flows, or any time a metric alone will not tell you what to fix. Use unmoderated testing when you only need to confirm a known hypothesis at volume.
Moderated vs. unmoderated: when to choose which
| Dimension | Moderated | Unmoderated |
|---|---|---|
| Depth of insight | High — probe the "why" live | Lower — behavior only |
| Best for | New flows, ambiguous problems, B2B/expert users | Validated flows, A/B comparisons, large samples |
| Speed per session | Slower (live) | Faster (self-serve) |
| Cost to scale | Traditionally high | Low |
The historical rule was "moderate for discovery, go unmoderated for validation." Koji collapses that divide: its AI moderator conducts a guided, probing session and runs many of them in parallel, so you no longer have to trade depth for sample size. For a deeper comparison, see Unmoderated vs Moderated User Research.
How to run a moderated usability test (step by step)
1. Define the research question, not the feature. Write down what decision the test will inform. "Can a new user complete checkout without help?" is testable; "Is the design good?" is not.
2. Write task scenarios, not instructions. A good task gives context and a goal but never names the UI element. Bad: "Click the blue Filter button." Good: "You want to find a jacket under €100 in your size — show me how you would do that." Naming the button tells the participant the answer and destroys the test.
3. Recruit the right participants. Five users will surface roughly 85% of the usability problems in a single design (Nielsen Norman Group), assuming each participant uncovers about 31% of the problems on average. That is why 5–8 participants per distinct user segment is the workhorse sample size for formative moderated tests. Add a screener so you talk to real target users, not whoever is available.
4. Run a think-aloud session. Ask the participant to narrate their thoughts continuously: "Tell me what you are looking at, what you expect to happen, and what you are trying to do." Stay quiet while they work. Resist the urge to help — a silence that feels painful to you is data.
5. Probe at the right moments. When someone hesitates, hovers, or backtracks, that is your cue to ask a non-leading follow-up: "What did you expect to happen there?" or "What are you looking for right now?" This adaptive probing is the entire value of moderation — and it is exactly what Koji's AI interviewer automates with configurable follow-up depth (1–3 probes per question).
6. Capture both behavior and metrics. Note task success/failure, where errors cluster, and the verbatim quotes that explain them.
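The 85% figure in step 3 comes from a simple discovery model: if each participant independently uncovers a given problem with probability p, then n participants uncover it with probability 1 - (1 - p)^n. The sketch below assumes p = 0.31, the average behind the Nielsen Norman Group figure. Your own product's rate may well differ, and the function names are illustrative.

```python
import math


def discovery_rate(n_participants: int, p: float = 0.31) -> float:
    """Expected share of usability problems found by n participants.

    Assumes every participant independently finds any given problem with
    probability p. The default of 0.31 is an assumed average, not a constant.
    """
    if n_participants < 0:
        raise ValueError("n_participants must be non-negative")
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    return 1 - (1 - p) ** n_participants


def participants_needed(target: float, p: float = 0.31) -> int:
    """Smallest n whose expected discovery rate reaches the target share."""
    if not 0 < target < 1:
        raise ValueError("target must be strictly between 0 and 1")
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    return math.ceil(math.log(1 - target) / math.log(1 - p))


if __name__ == "__main__":
    # Compare a few candidate sample sizes for one user segment.
    for n in (3, 5, 8):
        print(f"{n} participants -> {discovery_rate(n):.0%} of problems")
```

With p = 0.31, five participants land at about 84%, which is where the rounded 85% comes from. A lower p, as with problems that only appear in rare paths, pushes the required sample up quickly, so treat the output as a planning aid and not a guarantee.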
The metrics that make moderated tests defensible
Qualitative observation is the heart of moderated testing, but pairing it with a few standard metrics makes findings far easier to defend to stakeholders:
- Task success rate: the percentage of participants who complete each task. The single most important usability metric.
- Time on task: how long completion takes; spikes flag friction.
- Single Ease Question (SEQ): a 7-point post-task rating of how difficult or easy the task was (1 = very difficult, 7 = very easy). See the Single Ease Question (SEQ) guide.
- System Usability Scale (SUS): a validated 10-item questionnaire that yields a 0–100 score for the whole experience. See the System Usability Scale (SUS) guide.
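As a rough illustration of how these four metrics can be computed from session notes, here is a minimal sketch. The `TaskAttempt` record and its field names are assumptions made for this example, not a prescribed format. The SUS function follows the standard scoring rule: odd-numbered items contribute the response minus 1, even-numbered items contribute 5 minus the response, and the sum is multiplied by 2.5.

```python
from dataclasses import dataclass
from statistics import mean, median
from typing import Dict, List


@dataclass
class TaskAttempt:
    """One participant's attempt at one task (hypothetical record shape)."""
    participant: str
    task: str
    completed: bool
    seconds: float
    seq: int  # Single Ease Question: 1 = very difficult, 7 = very easy


def summarize_task(attempts: List[TaskAttempt], task: str) -> Dict[str, float]:
    """Task success rate, median time on task, and mean SEQ for one task."""
    rows = [a for a in attempts if a.task == task]
    if not rows:
        raise ValueError(f"no attempts recorded for task {task!r}")
    return {
        "n": len(rows),
        "success_rate": sum(a.completed for a in rows) / len(rows),
        "median_seconds": median([a.seconds for a in rows]),
        "mean_seq": mean([a.seq for a in rows]),
    }


def sus_score(responses: List[int]) -> float:
    """Score one participant's ten SUS answers (each 1-5) on a 0-100 scale."""
    if len(responses) != 10 or any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("SUS needs exactly ten responses, each from 1 to 5")
    total = 0
    for index, response in enumerate(responses):
        # Items 1, 3, 5, 7, 9 (even index) are positively worded;
        # items 2, 4, 6, 8, 10 (odd index) are negatively worded.
        total += (response - 1) if index % 2 == 0 else (5 - response)
    return total * 2.5
```

The median is used for time on task because one participant who gets badly stuck can skew a mean in a sample of five to eight. With samples this small, report the raw counts (for example "4 of 6 completed") alongside any percentage.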
In Koji, you capture these with structured questions — Koji supports six types (open_ended, scale, single_choice, multiple_choice, ranking, yes_no). You add SEQ as a scale question and the post-task "what was confusing?" as an open_ended question with AI probing. Because every scale answer is captured as a ground-truth structured value, Koji aggregates task-level difficulty automatically while still clustering the open-ended explanations into themed friction findings.
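To make the pairing of a scale question with a probed open-ended question concrete, here is a hypothetical sketch of a post-task question block. It is not Koji's API or schema: the `Question` class, its field names, and the rule that a depth of 0 means "no probing" are all assumptions for illustration. Only the six type names and the 1–3 probe range come from this guide.

```python
from dataclasses import dataclass
from typing import List, Optional

# The six question types named in this guide.
QUESTION_TYPES = {
    "open_ended", "scale", "single_choice",
    "multiple_choice", "ranking", "yes_no",
}


@dataclass
class Question:
    """Hypothetical question record; not Koji's real data model."""
    prompt: str
    type: str
    scale_min: Optional[int] = None
    scale_max: Optional[int] = None
    follow_up_depth: int = 0  # assumed: 0 = no probing, 1-3 = probes allowed

    def __post_init__(self) -> None:
        if self.type not in QUESTION_TYPES:
            raise ValueError(f"unknown question type: {self.type!r}")
        if self.type == "scale" and (self.scale_min is None or self.scale_max is None):
            raise ValueError("scale questions need scale_min and scale_max")
        if not 0 <= self.follow_up_depth <= 3:
            raise ValueError("follow_up_depth must be between 0 and 3")


def post_task_block() -> List[Question]:
    """SEQ as a 7-point scale question, then a probed open-ended question."""
    return [
        Question(
            prompt="Overall, how difficult or easy was this task to complete?",
            type="scale",
            scale_min=1,
            scale_max=7,
        ),
        Question(
            prompt="What, if anything, was confusing?",
            type="open_ended",
            follow_up_depth=2,
        ),
    ]
```

Asking the scale question first and the open-ended question second keeps the rating from being influenced by the explanation that follows it.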
Common mistakes that ruin moderated sessions
- Leading the witness. "Was that easy?" invites a polite yes. Ask "How did that feel?" instead. See How to Avoid Leading Questions.
- Helping too soon. The moment you rescue a struggling user, you lose the finding.
- Testing the participant, not the product. If someone fails, the design failed — never imply otherwise, or social-desirability bias will distort everything that follows.
- Skipping the pilot. Always run one practice session to catch broken tasks before they cost you real participants.
- Synthesizing from memory. Notes taken during a live call are lossy; a verbatim transcript with coded themes is not.
How Koji makes moderated usability testing faster
Traditional moderation is bottlenecked by you — one researcher can only sit in so many calls. Koji removes that bottleneck without removing the depth:
- AI moderator runs the think-aloud session in voice or text, asking your tasks and probing hesitation with adaptive follow-ups — no scheduler, no calendar, available 24/7.
- Voice mode captures natural think-aloud narration; text mode renders interactive widgets for SEQ and choice questions.
- Automatic analysis transcribes every session, codes open-ended answers into themes, and aggregates task-level metrics into a real-time report you can share with one link.
- Scale without losing nuance — run 5 sessions or 50 in parallel; the per-question synthesis holds either way.
A study that used to take two weeks of scheduling, moderating, and synthesizing becomes an afternoon. You bring the tasks and the judgment; Koji handles the moderation and the math.
Related Resources
- Structured Questions in AI Interviews — the six question types that power task metrics and probes
- Unmoderated vs Moderated User Research: How to Choose
- How to Conduct Usability Testing: The Complete Guide
- Usability Testing Script Template
- Single Ease Question (SEQ): The 7-Point UX Metric
- System Usability Scale (SUS): Complete Guide