Moderated Usability Testing: Tasks, Think-Aloud & Metrics (2026 Guide)

What is moderated usability testing? (Answer first)

Moderated usability testing is a research method in which a facilitator guides a participant through realistic tasks on a product — live — while observing where they hesitate, struggle, or fail. The moderator can ask "why did you do that?" in the moment, probe confusion as it happens, and adapt the session to what the participant reveals. That real-time probing is exactly what distinguishes moderation from a static survey or a hands-off recording: you do not just see that someone failed a task, you learn why.

The trade-off has always been cost. A traditional moderated study means scheduling 8–15 calls, sitting through every one, taking notes, and then spending days synthesizing recordings. Platforms like Koji change those economics: an AI moderator runs the think-aloud session, asks adaptive follow-up questions, and clusters the findings automatically, so you get moderated-quality depth at unmoderated-style scale.

Bottom line: Use moderated usability testing when you need to understand the reasoning behind behavior — early-stage designs, complex flows, or any time a metric alone will not tell you what to fix. Use unmoderated testing when you only need to confirm a known hypothesis at volume.

Moderated vs. unmoderated: when to choose which

Dimension	Moderated	Unmoderated
Depth of insight	High — probe the "why" live	Lower — behavior only
Best for	New flows, ambiguous problems, B2B/expert users	Validated flows, A/B comparisons, large samples
Speed per session	Slower (live)	Faster (self-serve)
Cost to scale	Traditionally high	Low

The historical rule was "moderate for discovery, go unmoderated for validation." Koji collapses that divide: its AI moderator conducts a guided, probing session and runs many of them in parallel, so you no longer have to trade depth for sample size. For a deeper comparison, see Unmoderated vs Moderated User Research.

How to run a moderated usability test (step by step)

1. Define the research question, not the feature. Write down what decision the test will inform. "Can a new user complete checkout without help?" is testable; "Is the design good?" is not.

2. Write task scenarios, not instructions. A good task gives context and a goal but never names the UI element. Bad: "Click the blue Filter button." Good: "You want to find a jacket under €100 in your size — show me how you would do that." Naming the button tells the participant the answer and destroys the test.

3. Recruit the right participants. Under the Nielsen Norman Group's problem-discovery model, five users surface roughly 85% of the usability problems in a single design, assuming each participant uncovers about 31% of them on average. That is why 5–8 participants per distinct user segment is the workhorse sample size for formative moderated tests. Add a screener so you talk to real target users, not whoever is available.

4. Run a think-aloud session. Ask the participant to narrate their thoughts continuously: "Tell me what you are looking at, what you expect to happen, and what you are trying to do." Stay quiet while they work. Resist the urge to help — a silence that feels painful to you is data.

5. Probe at the right moments. When someone hesitates, hovers, or backtracks, that is your cue to ask a non-leading follow-up: "What did you expect to happen there?" or "What are you looking for right now?" This adaptive probing is the entire value of moderation — and it is exactly what Koji's AI interviewer automates with configurable follow-up depth (1–3 probes per question).

6. Capture both behavior and metrics. Note task success/failure, where errors cluster, and the verbatim quotes that explain them.
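
The sample-size figure in step 3 comes from a simple discovery model: if each participant independently uncovers a fixed share of the problems, the share found by n participants is 1 - (1 - L)^n. The sketch below assumes L = 0.31, the average rate Nielsen Norman Group reports. The function name is illustrative, and your own product's per-user rate may differ considerably.

```python
def share_of_problems_found(n_users: int, per_user_rate: float = 0.31) -> float:
    """Expected share of usability problems seen at least once by n_users.

    Assumes every participant independently uncovers the same share of
    problems (per_user_rate). Real studies only approximate this.
    """
    if n_users < 0:
        raise ValueError("n_users must be non-negative")
    if not 0.0 <= per_user_rate <= 1.0:
        raise ValueError("per_user_rate must be between 0 and 1")
    return 1.0 - (1.0 - per_user_rate) ** n_users


# Compare a few sample sizes to see where the curve flattens.
for n in (1, 3, 5, 8, 15):
    print(f"{n:>2} participants: {share_of_problems_found(n):.1%} of problems")
```

The curve flattens quickly after five participants, which is the usual argument for running several small rounds on revised designs rather than one large round.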

The metrics that make moderated tests defensible

Qualitative observation is the heart of moderated testing, but pairing it with a few standard metrics makes findings far easier to defend to stakeholders:

Task success rate: the percentage of participants who complete each task. It is usually the first metric to report, because the others only make sense once you know whether the task can be completed at all.
Time on task: how long completion takes; spikes flag friction.
Single Ease Question (SEQ): a single 7-point post-task rating of how difficult or easy the task felt (1 = very difficult, 7 = very easy). See the Single Ease Question (SEQ) guide.
System Usability Scale (SUS): a validated ten-item questionnaire that yields a 0–100 score for the whole experience. See the System Usability Scale (SUS) guide.
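
As a rough illustration of how these four metrics are computed, here is a minimal Python sketch. The session records and field names are hypothetical, not a Koji export format. It also makes two assumptions: time on task is summarized as the median over completed attempts only (one common convention), and SUS uses the standard scoring rule for ten items answered on a 1 to 5 scale.

```python
from statistics import mean, median

# Hypothetical observations for ONE task; field names are illustrative only.
sessions = [
    {"completed": True, "seconds": 48, "seq": 6},
    {"completed": True, "seconds": 95, "seq": 4},
    {"completed": False, "seconds": 180, "seq": 2},
    {"completed": True, "seconds": 60, "seq": 6},
    {"completed": True, "seconds": 72, "seq": 5},
]


def task_success_rate(outcomes):
    """Share of participants who completed the task (0.0 to 1.0)."""
    if not outcomes:
        raise ValueError("need at least one session")
    return sum(outcomes) / len(outcomes)


def summarize_task(records):
    """Success rate, median time of completed attempts, and mean SEQ."""
    completed_times = [r["seconds"] for r in records if r["completed"]]
    return {
        "success_rate": task_success_rate([r["completed"] for r in records]),
        # Failed attempts are excluded so abandoned sessions do not skew time.
        "median_seconds_completed": median(completed_times) if completed_times else None,
        "mean_seq": mean([r["seq"] for r in records]),
    }


def sus_score(responses):
    """Standard SUS scoring for ten answers on a 1-5 agreement scale.

    Odd-numbered items contribute (answer - 1), even-numbered items
    contribute (5 - answer); the sum is multiplied by 2.5 to give 0-100.
    """
    if len(responses) != 10 or any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("SUS needs exactly ten answers, each from 1 to 5")
    total = 0
    for item_number, answer in enumerate(responses, start=1):
        total += (answer - 1) if item_number % 2 == 1 else (5 - answer)
    return total * 2.5


summary = summarize_task(sessions)
print(summary)
print(sus_score([4, 2, 4, 2, 4, 2, 4, 2, 4, 2]))
```

With only five to eight participants per segment, treat these numbers as directional and report them alongside the observations and quotes that explain them.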

In Koji, you capture these with structured questions — Koji supports six types (open_ended, scale, single_choice, multiple_choice, ranking, yes_no). You add SEQ as a scale question and the post-task "what was confusing?" as an open_ended question with AI probing. Because every scale answer is captured as a ground-truth structured value, Koji aggregates task-level difficulty automatically while still clustering the open-ended explanations into themed friction findings.

Common mistakes that ruin moderated sessions

Leading the witness. "Was that easy?" invites a polite yes. Ask "How did that feel?" instead. See How to Avoid Leading Questions.
Helping too soon. The moment you rescue a struggling user, you lose the finding.
Testing the participant, not the product. If someone fails, the design failed — never imply otherwise, or social-desirability bias will distort everything that follows.
Skipping the pilot. Always run one practice session to catch broken tasks before they cost you real participants.
Synthesizing from memory. Notes taken during a live call are lossy; a verbatim transcript with coded themes is not.

How Koji makes moderated usability testing faster

Traditional moderation is bottlenecked by you — one researcher can only sit in so many calls. Koji removes that bottleneck without removing the depth:

AI moderator runs the think-aloud session in voice or text, asking your tasks and probing hesitation with adaptive follow-ups — no scheduler, no calendar, available 24/7.
Voice mode captures natural think-aloud narration; text mode renders interactive widgets for SEQ and choice questions.
Automatic analysis transcribes every session, codes open-ended answers into themes, and aggregates task-level metrics into a real-time report you can share with one link.
Scale without losing nuance — run 5 sessions or 50 in parallel; the per-question synthesis holds either way.

A study that used to take two weeks of scheduling, moderating, and synthesizing becomes an afternoon. You bring the tasks and the judgment; Koji handles the moderation and the math.

Related Resources

Structured Questions in AI Interviews — the six question types that power task metrics and probes
Unmoderated vs Moderated User Research: How to Choose
How to Conduct Usability Testing: The Complete Guide
Usability Testing Script Template
Single Ease Question (SEQ): The 7-Point UX Metric
System Usability Scale (SUS): Complete Guide
