{"site":{"name":"Koji","description":"AI-native customer research platform that helps teams conduct, analyze, and synthesize customer interviews at scale.","url":"https://www.koji.so","contentTypes":["blog","documentation"],"lastUpdated":"2026-06-17T04:38:59.058Z"},"content":[{"type":"documentation","id":"268df3eb-2fd5-429e-8d19-c3be250149bb","slug":"usability-metrics-guide","title":"Usability Metrics: Task Success Rate, Time on Task, and Error Rate Explained","url":"https://www.koji.so/docs/usability-metrics-guide","summary":"A definitive reference on the three core usability metrics — task success rate (effectiveness), time on task (efficiency), and error rate (accuracy) — with formulas, the 78% MeasuringU completion benchmark, sample-size guidance, and how Koji captures all three automatically via structured questions and an AI moderator.","content":"## What are the core usability metrics?\n\n**The three core usability metrics are task success rate (effectiveness), time on task (efficiency), and error rate (accuracy).** Together they answer the three questions at the heart of any usability test: Can users finish the job? How long does it take them? And how many mistakes do they make along the way? Most other quantitative usability measures, such as completion confidence or lostness, refine one of these three. Attitudinal measures such as task-level satisfaction are best read alongside them.\n\nThe Nielsen Norman Group calls success rate \"the simplest usability metric\" precisely because it is the bottom line of usability: if users cannot complete what they came to do, nothing else about the interface matters. The strength of these three metrics is that they are objective and behavioral. 
Unlike attitudinal scores such as NPS or [SUS](/docs/system-usability-scale-guide), they record what users *actually did*, not what they later said they felt.\n\nThis guide defines each metric, gives you the formulas and the published industry benchmarks, explains how many participants you need, and shows how an AI-native platform like Koji captures all three automatically — turning a multi-day analysis grind into a real-time dashboard.\n\n## Metric 1: Task success rate (effectiveness)\n\nTask success rate is the percentage of participants who complete a task successfully out of everyone who attempted it.\n\n> **Task success rate = (number of successful attempts ÷ total attempts) × 100**\n\nIf 17 of 20 participants successfully add an item to their cart and reach checkout, your success rate is 85%.\n\n**The benchmark:** In an analysis of 1,189 tasks across 115 usability studies, MeasuringU founder Jeff Sauro found the average task completion rate is **78%**. A common rule of thumb treats this 78% average as the rough line between \"acceptable\" and \"needs work\" for an important task. Success rate is highly sensitive to task difficulty, though, so the right target is always relative to the task and to your own historical baseline.\n\n**Binary vs. levels of success.** The cleanest version is binary: a participant either completed the task or did not. But many teams record *levels of success* — full success, partial success (completed with significant struggle or workaround), and failure — because a binary view hides the difference between a user who breezed through and one who barely limped to the finish line. Partial successes are often where your richest design insights hide.\n\n**The trap:** success rate alone is misleading. A task can show a 90% success rate while users take three minutes and make two errors getting there. 
That is why success rate must always be read alongside time and errors.\n\n## Metric 2: Time on task (efficiency)\n\nTime on task measures how long it takes a participant to complete a task, usually reported as the mean or median time in seconds for successful attempts only. (Including failed attempts pollutes the number — a user who gave up after 10 seconds would otherwise look \"efficient.\")\n\nTime data is almost always skewed by a few very slow users, so the arithmetic mean overstates the typical time. Use the **geometric mean** for small samples (MeasuringU's research puts the cutoff at roughly 25 participants) and the **median** for larger ones. Report a measure of spread too — the range or confidence interval — because an average of 45 seconds means something very different if the spread is 40–50 seconds versus 10–120 seconds.\n\n**How to use it:** time on task is most powerful as a *comparative* metric — old design vs. new, your product vs. a competitor, or release over release. An absolute \"good\" time rarely exists in isolation; a 30-second task time is excellent for a complex configuration flow and terrible for a one-click action.\n\n## Metric 3: Error rate (accuracy)\n\nAn error is any unintended action, slip, mistake, or omission a user makes while attempting a task. Error rate is typically expressed as errors per task (the average number of errors across all attempts) or as a defect rate (the percentage of attempts containing at least one error).\n\n> **Errors per task = total errors observed ÷ total attempts**\n\n**The benchmark:** in an analysis of 719 tasks from consumer and business software, Jeff Sauro found an average of **0.7 errors per task**, with roughly **two out of every three users making at least one error**. Errors are far more common than most teams assume — which is exactly why counting them surfaces friction that success rate alone would never reveal.\n\nNot all errors are equal. Classify them by severity (does the error block completion, or merely slow the user down?) 
and by type (slips, where the user knows the goal but executes the wrong action, vs. mistakes, where the user has the wrong mental model). The pattern in *where* errors cluster is usually more actionable than the raw count.\n\n## Putting the three together\n\nEffectiveness, efficiency, and accuracy form a triangle. A mature usability scorecard reads all three at once:\n\n| Metric | What it measures | Typical benchmark |\n| --- | --- | --- |\n| Task success rate | Effectiveness — can they finish? | ~78% average (Sauro/MeasuringU) |\n| Time on task | Efficiency — how fast? | Comparative; no universal target |\n| Error rate | Accuracy — how clean? | ~0.7 errors/task; ~2 of 3 users err |\n\nLayer a task-level satisfaction question on top — a single [scale question](/docs/structured-questions-guide) such as \"How easy or difficult was that task?\" — and you capture the user's attitude alongside their behavior. The Nielsen Norman Group has found that performance and satisfaction metrics correlate only moderately, so measuring both protects you from shipping something users *can* use but *hate* using.\n\n## How many participants do you need?\n\nFor **qualitative, formative** usability testing — finding problems to fix — five users per round uncover roughly 85% of the problems. That is the classic Nielsen-Landauer finding, and it assumes each participant encounters about 31% of the problems in the interface. But the moment you want *reliable quantitative metrics* like a stable success rate or time on task, five is far too few: the confidence interval on a metric from five users is enormous.\n\nA practical rule of thumb:\n\n- **5–8 users** — formative testing, finding usability problems (not for reporting precise numbers).\n- **15–20 users** — a workable success-rate estimate for a single design, though the 95% margin of error is still roughly ±15 to ±20 percentage points.\n- **30–50+ users** — benchmark-grade metrics you intend to track over time or quote externally. 
(See our [usability benchmarking guide](/docs/usability-benchmarking-guide) for the full methodology.)\n\nUse adjusted-Wald binomial confidence intervals for small-sample completion rates rather than naive percentages — a 4-of-5 success \"80%\" actually carries a confidence interval running from roughly 36% to 98%.\n\n## The modern approach: capturing usability metrics with AI\n\nTraditionally, collecting these three metrics meant scheduling moderated sessions, watching every recording, manually timing each task with a stopwatch, tallying errors by hand, and reconciling notes across a research team. A 20-participant benchmark could swallow a week of analyst time — which is why most teams measured usability once a quarter at best, if at all.\n\nThis is exactly the bottleneck AI-native research platforms remove. **Koji** captures all three core metrics automatically:\n\n- **Task success rate** is recorded directly through structured questions. Frame each task with a `yes_no` or `single_choice` outcome question, and Koji aggregates the completion rate across every respondent in real time — no manual tallying.\n- **Time on task** is timestamped automatically for every session, with the distribution (median, range, outliers) computed and charted as responses arrive.\n- **Error and friction signals** surface through Koji's AI moderator, which probes in the moment (\"What made that step confusing?\") and then clusters the open-ended answers into themed friction findings, so you see *where* and *why* users struggle, not just *that* they did.\n\nKoji supports all six [structured question types](/docs/structured-questions-guide) — `open_ended`, `scale`, `single_choice`, `multiple_choice`, `ranking`, and `yes_no` — which means you can capture a binary success flag, a 1–5 ease rating, and a rich open-ended \"what went wrong\" in a single automated study. 
Because Koji's AI moderator runs 24/7, you can collect the 30–50 sessions a true quantitative benchmark needs in days rather than weeks, and re-run the identical study every release to track the trend line. Time-to-insight drops because the counting, timing, and tagging (the slow part) are done the instant the last response lands.\n\nYou do not need a PhD in measurement theory to run a rigorous usability study. Define the tasks, attach the right structured questions, and let the platform handle the statistics.\n\n## Common mistakes to avoid\n\n1. **Reporting time on task for failed attempts.** Always separate successful and unsuccessful times.\n2. **Using the arithmetic mean on small samples.** Time data is skewed — use the median or geometric mean.\n3. **Quoting a success rate without a confidence interval.** \"80% from five users\" is not a precise number.\n4. **Measuring success but never satisfaction.** A usable-but-frustrating product still loses users.\n5. 
**Changing the task wording between benchmark rounds.** Consistency is what makes release-over-release comparison valid.\n\n## Related resources\n\n- [Structured Questions Guide](/docs/structured-questions-guide) — the six question types for capturing success, ease, and friction\n- [Usability Testing: The Complete Guide](/docs/usability-testing-guide) — the end-to-end method these metrics live inside\n- [Usability Benchmarking Guide](/docs/usability-benchmarking-guide) — turning these metrics into a tracked program\n- [System Usability Scale (SUS) Guide](/docs/system-usability-scale-guide) — the standard attitudinal usability score\n- [Customer Effort Score Guide](/docs/customer-effort-score-guide) — measuring perceived ease at the task level\n- [Think-Aloud Protocol](/docs/think-aloud-protocol) — surfacing the *why* behind every error","category":"Research Methods","lastModified":"2026-06-16T03:16:30.471681+00:00","metaTitle":"Usability Metrics: Task Success Rate, Time on Task & Error Rate (2026)","metaDescription":"Master the three core usability metrics — task success rate, time on task, and error rate. 
Formulas, industry benchmarks (78% avg completion), sample sizes, and how to capture them automatically with AI.","keywords":["usability metrics","task success rate","time on task","error rate","usability testing metrics","task completion rate","usability benchmark","UX metrics","effectiveness efficiency","quantitative usability"],"aiSummary":"A definitive reference on the three core usability metrics — task success rate (effectiveness), time on task (efficiency), and error rate (accuracy) — with formulas, the 78% MeasuringU completion benchmark, sample-size guidance, and how Koji captures all three automatically via structured questions and an AI moderator.","aiPrerequisites":["Familiarity with usability testing concepts","A product or prototype to evaluate","Basic understanding of percentages and averages"],"aiLearningOutcomes":["Define and calculate task success rate, time on task, and error rate","Interpret each metric against published industry benchmarks","Choose the right sample size for qualitative vs quantitative usability studies","Avoid the five most common usability-metric mistakes","Capture all three metrics automatically in Koji using structured questions"],"aiDifficulty":"beginner","aiEstimatedTime":"13 min read"}],"pagination":{"total":1,"returned":1,"offset":0}}