{"site":{"name":"Koji","description":"AI-native customer research platform that helps teams conduct, analyze, and synthesize customer interviews at scale.","url":"https://www.koji.so","contentTypes":["blog","documentation"],"lastUpdated":"2026-05-09T07:14:21.803Z"},"content":[{"type":"documentation","id":"76990745-46d5-4717-b57e-6494f3c3a3f1","slug":"single-ease-question-seq-guide","title":"Single Ease Question (SEQ): The 7-Point UX Metric for Task-Level Usability (2026)","url":"https://www.koji.so/docs/single-ease-question-seq-guide","summary":"A definitive guide to the Single Ease Question (SEQ): the verbatim 7-point wording, MeasuringU/Sauro benchmark of 5.3–5.5, correlation with task completion (5.9 ≈ 86% completion), when to choose SEQ over SUS, sample-size guidance, and how Koji turns a 2-week SEQ study into an afternoon.","content":"## What Is the Single Ease Question (SEQ)?\n\nThe **Single Ease Question (SEQ)** is a one-item, 7-point rating scale used immediately after a user attempts a task to measure how difficult or easy that task felt. It is the simplest, fastest, and most-validated post-task usability metric in modern UX research, and is the standard companion to behavioural measures like task completion rate and time on task.\n\nThe SEQ was popularised by **Jeff Sauro** and the team at MeasuringU after years of empirical comparison against other post-task questionnaires (After-Scenario Questionnaire, NASA-TLX, Subjective Mental Effort Question). Sauro’s research established that a single well-anchored question correlated *just as strongly* with task completion and time-on-task as longer multi-item scales — and was dramatically less work to administer. The result is a metric that has effectively become the default post-task measure across modern usability research.\n\n## The Verbatim SEQ Wording\n\n> **Overall, how difficult or easy was [the task] to complete?**\n>\n> 1 — Very Difficult\n> 2\n> 3\n> 4\n> 5\n> 6\n> 7 — Very Easy\n\nA few critical implementation details:\n\n- **The scale runs from 1 (Very Difficult) to 7 (Very Easy).** Reversing the polarity invalidates direct comparison to MeasuringU benchmarks.\n- **Only the endpoints are labelled.** Some teams label the midpoint or every point; both reduce sensitivity.\n- **It is administered *immediately after* the task**, not at the end of the session. The experience must be fresh.\n- **The bracketed task name should be specific.** Use the actual task wording the user just attempted (“purchasing a coffee subscription”), not a generic “the previous task.”\n\n## Why SEQ Works\n\nSEQ’s superpower is **predictive validity** — the score correlates strongly with what users actually did. Sauro’s benchmark research at MeasuringU established that:\n\n- A raw SEQ score of **5.9** corresponds to a task completion rate of roughly **86%** and an average task time of about 2 minutes.\n- A raw SEQ score of **4.7** corresponds to a completion rate of roughly **58%** and an average task time of about 2.8 minutes.\n- The relationship is roughly linear within the 4.0–6.5 range that covers most real-world tasks.\n\nThis is unusually strong for a self-reported metric. Most attitudinal measures correlate weakly with behaviour. 
\n\n## SEQ Benchmarks\n\nAccording to MeasuringU’s published benchmark dataset of more than 400 tasks and 10,000+ users:\n\n| SEQ Score | Interpretation |\n|---|---|\n| 6.5+ | Top-decile task. Almost all users succeed without friction. |\n| 5.6–6.4 | Above average. Workable; minor friction. |\n| **5.3–5.5** | **Population average.** Typical for a competent but unremarkable task. |\n| 4.5–5.2 | Below average. Friction is real and worth investigating. |\n| <4.5 | Bottom-decile. Likely a usability emergency. |\n\nA crucial calibration: the 5.3–5.5 average sits *above* the nominal scale midpoint of 4. This is normal for 7-point scales — humans cluster toward the positive end of unlabelled scales. Treating 4 as “average” is the single most common SEQ misinterpretation.\n\n> **Industry benchmark.** “Across over 400 tasks and 10,000 users the average score hovers between about 5.3 and 5.6, which is above the nominal midpoint of 4 but is typical for 7-point scales.” — MeasuringU, *10 Things to Know About the Single Ease Question*\n\n## SEQ vs SUS: When to Use Each\n\nSEQ and SUS are not competing — they measure different things at different cadences.\n\n| Dimension | SEQ | SUS |\n|---|---|---|\n| Scope | One specific task | Entire product/system |\n| Timing | Immediately after each task | At the end of the test session |\n| Question count | 1 | 10 |\n| Scale | 1–7 | 1–5 (Likert) |\n| Output | Per-task ease score | 0–100 system score |\n| Best for | Diagnosing which task is hard | Benchmarking the whole product |\n| Sample-size floor | ~10 per task | ~8 per study |\n| Time to administer | <10 seconds | 60–90 seconds |\n\nThe canonical pattern in a moderated usability study is: SEQ after every task → SUS at the end. SEQ tells you *which* task is hard; SUS tells you whether the *product* is competitive against the industry average of 68. See the [SUS guide](/docs/system-usability-scale-guide) for the full Sauro–Lewis benchmark scale.\n\n## How to Run a SEQ Study — Step by Step\n\n### Step 1: Define your tasks\nWrite each task as a goal the user can attempt without coaching. “Find a coat under £100 and add it to your basket” is a task. “Browse the catalogue” is not.\n\n### Step 2: Pick a sample size\nMinimum 10–12 participants per task for reliability. For directional sprint testing, 8 is workable. For benchmarking or external reporting, aim for 30+. SEQ is unusually robust at small samples but never reliable below n=8.\n\n### Step 3: Run the task\nLet the user attempt the task end-to-end. Do not interrupt. If they ask for help, treat it as a failure and move on.\n\n### Step 4: Administer the SEQ immediately\nThe instant the task ends — succeeded or failed — show the SEQ. Do not allow time for rationalisation. The fresher the response, the more diagnostic the score.\n\n### Step 5: Always pair SEQ with an open-ended probe\nThis is the single most under-used best practice. A bare SEQ score tells you the task is hard; the open-ended “What made the task feel that way?” tells you *why*. Without the probe, SEQ is a thermometer with no diagnosis.\n\n### Step 6: Analyse per task and across tasks\nPer task: report the mean SEQ, the 95% confidence interval, and the % of users below 5. Across tasks: rank tasks by mean SEQ to identify the friction hotspots. Pair SEQ scores with task completion rates to triangulate. A worked sketch of this per-task summary follows.
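\n\nA minimal Python sketch of the Step 6 summary, using hypothetical scores and only the standard library. The confidence interval uses a normal approximation; at typical SEQ sample sizes (n = 10–12) a t-based interval would be slightly wider, so treat the bounds as indicative:\n\n```python\nfrom math import sqrt\nfrom statistics import NormalDist, mean, stdev\n\ndef summarise_task(scores: list[int]) -> dict:\n    # Per-task summary from Step 6: mean SEQ, 95% CI, % of users below 5.\n    n = len(scores)\n    m = mean(scores)\n    se = stdev(scores) / sqrt(n)\n    z = NormalDist().inv_cdf(0.975)  # ~1.96 (normal approximation)\n    return {\n        'mean_seq': round(m, 2),\n        'ci_95': (round(m - z * se, 2), round(m + z * se, 2)),\n        'pct_below_5': round(100 * sum(s < 5 for s in scores) / n),\n        'n': n,\n    }\n\n# Hypothetical data for two tasks; sorting by mean SEQ (ascending) ranks\n# the friction hotspots first, as Step 6 recommends.\ntasks = {\n    'checkout': [6, 5, 7, 4, 6, 5, 6, 7, 5, 6],\n    'password_reset': [3, 4, 5, 2, 4, 3, 5, 4, 3, 4],\n}\nfor name, scores in sorted(tasks.items(), key=lambda kv: mean(kv[1])):\n    print(name, summarise_task(scores))\n```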
\n\n## Common SEQ Mistakes to Avoid\n\n1. **Reversing the scale.** Some teams label 1 as “easy” and 7 as “difficult.” This breaks every benchmark comparison. Stick to 1 = Very Difficult, 7 = Very Easy.\n2. **Treating 4 as the average.** The midpoint is statistically *not* the population average. The real average is 5.3–5.5. A score of 4 is well below average.\n3. **Administering SEQ at the end of the session.** Recall bias collapses the diagnostic value. Administer immediately after each task.\n4. **Reporting SEQ without an open-ended probe.** A score without a *why* is a metric you cannot act on.\n5. **Using SEQ to benchmark the whole product.** SEQ is a task metric. For a product-level benchmark, use [SUS](/docs/system-usability-scale-guide).\n6. **Stopping at n=5.** SEQ requires more participants than think-aloud sessions because it is quantitative. n=8 is a floor, n=10–12 is reliable, n=20+ is publishable.\n\n## The Modern Approach: SEQ at Scale With AI-Moderated Research\n\nSEQ has always been *easy to administer* but *expensive to run at scale*. The traditional bottleneck is everything around the SEQ: recruiting, scheduling, moderating, transcribing the probes, then thematically analysing the open-ended responses. A 5-task SEQ study with 15 participants is two weeks of work for a research team — and most of that time is not the SEQ itself.\n\nAI-native research platforms like **Koji** collapse this overhead end-to-end. The modern SEQ workflow looks like this:\n\n1. **Build the study in minutes.** Use Koji’s [structured questions](/docs/structured-questions-guide) — specifically the *scale* type (1–7) — to add the SEQ after each task. Add an open-ended *probe* directly underneath. Use the *yes_no* question type for binary task success. (A sketch of such a task block appears at the end of this section.)\n2. **Launch via personalised link or in-product widget.** No scheduling, no moderator availability constraints. The AI moderator runs the task with users 24/7.\n3. **Get clean per-task data.** Koji’s ground-truth widget scores every scale answer at high confidence. Per-task SEQ averages, 95% confidence intervals, and distributions update on the report in real time.\n4. **Get the *why* automatically.** Koji’s thematic analysis engine clusters the open-ended probe responses into friction themes per task — eliminating the manual coding step that traditionally consumes the entire week after a study closes.\n5. **Compare across releases.** Re-run the same SEQ study after every release to track per-task ease over time, exactly as you would track SUS or NPS at the system level.\n\nForrester’s *State of Customer Insights 2024* found that teams using AI-moderated research achieve **60% faster time-to-insight** than teams running equivalent studies manually. For SEQ studies specifically — where the bottleneck is rarely the metric itself but the moderation and analysis around it — the gap is closer to 80%. Koji customers routinely run 5-task SEQ studies in an afternoon that previously took a fortnight.\n\nThe broader point is that SEQ’s adoption has historically been limited not by the metric’s value (which is well-established) but by the operational cost of running enough sessions to make the score meaningful. Removing that operational cost is the actual research breakthrough — the metric itself has been settled science for two decades.
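\n\nTo make step 1 concrete, here is a hypothetical sketch of a single task block. The field and type names are illustrative, not Koji’s actual API; they simply mirror the question types named above (a *yes_no* success check, a 1–7 *scale* for the SEQ, and an open-ended probe directly underneath):\n\n```python\n# Hypothetical task block for a SEQ study. Field names are illustrative\n# only, not Koji's actual API; they mirror the question types in step 1.\ntask_block = {\n    'task': 'Purchase a coffee subscription',\n    'questions': [\n        {'type': 'yes_no',\n         'text': 'Were you able to complete the task?'},\n        {'type': 'scale', 'min': 1, 'max': 7,\n         'low_label': 'Very Difficult', 'high_label': 'Very Easy',\n         'text': 'Overall, how difficult or easy was purchasing a coffee subscription to complete?'},\n        {'type': 'open_ended',\n         'text': 'What made the task feel that way?'},\n    ],\n}\n```\n\nNote the probe sits directly under the scale question, per Step 5, so the *why* arrives alongside every score.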
\n\n## When NOT to Use SEQ\n\nSEQ is not the right tool for:\n\n- **System-level benchmarking** — use [SUS](/docs/system-usability-scale-guide) instead\n- **Loyalty or recommendation intent** — use [NPS](/docs/nps-survey-guide)\n- **Effort to *resolve a problem*** — use [Customer Effort Score (CES)](/docs/customer-effort-score-guide)\n- **Generative discovery** (“what should we build?”) — use [Mom Test interviews](/docs/mom-test-methodology) or [JTBD interviews](/docs/jobs-to-be-done-framework)\n\nSEQ shines for one job and one job only: **measuring the perceived ease of a specific task immediately after it is attempted.** Used inside its lane, it is the highest-leverage metric in the usability researcher’s toolkit.\n\n## Related Resources\n\n- [Structured Questions in AI Interviews](/docs/structured-questions-guide) — the six Koji question types, including the scale type used to deploy SEQ\n- [System Usability Scale (SUS): Complete Guide](/docs/system-usability-scale-guide) — the system-level companion to SEQ\n- [Customer Effort Score (CES): How to Measure and Reduce Friction](/docs/customer-effort-score-guide) — a related effort-based metric for support and resolution flows\n- [HEART Framework: Google’s 5-Metric UX Model](/docs/heart-framework-ux-metrics) — where SEQ slots in as the Task Success attitudinal signal\n- [Likert Scale Questions in User Research](/docs/likert-scale-research-guide) — broader scale-design principles relevant to SEQ\n- [Usability Testing: The Complete Guide](/docs/usability-testing-guide) — the parent methodology in which SEQ is administered","category":"Research Methods","lastModified":"2026-05-07T03:20:47.979491+00:00","metaTitle":"Single Ease Question (SEQ): 7-Point Scale, Benchmarks & 2026 Guide","metaDescription":"The definitive Single Ease Question (SEQ) guide: the verbatim 7-point scale, MeasuringU benchmarks, when to pair SEQ with SUS, sample-size rules, and how to deploy SEQ in minutes with AI-moderated research on Koji.","keywords":["single ease question","SEQ","SEQ usability metric","task-level usability","jeff sauro SEQ","MeasuringU SEQ","post-task survey","7-point scale usability","SEQ benchmark","SEQ vs SUS"],"aiSummary":"A definitive guide to the Single Ease Question (SEQ): the verbatim 7-point wording, MeasuringU/Sauro benchmark of 5.3–5.5, correlation with task completion (5.9 ≈ 86% completion), when to choose SEQ over SUS, sample-size guidance, and how Koji turns a 2-week SEQ study into an afternoon.","aiPrerequisites":["Familiarity with usability testing concepts","Basic understanding of Likert/scale rating","A defined user task to evaluate"],"aiLearningOutcomes":["Use the verbatim SEQ wording correctly in any usability test","Interpret a SEQ score against the MeasuringU 5.3–5.5 benchmark","Choose between SEQ and SUS for a given study","Pair SEQ with an open-ended probe for diagnostic insight","Run a multi-task SEQ study on Koji in a single afternoon"],"aiDifficulty":"beginner","aiEstimatedTime":"11 min read"}],"pagination":{"total":1,"returned":1,"offset":0}}