Single Ease Question (SEQ): The 7-Point UX Metric for Task-Level Usability (2026)
The complete 2026 guide to the Single Ease Question (SEQ): the verbatim 7-point scale wording, Sauro–MeasuringU benchmarks (5.3–5.5 average), correlation with task completion, when to use SEQ vs SUS, and how to bundle SEQ into AI-moderated interviews on Koji to get task-level usability scores in days.
What Is the Single Ease Question (SEQ)?
The Single Ease Question (SEQ) is a one-item, 7-point rating scale used immediately after a user attempts a task to measure how difficult or easy that task felt. It is the simplest, fastest, and most-validated post-task usability metric in modern UX research, and is the standard companion to behavioural measures like task completion rate and time on task.
The SEQ was popularised by Jeff Sauro and the team at MeasuringU after years of empirical comparison against other post-task questionnaires (After-Scenario Questionnaire, NASA-TLX, Subjective Mental Effort Question). Sauro’s research established that a single well-anchored question correlated just as strongly with task completion and time-on-task as longer multi-item scales — and was dramatically less work to administer. The result is a metric that has effectively become the default post-task measure across modern usability research.
The Verbatim SEQ Wording
Overall, how difficult or easy was [the task] to complete?
1 (Very Difficult) · 2 · 3 · 4 · 5 · 6 · 7 (Very Easy)
A few critical implementation details (a minimal code sketch follows the list):
- The scale runs from 1 (Very Difficult) to 7 (Very Easy). Reversing the polarity invalidates direct comparison to MeasuringU benchmarks.
- Only the endpoints are labelled. Some teams label the midpoint or every point; both reduce sensitivity.
- It is administered immediately after the task, not at the end of the session. The experience must be fresh.
- The bracketed task name should be specific. Use the actual task wording the user just attempted (“purchasing a coffee subscription”), not a generic “the previous task.”
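If you administer the SEQ in your own survey tool or script, these rules are straightforward to encode. Below is a minimal sketch in plain Python; it is not tied to any survey platform, and the render_seq helper is purely illustrative.

```python
# SEQ presentation rules: 1 = Very Difficult, 7 = Very Easy,
# only the endpoints are labelled, and the task wording is specific.
SEQ_SCALE = range(1, 8)  # 1..7
SEQ_ANCHORS = {1: "Very Difficult", 7: "Very Easy"}

def render_seq(task: str) -> str:
    """Return the SEQ prompt for a specific, just-attempted task."""
    points = " · ".join(
        f"{p} ({SEQ_ANCHORS[p]})" if p in SEQ_ANCHORS else str(p)
        for p in SEQ_SCALE
    )
    return f"Overall, how difficult or easy was {task} to complete?\n{points}"

print(render_seq("purchasing a coffee subscription"))
```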
Why SEQ Works
SEQ’s superpower is predictive validity — the score correlates strongly with what users actually did. Sauro’s benchmark research at MeasuringU established that:
- A raw SEQ score of 5.9 corresponds to a task completion rate of roughly 86% and an average task time of about 2 minutes.
- A raw SEQ score of 4.7 corresponds to a completion rate of roughly 58% and an average task time of about 2.8 minutes.
- The relationship is roughly linear within the 4.0–6.5 range that covers most real-world tasks.
This is unusually strong for a self-reported metric. Most attitudinal measures correlate weakly with behaviour; SEQ predicts task success nearly as well as the longer multi-item questionnaires it replaced, which is why it has survived two decades of usability research.
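To make the "roughly linear" claim concrete, here is a back-of-the-envelope interpolation using only the two published pairs above. This is not MeasuringU's regression model, just an illustration of the slope those two points imply.

```python
# Two published (SEQ score, completion rate %) pairs from the benchmarks above.
x1, y1 = 5.9, 86.0
x2, y2 = 4.7, 58.0

# Two-point interpolation, NOT a validated model; only indicative for SEQ ~4.0-6.5.
slope = (y1 - y2) / (x1 - x2)      # ~23 percentage points of completion per SEQ point
intercept = y1 - slope * x1

def rough_completion_estimate(seq_score: float) -> float:
    return slope * seq_score + intercept

print(round(slope, 1))                            # 23.3
print(round(rough_completion_estimate(5.3), 1))   # 72.0 -> ~72% at an average-ish SEQ
```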
SEQ Benchmarks
According to MeasuringU’s published benchmark dataset of more than 400 tasks and 10,000+ users:
| SEQ Score | Interpretation |
|---|---|
| 6.5+ | Top-decile task. Almost all users succeed without friction. |
| 5.6–6.4 | Above average. Workable; minor friction. |
| 5.3–5.5 | Population average. Typical for a competent but unremarkable task. |
| 4.5–5.2 | Below average. Friction is real and worth investigating. |
| <4.5 | Bottom-decile. Likely a usability emergency. |
A crucial calibration: the 5.3–5.5 average sits above the nominal scale midpoint of 4. This is normal for 7-point scales: respondents cluster toward the positive end of the scale. Treating 4 as "average" is the single most common SEQ misinterpretation.
Industry benchmark. “Across over 400 tasks and 10,000 users the average score hovers between about 5.3 and 5.6, which is above the nominal midpoint of 4 but is typical for 7-point scales.” — MeasuringU, 10 Things to Know About the Single Ease Question
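If you report SEQ regularly, it can help to encode these bands once. A small sketch follows; the band edges mirror the table above, and the function name and label strings are our own shorthand, not an official MeasuringU classification.

```python
def seq_benchmark_band(mean_seq: float) -> str:
    """Map a mean SEQ score to the benchmark band from the table above."""
    if mean_seq >= 6.5:
        return "Top decile: almost all users succeed without friction"
    if mean_seq >= 5.6:
        return "Above average: workable, minor friction"
    if mean_seq >= 5.3:
        return "Population average"
    if mean_seq >= 4.5:
        return "Below average: friction worth investigating"
    return "Bottom decile: likely a usability emergency"

print(seq_benchmark_band(5.9))  # Above average: workable, minor friction
print(seq_benchmark_band(4.2))  # Bottom decile: likely a usability emergency
```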
SEQ vs SUS: When to Use Each
SEQ and SUS are not competing — they measure different things at different cadences.
| Dimension | SEQ | SUS |
|---|---|---|
| Scope | One specific task | Entire product/system |
| Timing | Immediately after each task | At the end of the test session |
| Question count | 1 | 10 |
| Scale | 1–7 | 1–5 (Likert) |
| Output | Per-task ease score | 0–100 system score |
| Best for | Diagnosing which task is hard | Benchmarking the whole product |
| Sample-size floor | ~10 per task | ~8 per study |
| Time to administer | <10 seconds | 60–90 seconds |
The canonical pattern in a moderated usability study is: SEQ after every task → SUS at the end. SEQ tells you which task is hard; SUS tells you whether the product is competitive against the industry average of 68. See the SUS guide for the full Sauro–Lewis benchmark scale.
How to Run a SEQ Study — Step by Step
Step 1: Define your tasks
Write each task as a goal the user can attempt without coaching. “Find a coat under £100 and add it to your basket” is a task. “Browse the catalogue” is not.
Step 2: Pick a sample size
Aim for 10–12 participants per task for reliable means. For directional sprint testing, 8 is workable; for benchmarking or external reporting, aim for 30+. SEQ is unusually robust at small samples, but treat n=8 as the absolute floor.
Step 3: Run the task
Let the user attempt the task end-to-end. Do not interrupt. If they ask for help, treat it as a failure and move on.
Step 4: Administer the SEQ immediately
The instant the task ends — succeeded or failed — show the SEQ. Do not allow time for rationalisation. The fresher the response, the more diagnostic the score.
Step 5: Always pair SEQ with an open-ended probe
This is the single most under-used best practice. A bare SEQ score tells you the task is hard; the open-ended “What made the task feel that way?” tells you why. Without the probe, SEQ is a thermometer with no diagnosis.
Step 6: Analyse per task and across tasks
Per task: report the mean SEQ, the 95% confidence interval, and the % of users below 5. Across tasks: rank tasks by mean SEQ to identify the friction hotspots. Pair SEQ scores with task completion rates to triangulate.
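Here is a minimal sketch of that per-task analysis, assuming you have a list of 1–7 ratings per task. It uses only the Python standard library; the task names and scores are made-up examples, and the normal-approximation confidence interval is a simplification (with small samples, use the t-distribution instead).

```python
from math import sqrt
from statistics import mean, stdev

def summarise_task(scores: list[int]) -> dict:
    """Mean SEQ, approximate 95% CI, and % of users scoring below 5 for one task."""
    n = len(scores)
    m = mean(scores)
    half_width = 1.96 * stdev(scores) / sqrt(n) if n > 1 else float("nan")
    return {
        "n": n,
        "mean_seq": round(m, 2),
        "ci95": (round(m - half_width, 2), round(m + half_width, 2)),
        "pct_below_5": round(100 * sum(s < 5 for s in scores) / n),
    }

tasks = {
    "Purchase a coffee subscription": [6, 7, 5, 6, 7, 4, 6, 6, 5, 7],
    "Cancel an order": [3, 4, 5, 2, 4, 6, 3, 4, 5, 3],
}

# Rank tasks from hardest to easiest to surface the friction hotspots.
for name, scores in sorted(tasks.items(), key=lambda kv: mean(kv[1])):
    print(name, summarise_task(scores))
```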
Common SEQ Mistakes to Avoid
- Reversing the scale. Some teams label 1 as “easy” and 7 as “difficult.” This breaks every benchmark comparison. Stick to 1 = Very Difficult, 7 = Very Easy.
- Treating 4 as the average. The scale midpoint is not the observed population average; the real average is 5.3–5.5, so a score of 4 is well below average.
- Administering SEQ at the end of the session. Recall bias collapses the diagnostic value. Administer immediately after each task.
- Reporting SEQ without an open-ended probe. A score without a why is a metric you cannot act on.
- Using SEQ to benchmark the whole product. SEQ is a task metric. For a product-level benchmark, use SUS.
- Stopping at n=5. SEQ requires more participants than think-aloud sessions because it is quantitative. n=8 is a floor, n=10–12 is reliable, n=20+ is publishable.
The Modern Approach: SEQ at Scale With AI-Moderated Research
SEQ has always been easy to administer but expensive to run at scale. The traditional bottleneck is everything around the SEQ: recruiting, scheduling, moderating, transcribing the probes, then thematically analysing the open-ended responses. A 5-task SEQ study with 15 participants is two weeks of work for a research team, and most of that time goes to everything except the SEQ itself.
AI-native research platforms like Koji compress this workflow end to end. The modern SEQ workflow looks like this:
- Build the study in minutes. Use Koji’s structured questions, specifically the scale type (1–7), to add the SEQ after each task. Add an open-ended probe directly underneath, and use the yes_no question type for binary task success (a sketch of this per-task bundle follows the list).
- Launch via personalised link or in-product widget. No scheduling, no moderator availability constraints. The AI moderator runs the task with users 24/7.
- Get clean per-task data. Koji’s ground-truth widget scores every scale answer at high confidence. Per-task SEQ averages, 95% confidence intervals, and distributions update on the report in real time.
- Get the why automatically. Koji’s thematic analysis engine clusters the open-ended probe responses into friction themes per task — eliminating the manual coding step that traditionally consumes the entire week after a study closes.
- Compare across releases. Re-run the same SEQ study after every release to track per-task ease over time, exactly as you would track SUS or NPS at the system level.
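For illustration, here is what a per-task SEQ bundle might look like when expressed as a plain data structure. This is not Koji’s actual API or schema: only the question-type names scale and yes_no and the SEQ wording come from the text above; the dictionary shape, field names, and the open_ended type are hypothetical.

```python
# Hypothetical study definition for illustration only; not Koji's real schema.
study = {
    "name": "Checkout usability - March release",
    "tasks": [
        {
            "prompt": "Purchase a coffee subscription",
            "questions": [
                {"type": "yes_no", "text": "Were you able to complete the task?"},
                {
                    "type": "scale", "min": 1, "max": 7,
                    "min_label": "Very Difficult", "max_label": "Very Easy",
                    "text": "Overall, how difficult or easy was purchasing a coffee subscription to complete?",
                },
                {"type": "open_ended", "text": "What made the task feel that way?"},
            ],
        },
        # ...repeat the same bundle for each remaining task
    ],
}
```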
Forrester’s State of Customer Insights 2024 found teams using AI-moderated research achieve 60% faster time-to-insight than teams running equivalent studies manually. For SEQ studies specifically — where the bottleneck is rarely the metric itself but the moderation and analysis around it — the gap is closer to 80%. Koji customers routinely run 5-task SEQ studies in an afternoon that previously took a fortnight.
The broader point is that SEQ’s adoption has historically been limited not by the metric’s value (which is well-established) but by the operational cost of running enough sessions to make the score meaningful. Removing that operational cost is the actual research breakthrough — the metric itself has been settled science for two decades.
When NOT to Use SEQ
SEQ is not the right tool for:
- System-level benchmarking — use SUS instead
- Loyalty or recommendation intent — use NPS
- Effort to resolve a problem — use Customer Effort Score (CES)
- Generative discovery (“what should we build?”) — use Mom Test interviews or JTBD interviews
SEQ shines for one job and one job only: measuring the perceived ease of a specific task immediately after it is attempted. Used inside its lane, it is the highest-leverage metric in the usability researcher’s toolkit.
Related Resources
- Structured Questions in AI Interviews — the six Koji question types, including the scale type used to deploy SEQ
- System Usability Scale (SUS): Complete Guide — the system-level companion to SEQ
- Customer Effort Score (CES): How to Measure and Reduce Friction — a related effort-based metric for support and resolution flows
- HEART Framework: Google’s 5-Metric UX Model — where SEQ slots in as the Task Success attitudinal signal
- Likert Scale Questions in User Research — broader scale-design principles relevant to SEQ
- Usability Testing: The Complete Guide — the parent methodology in which SEQ is administered