
Single Ease Question (SEQ): The 7-Point UX Metric for Task-Level Usability (2026)

The complete 2026 guide to the Single Ease Question (SEQ): the verbatim 7-point scale wording, Sauro–MeasuringU benchmarks (5.3–5.5 average), correlation with task completion, when to use SEQ vs SUS, and how to bundle SEQ into AI-moderated interviews on Koji to get task-level usability scores in days.

What Is the Single Ease Question (SEQ)?

The Single Ease Question (SEQ) is a one-item, 7-point rating scale used immediately after a user attempts a task to measure how difficult or easy that task felt. It is the simplest, fastest, and most-validated post-task usability metric in modern UX research, and is the standard companion to behavioural measures like task completion rate and time on task.

The SEQ was popularised by Jeff Sauro and the team at MeasuringU after years of empirical comparison against other post-task questionnaires (After-Scenario Questionnaire, NASA-TLX, Subjective Mental Effort Question). Sauro’s research established that a single well-anchored question correlated just as strongly with task completion and time-on-task as longer multi-item scales — and was dramatically less work to administer. The result is a metric that has effectively become the default post-task measure across modern usability research.

The Verbatim SEQ Wording

Overall, how difficult or easy was [the task] to complete?

1 = Very Difficult · 2 · 3 · 4 · 5 · 6 · 7 = Very Easy

A few critical implementation details:

  • The scale runs from 1 (Very Difficult) to 7 (Very Easy). Reversing the polarity invalidates direct comparison to MeasuringU benchmarks.
  • Only the endpoints are labelled. Some teams label the midpoint or every point; both reduce sensitivity.
  • It is administered immediately after the task, not at the end of the session. The experience must be fresh.
  • The bracketed task name should be specific. Use the actual task wording the user just attempted (“purchasing a coffee subscription”), not a generic “the previous task.”
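Taken together, these rules can be captured in a small question definition. A minimal sketch in Python; the field names here are illustrative, not any particular survey tool's schema:

```python
def seq_item(task_label: str) -> dict:
    """Build an SEQ item for a specific task, following the rules above."""
    return {
        # Use the actual task wording the user just attempted
        "question": f"Overall, how difficult or easy was {task_label} to complete?",
        "scale": list(range(1, 8)),  # 7 points, 1..7, never reversed
        "anchors": {1: "Very Difficult", 7: "Very Easy"},  # endpoints only
        "timing": "immediately_after_task",
    }

item = seq_item("purchasing a coffee subscription")
```

The key design constraint is that everything except the task label is fixed: changing the polarity, the anchors, or the timing breaks comparability with published benchmarks.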

Why SEQ Works

SEQ’s superpower is predictive validity — the score correlates strongly with what users actually did. Sauro’s benchmark research at MeasuringU established that:

  • A raw SEQ score of 5.9 corresponds to a task completion rate of roughly 86% and an average task time of about 2 minutes.
  • A raw SEQ score of 4.7 corresponds to a completion rate of roughly 58% and an average task time of about 2.8 minutes.
  • The relationship is roughly linear within the 4.0–6.5 range that covers most real-world tasks.
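Because the relationship is roughly linear in that range, the two published data points support a rough back-of-envelope interpolation. This sketch is illustrative only, not a validated model:

```python
def estimated_completion_rate(seq: float) -> float:
    """Rough completion-rate estimate from a mean SEQ score, by linear
    interpolation between the two published points (4.7 -> ~58%, 5.9 -> ~86%).
    Only meaningful inside the roughly linear 4.0-6.5 range."""
    x1, y1 = 4.7, 0.58
    x2, y2 = 5.9, 0.86
    slope = (y2 - y1) / (x2 - x1)  # about 0.23 completion points per SEQ point
    return y1 + slope * (seq - x1)

# e.g. the population-average SEQ of ~5.3 lands between the two anchor points
print(f"{estimated_completion_rate(5.3):.0%}")
```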

This is unusually strong for a self-reported metric. Most attitudinal measures correlate only weakly with behaviour; SEQ tracks task success closely enough to act as a proxy for it, which is why it has survived two decades of usability research.

SEQ Benchmarks

According to MeasuringU’s published benchmark dataset of more than 400 tasks and 10,000+ users:

| SEQ Score | Interpretation |
| --- | --- |
| 6.5+ | Top-decile task. Almost all users succeed without friction. |
| 5.6–6.4 | Above average. Workable; minor friction. |
| 5.3–5.5 | Population average. Typical for a competent but unremarkable task. |
| 4.5–5.2 | Below average. Friction is real and worth investigating. |
| <4.5 | Bottom-decile. Likely a usability emergency. |

A crucial calibration: the 5.3–5.5 average sits above the nominal scale midpoint of 4. This is normal for 7-point scales — humans cluster toward the positive end of unlabelled scales. Treating 4 as “average” is the single most common SEQ misinterpretation.
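The benchmark bands translate directly into a small interpretation helper. A minimal sketch using the thresholds above:

```python
def interpret_seq(mean_score: float) -> str:
    """Map a mean SEQ score onto the MeasuringU benchmark bands."""
    if mean_score >= 6.5:
        return "top decile"
    if mean_score >= 5.6:
        return "above average"
    if mean_score >= 5.3:
        return "average"
    if mean_score >= 4.5:
        return "below average"
    return "bottom decile"

# Note that 4 - the nominal midpoint - is well into "bottom decile"
print(interpret_seq(4.0))
```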

Industry benchmark. “Across over 400 tasks and 10,000 users the average score hovers between about 5.3 and 5.6, which is above the nominal midpoint of 4 but is typical for 7-point scales.” — MeasuringU, 10 Things to Know About the Single Ease Question

SEQ vs SUS: When to Use Each

SEQ and SUS are not competing — they measure different things at different cadences.

| Dimension | SEQ | SUS |
| --- | --- | --- |
| Scope | One specific task | Entire product/system |
| Timing | Immediately after each task | At the end of the test session |
| Question count | 1 | 10 |
| Scale | 1–7 | 1–5 (Likert) |
| Output | Per-task ease score | 0–100 system score |
| Best for | Diagnosing which task is hard | Benchmarking the whole product |
| Sample-size floor | ~10 per task | ~8 per study |
| Time to administer | <10 seconds | 60–90 seconds |

The canonical pattern in a moderated usability study is: SEQ after every task → SUS at the end. SEQ tells you which task is hard; SUS tells you whether the product is competitive against the 68 industry average. See the SUS guide for the full Sauro–Lewis benchmark scale.

How to Run a SEQ Study — Step by Step

Step 1: Define your tasks

Write each task as a goal the user can attempt without coaching. “Find a coat under £100 and add it to your basket” is a task. “Browse the catalogue” is not.

Step 2: Pick a sample size

Minimum 10–12 participants per task for reliability. For directional sprint testing, 8 is workable. For benchmarking or external reporting, aim for 30+. SEQ is unusually robust at small samples but never reliable below n=8.

Step 3: Run the task

Let the user attempt the task end-to-end. Do not interrupt. If they ask for help, treat it as a failure and move on.

Step 4: Administer the SEQ immediately

The instant the task ends — succeeded or failed — show the SEQ. Do not allow time for rationalisation. The fresher the response, the more diagnostic the score.

Step 5: Always pair SEQ with an open-ended probe

This is the single most under-used best practice. A bare SEQ score tells you the task is hard; the open-ended “What made the task feel that way?” tells you why. Without the probe, SEQ is a thermometer with no diagnosis.

Step 6: Analyse per task and across tasks

Per task: report the mean SEQ, the 95% confidence interval, and the % of users below 5. Across tasks: rank tasks by mean SEQ to identify the friction hotspots. Pair SEQ scores with task completion rates to triangulate.
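The per-task summary described here is straightforward to compute. A sketch with hypothetical scores, using a normal-approximation confidence interval (a t critical value is more appropriate at these sample sizes):

```python
import math
import statistics

def summarise_task(scores: list[int]) -> dict:
    """Per-task SEQ summary: mean, approximate 95% CI, and % of users below 5."""
    n = len(scores)
    mean = statistics.mean(scores)
    se = statistics.stdev(scores) / math.sqrt(n)
    margin = 1.96 * se  # normal approximation; use a t critical value for small n
    return {
        "mean": round(mean, 2),
        "ci_95": (round(mean - margin, 2), round(mean + margin, 2)),
        "pct_below_5": round(100 * sum(s < 5 for s in scores) / n),
    }

# Hypothetical data for two tasks, ranked by mean SEQ to surface friction hotspots
tasks = {
    "Find a coat under £100": [6, 7, 5, 6, 7, 6, 4, 6, 7, 5],
    "Purchase a subscription": [4, 3, 5, 4, 6, 4, 2, 5, 4, 3],
}
for name, scores in sorted(tasks.items(), key=lambda kv: statistics.mean(kv[1])):
    print(name, summarise_task(scores))
```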

Common SEQ Mistakes to Avoid

  1. Reversing the scale. Some teams label 1 as “easy” and 7 as “difficult.” This breaks every benchmark comparison. Stick to 1 = Very Difficult, 7 = Very Easy.
  2. Treating 4 as the average. The midpoint is statistically not the population average. The real average is 5.3–5.5. A score of 4 is well below average.
  3. Administering SEQ at the end of the session. Recall bias collapses the diagnostic value. Administer immediately after each task.
  4. Reporting SEQ without an open-ended probe. A score without a why is a metric you cannot act on.
  5. Using SEQ to benchmark the whole product. SEQ is a task metric. For a product-level benchmark, use SUS.
  6. Stopping at n=5. SEQ requires more participants than think-aloud sessions because it is quantitative. n=8 is a floor, n=10–12 is reliable, n=20+ is publishable.

The Modern Approach: SEQ at Scale With AI-Moderated Research

SEQ has always been easy to administer but expensive to run at scale. The traditional bottleneck is everything around the SEQ: recruiting, scheduling, moderating, transcribing the probes, then thematically analysing the open-ended responses. A 5-task SEQ study with 15 participants is two weeks of work for a research team — and most of those weeks are not the SEQ itself.

AI-native research platforms like Koji collapse this end-to-end. The modern SEQ workflow looks like this:

  1. Build the study in minutes. Use Koji’s structured questions — specifically the scale type (1–7) — to add the SEQ after each task. Add an open-ended probe directly underneath. Use the yes_no question type for binary task success.
  2. Launch via personalised link or in-product widget. No scheduling, no moderator availability constraints. The AI moderator runs the task with users 24/7.
  3. Get clean per-task data. Koji’s ground-truth widget scores every scale answer at high confidence. Per-task SEQ averages, 95% confidence intervals, and distributions update on the report in real time.
  4. Get the why automatically. Koji’s thematic analysis engine clusters the open-ended probe responses into friction themes per task — eliminating the manual coding step that traditionally consumes the entire week after a study closes.
  5. Compare across releases. Re-run the same SEQ study after every release to track per-task ease over time, exactly as you would track SUS or NPS at the system level.
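Release-over-release comparison of per-task ease then reduces to a simple diff of means. A minimal sketch with hypothetical data (no platform API assumed):

```python
import statistics

# Hypothetical per-task SEQ scores from two runs of the same study
release_1 = {"Find a coat": [5, 6, 4, 5, 6, 5, 4, 6, 5, 5],
             "Checkout":    [3, 4, 4, 5, 3, 4, 2, 4, 3, 4]}
release_2 = {"Find a coat": [6, 6, 5, 6, 7, 6, 5, 6, 6, 6],
             "Checkout":    [4, 5, 5, 5, 4, 5, 4, 5, 4, 5]}

# Report the per-task shift in mean SEQ between releases
for task in release_1:
    before = statistics.mean(release_1[task])
    after = statistics.mean(release_2[task])
    print(f"{task}: {before:.1f} -> {after:.1f} ({after - before:+.1f})")
```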

Forrester’s State of Customer Insights 2024 found teams using AI-moderated research achieve 60% faster time-to-insight than teams running equivalent studies manually. For SEQ studies specifically — where the bottleneck is rarely the metric itself but the moderation and analysis around it — the gap is closer to 80%. Koji customers routinely run 5-task SEQ studies in an afternoon that previously took a fortnight.

The broader point is that SEQ’s adoption has historically been limited not by the metric’s value (which is well-established) but by the operational cost of running enough sessions to make the score meaningful. Removing that operational cost is the actual research breakthrough — the metric itself has been settled science for two decades.

When NOT to Use SEQ

SEQ is not the right tool for every job. It is a task-level metric: it cannot benchmark a whole product (that is what SUS is for), and on its own it does not explain why a task is hard (that is what the open-ended probe is for).

SEQ shines for one job and one job only: measuring the perceived ease of a specific task immediately after it is attempted. Used inside its lane, it is the highest-leverage metric in the usability researcher’s toolkit.

Related Resources

Scale Questions in AI Interviews: Measure NPS, CSAT, and Ratings Automatically

Learn how to configure and use scale questions in Koji AI interviews to capture NPS, CSAT, and satisfaction ratings — with automatic probing and aggregated distribution charts in your research report.

Structured Questions in AI Interviews

Mix quantitative data collection — scales, ratings, multiple choice, ranking — with AI-powered conversational follow-up in a single interview.

HEART Framework: Google’s 5-Metric Model for Measuring User Experience (2026 Guide)

The complete guide to Google’s HEART framework — the five user-centered metrics (Happiness, Engagement, Adoption, Retention, Task Success), the Goals–Signals–Metrics process, and how to collect each metric in days, not quarters, with AI-moderated research on Koji.

Likert Scale Questions: How to Use Rating Scales in User Research

A complete guide to Likert scale questions in user research — what they are, when to use them, how to write them correctly, and how Koji's AI interviews take rating scales further by pairing quantitative scores with qualitative follow-up.

System Usability Scale (SUS): Complete Guide with Calculator, Benchmarks & Examples

The definitive 2026 guide to the System Usability Scale (SUS): the 10-question formula, scoring calculator, Sauro–Lewis benchmark grades, and how to deploy SUS at scale with AI-moderated interviews on Koji.

How to Conduct Usability Testing: The Complete Guide

A comprehensive guide to usability testing for UX researchers and product managers. Covers types of testing, participant numbers, step-by-step facilitation, and the most common mistakes to avoid.

How to Measure Customer Effort Score (CES) and Reduce Friction

The complete guide to Customer Effort Score surveys. Learn how to measure and reduce friction in customer interactions, and why low-effort experiences drive loyalty more than delight.