How to Run Usability Testing Surveys That Improve Your Product
The complete guide to usability testing surveys and post-task questionnaires. Learn how to combine SUS scores, task success rates, and conversational feedback to identify exactly where your UX breaks down.
Usability testing tells you whether people can use your product. Usability surveys tell you how they felt while using it. The most powerful approach combines both: observe behavior, then have a conversation about the experience.
Traditional usability studies require a facilitator, a lab (or Zoom), scheduling, recording, and hours of analysis per participant. They're valuable but expensive. Most teams can only afford 5-8 participants. That's not enough to detect patterns reliably.
Koji enables a new paradigm: unmoderated usability conversations at scale. Give participants a task, then let Koji's AI interview them about their experience. You get the behavioral data (task success, time-on-task) combined with rich qualitative feedback explaining why they struggled or succeeded.
Core Usability Metrics
System Usability Scale (SUS)
The industry-standard 10-question post-test questionnaire. Produces a score from 0 to 100 (the cross-study average is roughly 68):
- Below 50: Unacceptable usability
- 50-70: Marginal
- 70-80: Good
- 80-90: Excellent
- Above 90: Best-in-class
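For reference, the standard SUS scoring procedure: odd-numbered (positively worded) items contribute (response − 1), even-numbered (negatively worded) items contribute (5 − response), and the summed contributions are multiplied by 2.5 to reach the 0-100 range. A minimal sketch:

```python
def sus_score(responses):
    """Compute a System Usability Scale score from ten 1-5 responses.

    Odd-numbered items (positively worded) contribute (response - 1);
    even-numbered items (negatively worded) contribute (5 - response);
    the sum is scaled by 2.5 to a 0-100 range.
    """
    if len(responses) != 10 or not all(1 <= r <= 5 for r in responses):
        raise ValueError("SUS needs exactly ten responses in the 1-5 range")
    total = sum(
        (r - 1) if i % 2 == 0 else (5 - r)  # i=0 is item 1 (odd-numbered)
        for i, r in enumerate(responses)
    )
    return total * 2.5

# All-neutral answers (3s) land exactly at the midpoint:
print(sus_score([3] * 10))  # 50.0
```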
Task Success Rate
Percentage of participants who complete a given task successfully. The average across published usability studies is roughly 78%, so rates meaningfully below that indicate a significant usability problem.
Time on Task
How long each task takes. Compare against benchmarks or expert completion times.
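Because task times are right-skewed (a few participants take far longer than most), practitioners often summarize small samples with the geometric mean rather than the arithmetic mean. A minimal sketch; the times are hypothetical:

```python
import math

def geometric_mean_time(times_sec):
    """Geometric mean of task times in seconds.

    Task-time distributions are right-skewed, so the geometric mean
    is less distorted by slow outliers than the arithmetic mean.
    """
    return math.exp(sum(math.log(t) for t in times_sec) / len(times_sec))

# Hypothetical times for one task; note the single slow outlier.
times = [42, 38, 55, 47, 180]
print(round(geometric_mean_time(times), 1))  # well below the arithmetic mean of 72.4
```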
Single Ease Question (SEQ)
"How easy was this task?" on a 1-7 scale. Quick, per-task metric. Below 5.5 indicates difficulty.
Building Usability Studies with Koji
Study Design
Pre-Task Context (Open-ended): "Before we start, can you tell me about your experience with [product category]? How often do you use similar tools?"
- Sets baseline familiarity
- Probing depth: 1
Per-Task Block (repeat for each task):
Task Description: "Please try to [specific task]. When you're done (or if you get stuck), come back here and we'll talk about it."
SEQ Score (Scale, 1-7): "How easy or difficult was that task?"
- Labels: 1 = "Very difficult", 7 = "Very easy"
- Anchor probing: Scores below 5 trigger deeper exploration
Task Experience (Open-ended): "Walk me through what you did. What was your experience like?"
- Probing depth: 3 for low SEQ scores, 1 for high scores
- AI instruction: "Focus on specific moments of confusion, delight, or frustration. Get step-by-step walkthroughs of where they got stuck."
Expectations (Open-ended): "Was there anything that didn't work the way you expected?"
- Probing depth: 2
- Captures mental model mismatches
Post-Test Overall:
SUS Score (10 Scale questions): Administer the standard 10 SUS questions as Scale type (1-5 each). Koji can ask these conversationally rather than as a form.
Overall Impression (Open-ended): "Overall, what was your impression of [product]?"
- Probing depth: 2
Comparison (Open-ended): "How does this compare to other tools you've used for [similar purpose]?"
- Probing depth: 1
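The study design above can be sketched as a data structure. This is purely illustrative: the field names, task prompts, and `probing_depth` encoding are assumptions for the sketch, not Koji's actual configuration schema.

```python
# Hypothetical representation of the study design; field names are
# illustrative, not Koji's real configuration format.

def task_block(task_description):
    """Per-task block: task prompt, SEQ scale, then two open-ended probes."""
    return [
        {"type": "task", "prompt": task_description},
        {"type": "scale", "prompt": "How easy or difficult was that task?",
         "range": (1, 7),
         "labels": {1: "Very difficult", 7: "Very easy"},
         "anchor_probe_below": 5},  # SEQ below 5 triggers deeper exploration
        {"type": "open",
         "prompt": "Walk me through what you did. "
                   "What was your experience like?",
         "probing_depth": {"low_seq": 3, "high_seq": 1}},
        {"type": "open",
         "prompt": "Was there anything that didn't work the way you expected?",
         "probing_depth": 2},
    ]

tasks = ["Find a product under $50", "Apply a filter", "Complete checkout"]
study = (
    [{"type": "open",
      "prompt": "Tell me about your experience with this product category.",
      "probing_depth": 1}]                       # pre-task context
    + [q for t in tasks for q in task_block(t)]  # one block per task
    + [{"type": "sus"},                          # ten standard SUS items, 1-5 each
       {"type": "open", "prompt": "Overall, what was your impression?",
        "probing_depth": 2}]                     # post-test overall
)
print(len(study))  # 1 pre-task + 3 tasks x 4 questions + 2 post-test = 15
```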
Multi-Task Studies
For studies with 3-5 tasks, Koji maintains conversational flow between tasks. The AI naturally transitions: "Great, thanks for sharing that. Let's try another task..." This feels more natural than a form that resets after each task.
Analysis
What Koji Reports Generate
- SUS score calculation with industry benchmark comparison
- Per-task SEQ distribution identifying which tasks cause the most difficulty
- Usability problem clustering that groups similar issues across participants
- Severity rating based on frequency and impact (how many people hit this issue, and how much it blocked them)
- Improvement recommendations prioritized by severity
- Verbatim quotes describing specific UX breakdowns
Turning Results Into Action
- Fix critical blockers first: Tasks with less than 60% success rate or SEQ below 4
- Address confusion patterns: Where multiple users misinterpret the same element
- Validate fixes: Re-run the same tasks after changes to confirm improvement
- Track SUS trend: Measure quarterly to ensure overall usability is improving
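The triage thresholds above can be applied mechanically to per-task results. A minimal sketch with hypothetical data, flagging tasks with under 60% success or a mean SEQ below 4:

```python
# Hypothetical per-task results; thresholds follow the triage rule above.
tasks = {
    "find_product": {"success_rate": 0.92, "seq_scores": [6, 7, 5, 6]},
    "apply_filter": {"success_rate": 0.55, "seq_scores": [3, 4, 2, 3]},
    "checkout":     {"success_rate": 0.78, "seq_scores": [5, 4, 6, 5]},
}

def critical_blockers(tasks, min_success=0.60, min_seq=4.0):
    """Return names of tasks that fail either threshold."""
    flagged = []
    for name, t in tasks.items():
        mean_seq = sum(t["seq_scores"]) / len(t["seq_scores"])
        if t["success_rate"] < min_success or mean_seq < min_seq:
            flagged.append(name)
    return flagged

print(critical_blockers(tasks))  # ['apply_filter']
```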
Usability Testing Best Practices
Sample size
- Qualitative patterns: 5-8 participants per user segment reveal ~85% of usability problems
- Quantitative confidence: 20+ participants for statistically reliable SUS scores
- Koji makes larger samples affordable, which means you can do both: deep qualitative AND reliable quantitative
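The ~85% figure comes from the classic problem-discovery model: if each participant independently encounters a given problem with probability p (about 0.31 in the original Nielsen and Landauer data), the expected share of problems found by n participants is 1 − (1 − p)^n. A quick sketch:

```python
def problems_found(n_participants, p=0.31):
    """Expected share of usability problems discovered by n participants,
    using the 1 - (1 - p)^n discovery model, where p is the probability
    that a single participant encounters a given problem."""
    return 1 - (1 - p) ** n_participants

# Diminishing returns: each added participant re-finds known problems.
for n in (3, 5, 8, 15):
    print(f"{n} participants: {problems_found(n):.0%}")
```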
Participant selection
- Recruit from your actual user base, not generic testers
- Include a mix of experience levels (new users + power users)
- Match demographics to your actual user population
- Avoid UX professionals (they test differently than real users)
Task design
- Write tasks as goals, not instructions: "Find a product under $50" not "Click the filter button"
- Include 3-5 representative tasks covering core workflows
- Order from easy to hard to build participant confidence
- Include at least one task you expect to be difficult (this is where the real learning happens)
Avoid leading
- Don't tell participants where to click or what to look for
- Don't ask "Did you find the navigation easy?" (leading). Ask "Tell me about the navigation" (neutral).
- Koji's AI is trained to probe without leading, asking "What did you expect to happen?" rather than "Was it confusing?"
Why Koji Beats Traditional Usability Testing Tools
| Feature | Traditional (UserTesting, Maze) | Koji |
|---|---|---|
| Format | Task recording + survey form | Task + AI conversation |
| Depth per participant | Surface (click data + text box) | Deep (conversational probing) |
| Cost per participant | $30-100+ | Less than $1 |
| Scale | 5-10 typical | 50-200 affordable |
| Analysis | Manual video review | Automated theme extraction |
| Turnaround | 1-2 weeks | Hours |
| Follow-up questions | Not possible | AI probes in real-time |
| Quant + Qual | Separate tools | Single integrated flow |
Koji doesn't replace screen recording for observing behavior. But it dramatically improves the post-task interview that reveals why users behaved the way they did, and it does so at a scale that makes statistically reliable usability data accessible to every team.
Related Articles
How to Measure Product-Market Fit with the Sean Ellis Test (and Go Deeper)
The complete guide to measuring product-market fit. Learn how to run the Sean Ellis "very disappointed" test, combine it with qualitative interviews, and use Koji to understand not just whether you have PMF but why.
How to Build an Onboarding Survey That Reduces Time-to-Value
The complete guide to user onboarding surveys and experience feedback. Learn how to identify friction points, measure activation milestones, and optimize the first-run experience using Koji's conversational feedback.
How to Run Feature Prioritization Surveys That Build Products Users Actually Want
Learn how to run feature prioritization surveys using RICE, Kano, MoSCoW, and opportunity scoring frameworks. Combine quantitative ranking with AI-driven qualitative depth to build what users truly need.
How to Collect Beta Testing Feedback That Ships Better Products
Learn how to design beta testing feedback surveys that catch bugs, validate features, and gather early adopter insights. Combine structured SUS scoring with conversational AI follow-up for richer beta data.
How to Run Concept Testing Surveys Before You Build the Wrong Thing
Learn how to run concept testing surveys using monadic and sequential designs, concept scoring frameworks, and purchase intent scales. Use AI-driven interviews to uncover hidden objections before you invest.
How to Build an NPS Survey That Actually Drives Action
A comprehensive guide to designing, deploying, and acting on Net Promoter Score surveys. Learn the best practices that separate vanity metrics from actionable insights, and how Koji's conversational approach unlocks the "why" behind every score.