Desirability Testing: Measuring How a Design Makes Users Feel
A complete guide to desirability testing and Microsoft Reaction Cards — how to measure the emotional response to a design, and how to run it at scale with AI interviews.
What Is Desirability Testing? (BLUF)
Desirability testing measures the emotional and aesthetic response a design, brand, or product evokes: the gut-level "do I want this?" that usability testing does not capture. A usability test tells you whether people can use your interface; a desirability test tells you whether they like it and what feelings it triggers. The classic method, pioneered at Microsoft, hands participants a deck of 118 adjective cards ("Trustworthy," "Confusing," "Innovative," "Slow"), asks them to pick the words that describe how a design makes them feel, and then asks them to explain why.
The "why" is the hard part. Picking words is fast; understanding the reasoning behind those words is where the insight lives — and where traditional facilitation gets expensive. With a platform like Koji, you run desirability testing as an AI-moderated interview that presents the reaction words as a multiple-choice question, then automatically probes each chosen word: "You called it 'overwhelming' — what specifically felt like too much?" You get the quantitative word-frequency data and the qualitative explanation in a single, unmoderated session.
Why Desirability Matters
People rationalize decisions, but they make them emotionally. Two products can be equally usable yet perform completely differently in the market because one feels trustworthy, modern, and effortless while the other feels clunky and generic. Desirability testing surfaces that gap early, before you ship and while it's still cheap to change.
Use it to:
- Compare design directions — which of three landing-page concepts feels most "credible" to your target buyer?
- Validate a rebrand — does the new identity read as "premium" or "cold"?
- Catch emotional landmines — a checkout flow that tests as usable but feels "untrustworthy" will still leak conversions.
- Differentiate from competitors — discover the adjectives you own vs. the ones rivals own.
Desirability pairs naturally with usability testing and concept testing: usability testing checks "can they?", concept testing checks "do they get it?", and desirability testing checks "do they want it?"
The Microsoft Reaction Cards Method
Microsoft researchers Joey Benedek and Trish Miner introduced the Product Reaction Cards in 2002: a set of 118 adjectives, deliberately balanced at roughly 60% positive and 40% negative/neutral so participants aren't nudged toward flattery.
The classic protocol:
- Expose the participant to the design (a screenshot, prototype, live site, or brand mark).
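- Present the word set and ask them to choose the words that best describe their reaction. (In the original protocol, participants picked every card that applied and then narrowed to their top five; capping selections at 3–5 forces the same prioritization.)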
- Probe the top choices: "Why did you pick this word? What about the design made you feel that?"
- Aggregate across participants to find the dominant adjectives and the outliers.
The output is a word-frequency map (often shown as a sorted list or word cloud) plus the verbatim reasoning behind each high-frequency term.
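The aggregation step can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed tool: the `selections` data and the `word_frequencies` helper are hypothetical names made up for the example.

```python
from collections import Counter

# Hypothetical session data: each inner list is one participant's chosen words.
selections = [
    ["Trustworthy", "Clean", "Slow"],
    ["Clean", "Professional", "Confusing"],
    ["Trustworthy", "Clean", "Generic"],
    ["Professional", "Trustworthy", "Calm"],
]

def word_frequencies(selections):
    """Return (word, count, share of participants), most frequent first."""
    # set() ensures a participant is counted at most once per word.
    counts = Counter(word for picks in selections for word in set(picks))
    n = len(selections)
    # Sort by descending count, then alphabetically so ties are stable.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(word, count, count / n) for word, count in ranked]

frequencies = word_frequencies(selections)
for word, count, share in frequencies:
    print(f"{word:<14}{count:>3}  {share:.0%}")
```

The sorted list is the "word-frequency map"; the words with a count of one are the outliers worth reading the verbatims for.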
A starter reaction-word set
If 118 cards is too many for an unmoderated session, a curated 20–30 word subset works well. Balance positive and negative:
Trustworthy · Innovative · Clean · Professional · Friendly · Fast · Confusing · Overwhelming · Generic · Dated · Cluttered · Intimidating · Calm · Premium · Approachable · Boring · Cheap · Reliable · Playful · Confident · Slow · Cold
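A quick way to check the balance of a curated subset is to label each word and compute the positive share. The sketch below does this for the starter set above; the positive/critical labels are an illustrative assumption (they are not taken from the original card deck), and `positive_share` is a hypothetical helper.

```python
# Valence labels are an assumption made for this example; relabel any word
# that reads differently in your product context (e.g. "Playful" for a bank).
POSITIVE = {
    "Trustworthy", "Innovative", "Clean", "Professional", "Friendly", "Fast",
    "Calm", "Premium", "Approachable", "Reliable", "Playful", "Confident",
}
CRITICAL = {
    "Confusing", "Overwhelming", "Generic", "Dated", "Cluttered",
    "Intimidating", "Boring", "Cheap", "Slow", "Cold",
}

def positive_share(positive, critical):
    """Fraction of the word set that is labeled positive."""
    overlap = positive & critical
    if overlap:
        raise ValueError(f"words labeled both ways: {sorted(overlap)}")
    return len(positive) / (len(positive) + len(critical))

share = positive_share(POSITIVE, CRITICAL)
print(f"{len(POSITIVE)} positive, {len(CRITICAL)} critical, {share:.0%} positive")
```

Under these labels the starter set has 12 positive and 10 critical words, so it leans slightly more critical than the 60/40 split of the full deck.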
Running Desirability Testing at Scale with AI
The traditional bottleneck is moderation. Spreading physical or digital cards, recording selections, and interviewing each person about their choices is slow and doesn't scale past a handful of sessions. Here's how an AI-native workflow removes that ceiling:
1. Configure the reaction set as a structured question
Create a multiple_choice question listing your reaction words, capped to 3–5 selections. Koji supports six structured question types — open_ended, scale, single_choice, multiple_choice, ranking, and yes_no — so you can combine the word picker with a 1–10 desirability scale ("How appealing is this design to you?") in the same flow.
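As a rough sketch, the two questions could be described as plain data and sanity-checked before launch. Only the six question type names come from the text above; the field names (`prompt`, `options`, `min_selections`, `max_selections`, `min`, `max`) and the `validate_question` helper are hypothetical and do not represent Koji's actual schema or API.

```python
# The six type names are from the article; everything else here is assumed.
QUESTION_TYPES = {
    "open_ended", "scale", "single_choice", "multiple_choice", "ranking", "yes_no",
}

reaction_question = {
    "type": "multiple_choice",
    "prompt": "Which words best describe how this design makes you feel?",
    "options": ["Trustworthy", "Clean", "Premium", "Confusing", "Cluttered", "Cold"],
    "min_selections": 3,  # hypothetical field: lower bound on picks
    "max_selections": 5,  # hypothetical field: cap that forces prioritization
}

appeal_question = {
    "type": "scale",
    "prompt": "How appealing is this design to you?",
    "min": 1,
    "max": 10,
}

def validate_question(question):
    """Return a list of problems; an empty list means the question looks sane."""
    problems = []
    qtype = question.get("type")
    if qtype not in QUESTION_TYPES:
        problems.append(f"unknown question type: {qtype!r}")
    if qtype == "multiple_choice":
        options = question.get("options", [])
        if len(set(options)) != len(options):
            problems.append("duplicate options")
        low = question.get("min_selections", 1)
        high = question.get("max_selections", len(options))
        if not 1 <= low <= high <= len(options):
            problems.append("selection limits do not fit the option list")
    if qtype == "scale" and not question.get("min", 0) < question.get("max", 0):
        problems.append("scale min must be below max")
    return problems

for q in (reaction_question, appeal_question):
    print(q["type"], validate_question(q) or "ok")
```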
2. Let the AI probe every selection
This is the differentiator. After a participant picks "Premium" and "Confusing," Koji's AI interviewer follows up on each: "You picked 'premium.' Which part of the design gave you that impression?" and "And 'confusing.' Where did you get lost?" No moderator required, available 24/7, in voice or text.
3. Aggregate automatically
Because each reaction word is a structured value, Koji produces a frequency distribution across all participants automatically — while thematic analysis clusters the reasons behind each word into coded themes. You see not just that 14 of 20 people said "trustworthy," but the three recurring reasons they gave.
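To make the "word plus reasons" output concrete, here is a small sketch that tallies reason themes per reaction word. It assumes the themes have already been assigned to each follow-up answer (by a researcher or an analysis tool); the `coded_reasons` data and `themes_by_word` helper are hypothetical, and the sketch does not attempt the clustering itself.

```python
from collections import Counter, defaultdict

# Hypothetical coded follow-ups: (participant, chosen word, theme of their reason).
coded_reasons = [
    ("p1", "Trustworthy", "clear pricing"),
    ("p2", "Trustworthy", "security badges"),
    ("p3", "Trustworthy", "clear pricing"),
    ("p3", "Confusing", "too many menu items"),
    ("p4", "Confusing", "too many menu items"),
]

def themes_by_word(coded_reasons, top_n=3):
    """Map each reaction word to its most common reason themes."""
    grouped = defaultdict(Counter)
    for _participant, word, theme in coded_reasons:
        grouped[word][theme] += 1
    # most_common() orders themes by how many participants gave them.
    return {word: themes.most_common(top_n) for word, themes in grouped.items()}

summary = themes_by_word(coded_reasons)
for word, themes in summary.items():
    print(word, themes)
```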
4. Compare variants
Run the same desirability test against Design A and Design B as separate studies, then compare their adjective profiles side by side. This turns a subjective "I think B looks better" debate into evidence.
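A side-by-side comparison can be as simple as computing, for each word, the share of participants who picked it in each study and sorting by the gap. The sketch below uses made-up data; `adjective_profile` and `compare_profiles` are hypothetical helper names.

```python
from collections import Counter

def adjective_profile(selections):
    """Share of participants who picked each word in one study."""
    n = len(selections)
    counts = Counter(word for picks in selections for word in set(picks))
    return {word: count / n for word, count in counts.items()}

def compare_profiles(profile_a, profile_b):
    """Rows of (word, share_a, share_b, b minus a), largest absolute gap first."""
    rows = []
    for word in set(profile_a) | set(profile_b):
        a = profile_a.get(word, 0.0)
        b = profile_b.get(word, 0.0)
        rows.append((word, a, b, b - a))
    return sorted(rows, key=lambda row: (-abs(row[3]), row[0]))

# Hypothetical selections from two separate studies.
design_a = [["Clean", "Generic"], ["Clean", "Trustworthy"],
            ["Generic", "Dated"], ["Clean", "Generic"]]
design_b = [["Premium", "Clean"], ["Premium", "Confusing"],
            ["Premium", "Trustworthy"], ["Clean", "Premium"]]

rows = compare_profiles(adjective_profile(design_a), adjective_profile(design_b))
for word, a, b, diff in rows:
    print(f"{word:<12} A={a:.0%}  B={b:.0%}  diff={diff:+.0%}")
```

Words at the top of the list are where the two designs feel most different; words near the bottom are shared perceptions that the redesign did not move.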
How Many Participants?
Desirability testing tolerates small samples for directional reads: 15–25 participants per design usually reveal the dominant adjectives. To compare word frequencies between variants with some statistical confidence, push toward 40–50 per cell, since a gap that looks large at 20 participants can still be within sampling noise. Because AI interviews are unmoderated and run in parallel, scaling to those numbers costs you setup time, not facilitation time.
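One way to see why the larger cells matter is a pooled two-proportion z-test on a single word. This is a minimal sketch using a standard textbook test, not a method the article or any specific tool prescribes; the counts are invented, and the normal approximation is rough when counts are small.

```python
import math

def two_proportion_z_test(hits_a, n_a, hits_b, n_b):
    """Pooled two-sided z-test for a difference between two proportions."""
    p_a, p_b = hits_a / n_a, hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0, 1.0  # nobody or everybody picked the word in both cells
    z = (p_b - p_a) / se
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Same observed gap (30% vs 50% picked "Trustworthy") at two cell sizes.
z_small, p_small = two_proportion_z_test(6, 20, 10, 20)
z_large, p_large = two_proportion_z_test(15, 50, 25, 50)
print(f"20 per cell: z={z_small:.2f}, p={p_small:.3f}")
print(f"50 per cell: z={z_large:.2f}, p={p_large:.3f}")
```

With these invented counts, the 20-point gap does not clear a 0.05 threshold at 20 per cell but does at 50 per cell. If you test many words at once, expect some gaps to look significant by chance, so treat single-word results as supporting evidence alongside the verbatim reasons.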
Common Pitfalls
- Letting people pick too many words. Unlimited selection floods you with mild positives. Cap at 3–5 to force genuine prioritization.
- Skipping the "why." The word frequencies are interesting; the reasoning is actionable. Always probe. (This is exactly the step manual studies cut for time — and AI moderation restores.)
- An all-positive word set. If every adjective flatters, you'll learn nothing. Keep the ~60/40 positive-to-critical balance.
- Testing in isolation. Pair desirability with first-click testing or usability testing so emotional and functional findings inform each other.
- Ignoring your target segment. "Premium" from a bargain shopper and "premium" from your actual ICP mean different things — screen participants accordingly.
Desirability vs. Related Methods
| Method | Question it answers |
|---|---|
| Desirability testing | How does this make you feel? |
| Usability testing | Can you use it? |
| Concept testing | Do you understand and value the idea? |
| Preference testing | Which option do you prefer? |
| Five-second test | What's the first impression? |
Related Resources
- Structured Questions in AI Interviews — build the reaction-word picker with multiple_choice + scale
- Usability Testing Guide — pair "can they" with "do they want it"
- Concept Testing Methodology — validate the idea, not just the look
- Preference Testing Guide — A/B your design directions
- The 5-Second Test — capture first impressions
- Scale Questions in AI Interviews — quantify the desirability rating