Desirability Testing & Microsoft Reaction Cards Guide (2026)

What Is Desirability Testing? (BLUF)

Desirability testing measures the emotional and aesthetic response a design, brand, or product evokes: the gut-level "do I want this?" that usability testing does not capture. A usability test tells you whether people can use your interface; a desirability test tells you whether they like it and what feelings it triggers. The classic method, pioneered at Microsoft, hands participants a deck of 118 adjective cards ("Trustworthy," "Confusing," "Innovative," "Slow") and asks them to pick the words that describe how a design makes them feel, then to explain why.

The "why" is the hard part. Picking words is fast; the insight lives in the reasoning behind those words, and that is where traditional facilitation gets expensive. With a platform like Koji, you run desirability testing as an AI-moderated interview that presents the reaction words as a multiple-choice question, then automatically probes each chosen word: "You called it 'overwhelming'. What specifically felt like too much?" You get the quantitative word-frequency data and the qualitative explanation in a single session, with no human moderator.

Why Desirability Matters

People rationalize decisions, but they make them emotionally. Two products can be equally usable yet perform completely differently in the market because one feels trustworthy, modern, and effortless while the other feels clunky and generic. Desirability testing surfaces that gap early, before you ship and while it's still cheap to change.

Use it to:

Compare design directions — which of three landing-page concepts feels most "credible" to your target buyer?
Validate a rebrand — does the new identity read as "premium" or "cold"?
Catch emotional landmines — a checkout flow that tests as usable but feels "untrustworthy" will still leak conversions.
Differentiate from competitors — discover the adjectives you own vs. the ones rivals own.

Desirability pairs naturally with usability testing and concept testing: usability checks "can they use it," concept checks "do they get it," and desirability checks "do they want it."

The Microsoft Reaction Cards Method

Microsoft researchers Joey Benedek and Trish Miner introduced the Product Reaction Cards in 2002: a set of 118 adjectives, deliberately balanced at roughly 60% positive and 40% negative/neutral so participants aren't nudged toward flattery.

The classic protocol:

Expose the participant to the design (a screenshot, prototype, live site, or brand mark).
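Present the word set and ask them to choose the words that best describe their reaction. (In the original protocol, participants first picked every card that applied, then narrowed to their top five; shorter variants cap the pick at 3–5 words up front to force prioritization.)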
Probe the top choices: "Why did you pick this word? What about the design made you feel that?"
Aggregate across participants to find the dominant adjectives and the outliers.

The output is a word-frequency map (often shown as a sorted list or word cloud) plus the verbatim reasoning behind each high-frequency term.
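The aggregation step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a prescribed tool: the sample selections are invented, and the function names `word_frequencies` and `frequency_table` are hypothetical.

```python
from collections import Counter

# Hypothetical sample data: each inner list is one participant's chosen words.
selections = [
    ["Trustworthy", "Clean", "Fast"],
    ["Trustworthy", "Confusing", "Clean"],
    ["Premium", "Trustworthy", "Cold"],
    ["Clean", "Confusing", "Dated"],
]


def word_frequencies(selections):
    """Count how many participants chose each word."""
    counts = Counter()
    for words in selections:
        # set() ensures a participant counts once per word,
        # even if a word was recorded twice by mistake.
        counts.update(set(words))
    return counts


def frequency_table(selections):
    """Return (word, count, share of participants) rows, most frequent first."""
    total = len(selections)
    counts = word_frequencies(selections)
    # Sort by count descending, then alphabetically so ties are stable.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(word, n, n / total) for word, n in ordered]


for word, n, share in frequency_table(selections):
    print(f"{word:<12} {n}  {share:.0%}")
```

The sorted rows are the "sorted list" form of the word-frequency map; the same counts could feed a word cloud. The verbatim reasoning still has to be read alongside the counts.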

A starter reaction-word set

If 118 cards is too many for an unmoderated session, a curated 20–30 word subset works well. Balance positive and negative:

Trustworthy · Innovative · Clean · Professional · Friendly · Fast · Confusing · Overwhelming · Generic · Dated · Cluttered · Intimidating · Calm · Premium · Approachable · Boring · Cheap · Reliable · Playful · Confident · Slow · Cold
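Before fielding a custom subset, it can help to check its balance. The sketch below labels each starter word as positive or critical and computes the positive share. The polarity labels are this example's own judgment call, not part of the original card set, and the helper name `positive_share` is hypothetical.

```python
# Polarity labels are an assumption made for this sketch; adjust them to
# match how your own team reads each word.
POSITIVE = {
    "Trustworthy", "Innovative", "Clean", "Professional", "Friendly", "Fast",
    "Calm", "Premium", "Approachable", "Reliable", "Playful", "Confident",
}
CRITICAL = {
    "Confusing", "Overwhelming", "Generic", "Dated", "Cluttered",
    "Intimidating", "Boring", "Cheap", "Slow", "Cold",
}
STARTER_SET = sorted(POSITIVE | CRITICAL)


def positive_share(words, positive=POSITIVE):
    """Fraction of the word set that is labeled positive."""
    words = list(words)
    if not words:
        raise ValueError("word set is empty")
    return sum(1 for word in words if word in positive) / len(words)


share = positive_share(STARTER_SET)
print(f"{len(STARTER_SET)} words, {share:.0%} positive")
```

With these labels the starter set has 12 positive and 10 critical words, slightly more critical than the roughly 60/40 split of the full deck.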

Running Desirability Testing at Scale with AI

The traditional bottleneck is moderation. Spreading physical or digital cards, recording selections, and interviewing each person about their choices is slow and doesn't scale past a handful of sessions. Here's how an AI-native workflow removes that ceiling:

1. Configure the reaction set as a structured question

Create a multiple_choice question listing your reaction words, capped to 3–5 selections. Koji supports six structured question types — open_ended, scale, single_choice, multiple_choice, ranking, and yes_no — so you can combine the word picker with a 1–10 desirability scale ("How appealing is this design to you?") in the same flow.
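To make the selection cap concrete, here is one way the two questions could be modeled and validated in plain Python. Everything in it is hypothetical: the class names, field names, and validation messages are assumptions for illustration and do not reflect Koji's actual schema or API. Only the `multiple_choice` and `scale` type names come from the text above.

```python
from dataclasses import dataclass


@dataclass
class MultipleChoiceQuestion:
    """Hypothetical model of a reaction-word picker (not a real Koji type)."""
    prompt: str
    options: list
    min_selections: int = 3
    max_selections: int = 5
    question_type: str = "multiple_choice"

    def validate(self, chosen):
        """Return a list of problems with an answer (empty list if valid)."""
        problems = []
        unknown = [word for word in chosen if word not in self.options]
        if unknown:
            problems.append(f"not in the word set: {', '.join(unknown)}")
        if len(set(chosen)) != len(chosen):
            problems.append("a word was selected more than once")
        if not self.min_selections <= len(chosen) <= self.max_selections:
            problems.append(
                f"pick between {self.min_selections} and "
                f"{self.max_selections} words"
            )
        return problems


@dataclass
class ScaleQuestion:
    """Hypothetical model of the 1-10 desirability rating."""
    prompt: str
    minimum: int = 1
    maximum: int = 10
    question_type: str = "scale"

    def validate(self, value):
        if self.minimum <= value <= self.maximum:
            return []
        return [f"rating must be between {self.minimum} and {self.maximum}"]


picker = MultipleChoiceQuestion(
    prompt="Which words best describe how this design makes you feel?",
    options=["Trustworthy", "Clean", "Fast", "Confusing", "Dated", "Cold"],
)
rating = ScaleQuestion(prompt="How appealing is this design to you?")

print(picker.validate(["Clean", "Fast", "Cold"]))  # valid answer
print(picker.validate(["Clean"]))                  # too few words
print(rating.validate(11))                         # out of range
```

Enforcing the cap at the question level, rather than trimming answers afterward, keeps the prioritization decision with the participant.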

2. Let the AI probe every selection

This is the differentiator. After a participant picks "Premium" and "Confusing," Koji's AI interviewer follows up on each: "You picked 'premium'. Which part of the design gave you that impression?" and "And 'confusing'. Where did you get lost?" No moderator required, available 24/7, in voice or text.
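The shape of that probing loop can be illustrated with a fixed-template stand-in. This is deliberately simplistic and is not how Koji or any AI interviewer generates questions; a real interviewer would adapt its wording to the conversation. The word list and the function name `follow_up_prompts` are assumptions for the sketch.

```python
# Words this sketch treats as critical, so the probe asks where the
# problem occurred rather than what created the impression.
CRITICAL_WORDS = {
    "Confusing", "Overwhelming", "Generic", "Dated", "Cluttered",
    "Intimidating", "Boring", "Cheap", "Slow", "Cold",
}


def follow_up_prompts(chosen_words, critical=CRITICAL_WORDS):
    """Build one probing question per chosen word, in selection order."""
    prompts = []
    for word in chosen_words:
        label = word.lower()
        if word in critical:
            prompts.append(
                f"You picked '{label}'. Where in the design did you feel that?"
            )
        else:
            prompts.append(
                f"You picked '{label}'. Which part of the design gave you "
                f"that impression?"
            )
    return prompts


for prompt in follow_up_prompts(["Premium", "Confusing"]):
    print(prompt)
```

The point of the sketch is the one-probe-per-word structure: every selection gets its own "why," so no chosen adjective is left unexplained.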

3. Aggregate automatically

Because each reaction word is a structured value, Koji produces a frequency distribution across all participants automatically — while thematic analysis clusters the reasons behind each word into coded themes. You see not just that 14 of 20 people said "trustworthy," but the three recurring reasons they gave.
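Once explanations have been coded into themes, pairing each word with its recurring reasons is a simple grouping step. The sketch below assumes the coding has already happened (by a person or a tool) and only tabulates the result; the records and the names `themes_by_word` and `top_themes` are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical coded responses:
# (participant id, chosen word, theme assigned to their explanation).
coded = [
    ("p1", "Trustworthy", "clear pricing"),
    ("p2", "Trustworthy", "security badges"),
    ("p3", "Trustworthy", "clear pricing"),
    ("p4", "Confusing", "too many menu items"),
    ("p5", "Confusing", "too many menu items"),
    ("p5", "Trustworthy", "customer logos"),
]


def themes_by_word(coded):
    """Group theme counts under each reaction word."""
    grouped = defaultdict(Counter)
    for _participant, word, theme in coded:
        grouped[word][theme] += 1
    return grouped


def top_themes(coded, word, n=3):
    """Return the n most common themes behind one reaction word."""
    return themes_by_word(coded)[word].most_common(n)


for word, themes in themes_by_word(coded).items():
    total = sum(themes.values())
    print(f"{word} ({total} mentions)")
    for theme, count in themes.most_common(3):
        print(f"  {count} x {theme}")
```

Keeping the participant id in each record makes it possible to trace any theme back to the verbatim quote behind it.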

4. Compare variants

Run the same desirability test against Design A and Design B as separate studies, then compare their adjective profiles side by side. This turns a subjective "I think B looks better" debate into evidence.
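A side-by-side comparison can be as simple as computing each word's share of participants per design and sorting by the gap. This is a minimal sketch with invented data; the function names `shares` and `compare_profiles` are hypothetical, and a large gap in a small sample is a prompt for further investigation rather than proof.

```python
from collections import Counter

# Hypothetical selections for two design variants (one list per participant).
design_a = [
    ["Clean", "Trustworthy", "Dated"],
    ["Clean", "Dated", "Slow"],
    ["Trustworthy", "Dated", "Generic"],
    ["Clean", "Generic", "Dated"],
]
design_b = [
    ["Premium", "Clean", "Cold"],
    ["Premium", "Innovative", "Clean"],
    ["Premium", "Cold", "Innovative"],
    ["Clean", "Premium", "Trustworthy"],
]


def shares(selections):
    """Map each word to the fraction of participants who chose it."""
    counts = Counter()
    for words in selections:
        counts.update(set(words))
    return {word: n / len(selections) for word, n in counts.items()}


def compare_profiles(selections_a, selections_b):
    """Return (word, share_a, share_b, difference) rows, biggest gaps first."""
    a, b = shares(selections_a), shares(selections_b)
    rows = []
    for word in set(a) | set(b):
        share_a, share_b = a.get(word, 0.0), b.get(word, 0.0)
        rows.append((word, share_a, share_b, share_b - share_a))
    # Largest absolute difference first; alphabetical order breaks ties.
    return sorted(rows, key=lambda row: (-abs(row[3]), row[0]))


for word, share_a, share_b, diff in compare_profiles(design_a, design_b):
    print(f"{word:<12} A {share_a:>4.0%}   B {share_b:>4.0%}   {diff:+.0%}")
```

Using shares rather than raw counts keeps the comparison fair when the two studies end up with different numbers of participants.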

How Many Participants?

Desirability testing tolerates small samples for directional reads: 15–25 participants per design usually reveals the dominant adjectives. To compare word frequencies between variants with statistical confidence, push toward 40–50 per cell. Because AI interviews need no human moderator and run in parallel, scaling to those numbers costs you setup time, not facilitation time.
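To see why small samples are only directional, it helps to put a confidence interval around a word's share. The sketch below uses the Wilson score interval, which is one common choice for proportions and is this example's assumption, not a method prescribed by the reaction-card literature. It is applied to the 14-of-20 "trustworthy" figure used earlier.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 is about 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    margin = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - margin, center + margin


# Same observed share (70%) at two sample sizes.
for successes, n in [(14, 20), (35, 50)]:
    low, high = wilson_interval(successes, n)
    print(f"{successes}/{n}: {low:.0%} to {high:.0%}")
```

At 20 participants, a 70% share is compatible with anything from roughly 48% to 85%, which is enough to call a word dominant but not enough to separate two variants that differ by ten or fifteen points. The interval tightens as the cell size grows, which is the reasoning behind the larger per-cell target for variant comparisons.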

Common Pitfalls

Letting people pick too many words. Unlimited selection floods you with mild positives. Cap at 3–5 to force genuine prioritization.
Skipping the "why." The word frequencies are interesting; the reasoning is actionable. Always probe. (This is exactly the step manual studies cut for time — and AI moderation restores.)
An all-positive word set. If every adjective flatters, you'll learn nothing. Keep the ~60/40 positive-to-critical balance.
Testing in isolation. Pair desirability with first-click testing or usability testing so emotional and functional findings inform each other.
Ignoring your target segment. "Premium" from a bargain shopper and "premium" from your actual ICP mean different things — screen participants accordingly.

Desirability vs. Related Methods

Method	Question it answers
Desirability testing	How does this make you feel?
Usability testing	Can you use it?
Concept testing	Do you understand and value the idea?
Preference testing	Which option do you prefer?
Five-second test	What's the first impression?

Related Resources

Structured Questions in AI Interviews — build the reaction-word picker with multiple_choice + scale
Usability Testing Guide — pair "can they" with "do they want it"
Concept Testing Methodology — validate the idea, not just the look
Preference Testing Guide — A/B your design directions
The 5-Second Test — capture first impressions
Scale Questions in AI Interviews — quantify the desirability rating
