{"site":{"name":"Koji","description":"AI-native customer research platform that helps teams conduct, analyze, and synthesize customer interviews at scale.","url":"https://www.koji.so","contentTypes":["blog","documentation"],"lastUpdated":"2026-05-04T17:29:47.778Z"},"content":[{"type":"documentation","id":"5a625969-d31b-459e-8423-3063102d0a8f","slug":"maxdiff-analysis-guide","title":"MaxDiff Analysis: The Complete Guide to Maximum Difference Scaling (2026)","url":"https://www.koji.so/docs/maxdiff-analysis-guide","summary":"MaxDiff (Maximum Difference Scaling) is a forced-choice quantitative method for ranking 8-30 items by relative importance. It eliminates rating-scale bias by making respondents pick \"most\" and \"least\" preferred from small sets. Standard sample size is 200+ for aggregate utilities, 400+ for individual-level Hierarchical Bayes scoring. Pair MaxDiff with conversational AI interviews to capture the qualitative reasoning that pure quantitative MaxDiff cannot.","content":"# MaxDiff Analysis: The Complete Guide to Maximum Difference Scaling (2026)\n\n**MaxDiff (Maximum Difference Scaling) is a quantitative research method that asks respondents to pick the \"most\" and \"least\" important item from small groups of options. Unlike rating scales, MaxDiff forces trade-offs — eliminating the \"everything is important\" bias that ruins traditional priority surveys. The result is a discriminating, projectable preference ranking that tells you exactly which features, messages, or attributes drive customer decisions.**\n\nIf you have ever asked customers to rate 15 features on a 1-5 scale and gotten back a wall of 4s and 5s, you have experienced the problem MaxDiff was invented to solve. 
By forcing respondents to choose between options instead of rating them in isolation, MaxDiff produces sharper, more decision-ready data — and modern AI research platforms make it dramatically easier to run.\n\nThis guide walks you through how MaxDiff works, when to use it, how to design a study, and how to combine it with conversational AI interviews to understand not just *what* customers prefer, but *why*.\n\n---\n\n## What Is MaxDiff Analysis?\n\nMaxDiff analysis (also called Best-Worst Scaling, or BWS) is a discrete-choice method developed by Jordan Louviere in 1987. It presents respondents with a series of small sets — typically 3-5 items per set — and asks them to identify the most preferred and least preferred item in each set.\n\nAcross multiple sets, every item appears several times in different combinations. By aggregating these forced choices, MaxDiff produces a utility score for each item on a common scale, showing the relative importance of options far more accurately than direct ratings.\n\n**The math behind it:** MaxDiff uses a multinomial logit model (or hierarchical Bayesian estimation for individual-level scores) to calculate the probability that any item would be selected as \"best\" if shown alongside any other item. The output is a 0-100 importance score where the average item scores around 100/N (where N is the total number of items being tested).\n\n### Why MaxDiff beats rating scales\n\nTraditional rating scales suffer from three well-documented problems that MaxDiff eliminates:\n\n- **Scale-use bias.** Some respondents rate everything 4-5 (\"yea-sayers\"), others rate everything 2-3 (\"conservative raters\"). Because every MaxDiff judgment is relative, this bias largely washes out.\n- **Cultural response bias.** Cross-cultural research (Steenkamp & Baumgartner, 1998) shows ratings vary systematically by country. Forced choice is far less susceptible.\n- **Insufficient discrimination.** When 12 of 15 items rate ≥4.0, you cannot prioritize. 
MaxDiff produces a clear rank order with statistically significant gaps between items.\n\nA 2023 meta-analysis published in the International Journal of Market Research found MaxDiff has roughly **3x the predictive validity** of standard rating scales for purchase intent and feature prioritization decisions.\n\n---\n\n## When to Use MaxDiff (and When Not To)\n\n**Use MaxDiff when you need to:**\n\n- Prioritize 8-30 features for a product roadmap\n- Test which value propositions or marketing messages resonate most\n- Rank package contents (e.g., which 5 features should go into a Pro tier?)\n- Compare brand attributes (which brand associations matter most?)\n- Validate which benefits drive purchase decisions\n\n**Skip MaxDiff when:**\n\n- You are testing fewer than 8 items — a simple ranking question is enough\n- You are testing more than 30 items — break into separate studies or use a screening MaxDiff first\n- You need to understand *trade-offs between attribute levels* (use conjoint analysis instead — see our [conjoint analysis guide](/docs/conjoint-analysis-guide))\n- Your sample is below 100 respondents — utility estimates need volume to stabilize\n- You need contextual understanding of *why* — pair MaxDiff with qualitative AI interviews\n\n---\n\n## How to Run a MaxDiff Study (5 Steps)\n\n### Step 1 — Define the item list\n\nThe list should be exhaustive and parallel. Items must:\n\n- Cover all realistic options in the decision space\n- Be at the same conceptual level (do not mix \"Faster checkout\" with \"Better pricing strategy\")\n- Be roughly equal in scope and specificity\n- Be mutually distinct (avoid two items that mean the same thing)\n\nA typical study tests 12-25 items. Below 8 you do not need MaxDiff; above 30 the design becomes burdensome for respondents.\n\n### Step 2 — Design the choice sets\n\nA standard MaxDiff design shows 4 items per set across 8-15 sets (depending on item count). 
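The arithmetic behind the set count is simple: total required item appearances divided by items per set, rounded up. A quick sketch, assuming a minimum of 3 appearances per item (the function name and defaults below are illustrative, not any platform's API):\n\n```python\nimport math\n\ndef sets_needed(n_items, set_size=4, min_appearances=3):\n    # Each item must appear min_appearances times; each set holds set_size items.\n    return math.ceil(n_items * min_appearances / set_size)\n\nprint(sets_needed(16))     # 16 items, 4 per set -> 12 sets\nprint(sets_needed(25, 5))  # 25 items, 5 per set -> 15 sets\n```\n\n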
Each item should appear at least 3 times to produce reliable estimates. Use a **balanced incomplete block design (BIBD)** — most modern research platforms generate this automatically.\n\nFor 16 items at 4-per-set with each item shown 3 times, you will show respondents 12 sets, taking about 4-6 minutes to complete.\n\n### Step 3 — Collect responses\n\nRecruit a representative sample of your target population. MaxDiff requires:\n\n- **Minimum sample size: 200 respondents** for aggregate utilities\n- **Recommended: 300-400** for sub-group analysis\n- **For HB (Hierarchical Bayes) individual-level scoring: 400+**\n\nQuality screening matters more than raw count — speeders and straightliners corrupt utility estimates. Modern AI-moderated interview platforms automatically detect low-quality responses and exclude them from analysis.\n\n### Step 4 — Calculate utility scores\n\nMost analysis tools default to one of three estimation approaches:\n\n- **Counts analysis:** Simple math — (times chosen as best − times chosen as worst) / appearances. Quick and intuitive.\n- **Aggregate logit:** Maximum likelihood estimation across the whole sample. Standard for most studies.\n- **Hierarchical Bayes (HB):** Individual-level utilities. Required for segmentation, simulation, and sub-group comparisons.\n\nOutput utilities are typically rescaled to sum to 100 across all items, so on a 16-item study the average item scores 6.25 (100/16). An item scoring 12.5+ is at least twice as important as the average.\n\n### Step 5 — Interpret and act\n\nDo not just report the top-3. 
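To make the counts arithmetic from Step 4 concrete, here is a minimal Python sketch. The item names and tallies are invented for illustration, and real studies estimate utilities with aggregate logit or HB rather than raw counts:\n\n```python\n# Counts-based MaxDiff scoring sketch (hypothetical tallies).\n# Each tuple: (times chosen as best, times chosen as worst, appearances).\nitems = {\n    'Faster onboarding': (30, 2, 36),\n    'API access': (14, 9, 36),\n    'Bulk export': (8, 20, 36),\n    'Dark mode': (4, 25, 36),\n}\n\n# Raw counts score in [-1, 1]: (best - worst) / appearances\nraw = {k: (b - w) / n for k, (b, w, n) in items.items()}\n\n# Shift to non-negative, then rescale so utilities sum to 100\nshifted = {k: v + 1 for k, v in raw.items()}\ntotal = sum(shifted.values())\nutilities = {k: round(100 * v / total, 1) for k, v in shifted.items()}\n\naverage = 100 / len(items)  # 25.0 here; 6.25 for a 16-item study\nfor item, u in sorted(utilities.items(), key=lambda kv: -kv[1]):\n    print(item, u)\n```\n\nOn this 100-point scale, the gaps between adjacent utilities matter as much as the rank order itself — an item far above the 100/N average is an unusually dominant driver.\n\n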
The most actionable MaxDiff outputs are:\n\n- **The top quartile** — features/messages worth investing in\n- **The bottom quartile** — items to deprioritize or remove\n- **The discrimination gap** — large gaps between adjacent items signal stable priorities; small gaps mean the priority order is fragile\n- **Sub-group splits** — utilities by segment often reveal that \"average preference\" hides two opposing camps\n\n---\n\n## MaxDiff vs. Other Prioritization Methods\n\n| Method | Best for | Sample needed | Drawback |\n|---|---|---|---|\n| **MaxDiff** | 8-30 items, projectable prioritization | 200+ | Does not capture trade-offs between attribute levels |\n| **Conjoint** | Trade-offs between bundles of attributes | 300+ | Complex setup, longer surveys |\n| **Kano** | Feature categorization (must-have vs delighter) | 100+ | Does not rank features against each other |\n| **Simple Ranking** | Fewer than 8 items, quick read | 50+ | Cognitive load increases sharply above 7 items |\n| **Rating Scale** | Quick directional read | 50+ | Scale-use bias; poor discrimination |\n\nFor most product and marketing teams, MaxDiff is the right choice when you have a list of 10-25 things to prioritize and need defensible numbers to bring into a roadmap or messaging meeting.\n\n---\n\n## How Koji Makes MaxDiff Easier (and Smarter)\n\nTraditional MaxDiff studies require a survey platform, a separate analytics tool, and often a research consultant to design the experiment correctly. Koji collapses this into a single AI-native workflow.\n\nKoji supports six **structured question types** — open_ended, scale, single_choice, multiple_choice, ranking, and yes_no — that work in both voice and text interviews. 
The **ranking question type** powers MaxDiff-style prioritization: respondents drag items into preference order, and Koji's AI follows up with a probing question on each top and bottom choice to capture the *reasoning* behind the score.\n\nThis hybrid approach solves MaxDiff's most common weakness: numbers without context. Instead of just learning that \"Faster onboarding\" scored 22.5 utility points, you also get:\n\n- Which onboarding friction caused respondents to rank it #1\n- Whether the priority is universal or driven by a specific persona\n- What \"faster\" means to different segments — a 30-minute reduction or sub-5-minute total?\n\nKoji's automatic analysis aggregates ranking results across hundreds of conversations in minutes, produces utility scores, and surfaces the qualitative themes behind every preference. A traditional MaxDiff study that takes 4-6 weeks (design, fieldwork, analysis, reporting) collapses to 48-72 hours with Koji — and includes the *why* that traditional MaxDiff cannot capture.\n\nFor teams running pricing research, Koji also pairs naturally with the [Van Westendorp Price Sensitivity Meter](/docs/van-westendorp-price-sensitivity-meter) and other [pricing research methods](/docs/pricing-research-survey-guide).\n\n---\n\n## MaxDiff Best Practices\n\n- **Pre-test your item list.** Run 5-10 qualitative interviews first to make sure your items reflect how customers actually think — not internal feature names.\n- **Keep wording parallel.** Each item should start the same way (\"Ability to...\", \"Faster...\", etc.) to avoid framing effects.\n- **Watch your sample size.** For HB or segmentation analysis, plan for 400+ respondents.\n- **Use anchored MaxDiff for absolute importance.** Standard MaxDiff produces relative importance — anchored MaxDiff adds an \"are these even important to you?\" gate to identify items everyone considers irrelevant.\n- **Pair with qualitative.** Numbers tell you *what* — conversational follow-up tells you *why*. 
Koji does both in one study.\n\n---\n\n## Common Mistakes to Avoid\n\n1. **Testing too few items.** Below 8, simple ranking is more efficient.\n2. **Mixing item types.** Do not put \"improved performance\" alongside \"lower price\" alongside \"more colors\" — they live at different levels of abstraction.\n3. **Skipping the qualitative layer.** Knowing the rank without the reason ships features that score well on paper but fail in market.\n4. **Using aggregate utilities for segment decisions.** Individual-level (HB) utilities reveal sub-group differences that aggregate scores hide.\n5. **Treating utilities as fixed truth.** Preferences shift with context. Re-test annually for high-stakes decisions.\n\n---\n\n## When MaxDiff Tells You to Pivot\n\nThe most valuable MaxDiff results are surprises — when the feature your team has been investing in scores in the bottom quartile, or when a \"nice to have\" turns out to be a top driver of preference. These are exactly the moments product teams need defensible quantitative evidence to override conviction. 
MaxDiff plus Koji's qualitative AI interviews give you both the score *and* the story to bring to your roadmap conversation.\n\n---\n\n## Related Resources\n\n- [Structured Questions in AI Interviews](/docs/structured-questions-guide) — How Koji's six question types power both qualitative and quantitative research in a single conversation\n- [Conjoint Analysis: The Complete Guide](/docs/conjoint-analysis-guide) — When you need to test trade-offs between attribute bundles, not single items\n- [Choice and Ranking Questions in AI Interviews](/docs/choice-ranking-questions-guide) — How Koji's ranking question type powers MaxDiff studies\n- [Kano Model](/docs/kano-model) — Categorize features as must-haves, performance drivers, and delighters\n- [Feature Prioritization Survey Guide](/docs/feature-prioritization-survey-guide) — A full template for prioritizing your roadmap with research\n- [Van Westendorp Price Sensitivity Meter](/docs/van-westendorp-price-sensitivity-meter) — The four-question pricing research method that pairs well with MaxDiff\n- [Best User Research Tools in 2026](/docs/best-user-research-tools-2026) — How modern AI-native platforms compare to traditional MaxDiff tools\n","category":"Research Methods","lastModified":"2026-05-04T03:24:40.265487+00:00","metaTitle":"MaxDiff Analysis: The Complete 2026 Guide","metaDescription":"How MaxDiff produces sharper feature and message prioritization than rating scales — plus how AI interviews capture the why behind every score.","keywords":["maxdiff analysis","maximum difference scaling","best-worst scaling","feature prioritization","message prioritization","conjoint vs maxdiff","MaxDiff utility scores","MaxDiff sample size","prioritization research"],"aiSummary":"MaxDiff (Maximum Difference Scaling) is a forced-choice quantitative method for ranking 8-30 items by relative importance. It eliminates rating-scale bias by making respondents pick \"most\" and \"least\" preferred from small sets. 
Standard sample size is 200+ for aggregate utilities, 400+ for individual-level Hierarchical Bayes scoring. Pair MaxDiff with conversational AI interviews to capture the qualitative reasoning that pure quantitative MaxDiff cannot.","aiPrerequisites":["Basic familiarity with quantitative survey research","Understanding of feature/message prioritization decisions"],"aiLearningOutcomes":["Understand what MaxDiff is and how utility scores are calculated","Know when to use MaxDiff vs conjoint, Kano, or simple ranking","Design a MaxDiff study with the right item list and choice set structure","Calculate the right sample size for aggregate or individual-level analysis","Combine MaxDiff with conversational AI interviews for the qualitative why"],"aiDifficulty":"intermediate","aiEstimatedTime":"15 minutes"}],"pagination":{"total":1,"returned":1,"offset":0}}