{"site":{"name":"Koji","description":"AI-native customer research platform that helps teams conduct, analyze, and synthesize customer interviews at scale.","url":"https://www.koji.so","contentTypes":["blog","documentation"],"lastUpdated":"2026-05-04T17:17:17.914Z"},"content":[{"type":"documentation","id":"562fc2f0-91b6-467c-a856-b2c22dde23b1","slug":"synthetic-users-research-methodology","title":"Synthetic Users in Research: Validity, Bias, and When AI Personas Are (and Aren't) Trustworthy","url":"https://www.koji.so/docs/synthetic-users-research-methodology","summary":"Methodology guide to synthetic users (AI-generated research participants). Documents the validity problems — NN/G's shallow-insights critique, ACM Interactions' synthetic persona fallacy, 48% coefficient mismatch and 32% sign-flipping in quantitative comparisons, and the sycophancy bias driven by RLHF training. Outlines legitimate adjunct uses (vocabulary familiarization, hypothesis generation, pre-interview rehearsal) and explains how AI-moderated interviews on real participants now match synthetic-speed economics, removing the original cost-speed argument for synthetic-as-substitute.","content":"# Synthetic Users in Research: Validity, Bias, and When AI Personas Are (and Aren't) Trustworthy\n\n**TL;DR (Answer-First):** Synthetic users — AI-generated personas that respond to research prompts as if they were real participants — are useful for hypothesis generation, vocabulary familiarization, and pre-interview rehearsal. They are **not** valid substitutes for real respondents in concept testing, prioritization, persona creation, or any decision that depends on observed behavior. Documented bias problems include systematic sycophancy, sign-flipping on causal effects, and a tendency to enthusiastically endorse every concept. The 2026 best practice is to use synthetic users for early-stage exploration and route every actual decision through real participant data — which AI-moderated interview platforms like Koji now make possible at synthetic-speed economics.\n\n## What Synthetic Users Are\n\nA synthetic user is a large language model (LLM) prompted to behave as if it were a real research participant — answering interview questions, reacting to concepts, completing surveys. Instead of recruiting humans, you instantiate a \"persona\" (e.g., *\"a 34-year-old urban parent shopping for car insurance\"*) and let the model produce responses.\n\nThe pitch is compelling: hundreds of \"interviews\" in minutes, no recruiting, no incentives, no scheduling, no transcription. Several startups (Synthetic Users, Userdoc, others) sell this as a research replacement.\n\nThe problem is that the responses look like research without being it.\n\n## What the Evidence Says (And Why Researchers Are Worried)\n\n### NN/G: \"Too Shallow to Be Useful\"\n\nNielsen Norman Group ran direct comparisons between synthetic users and real participants on the same studies. Their conclusion was unambiguous: *\"Synthetic-user responses for many research activities are too shallow to be useful\"* ([NN/G, \"Synthetic Users: If, When, and How to Use AI-Generated Research\"](https://www.nngroup.com/articles/synthetic-users/)).\n\nKey observations from NN/G's comparative work:\n\n- **Sycophancy at scale.** Synthetic users praised every concept. Real users balanced interest with concerns and questioned feasibility. 
For concept testing — where the entire point is finding which ideas users reject — this makes synthetic responses worse than useless; they generate confident false positives.\n- **Behavioral mismatch.** In one study, synthetic users enthusiastically described participating in user forums, while real users mostly avoided forums and called them *\"contrived.\"* The AI was predicting idealized behavior from training data, not actual behavior.\n- **Uniform care.** Synthetic users seemed to care about everything roughly equally — fatal for feature prioritization, where the point is to find what matters most.\n\n### ACM Interactions: The Synthetic Persona Fallacy\n\nThe January-February 2026 issue of *ACM Interactions* published \"The Synthetic Persona Fallacy: How AI-Generated Research Undermines UX Research\" ([ACM Interactions](https://interactions.acm.org/blog/view/the-synthetic-persona-fallacy-how-ai-generated-research-undermines-ux-research)), arguing that synthetic personas erode the discipline's evidentiary base. The core claim: a synthetic user cannot be falsified. Real users surprise you. Synthetic users return your assumptions in dialogue form.\n\n### The Sign-Flipping Problem\n\nThe most damaging empirical finding comes from quantitative comparisons. In a study comparing ChatGPT-generated survey responses to real human survey data, **48% of estimated coefficients were statistically significantly different from their human counterparts. Of those, the sign of the effect flipped 32% of the time** ([referenced via Radical Product Thinking, \"Synthetic Users vs Real User Research\"](https://www.radicalproduct.com/blog/synthetic-users-user-research)).\n\nIn plain language: nearly half of the synthetic estimates diverged from the human data, and of those mismatches roughly one in three didn't just have the wrong magnitude; they pointed in the wrong direction entirely. That works out to roughly 15% of all coefficients (0.48 × 0.32 ≈ 0.15) supporting the opposite of what real respondents would tell you. Building a roadmap on that data is worse than building a roadmap on no data, because you have false confidence in claims that are actively misleading.\n\n### B2B Sycophancy\n\nA study by Emporia Research comparing three groups of B2B survey respondents found that the synthetic respondents showed **strong positive bias and \"herd mentality\"** compared to real B2B respondents. The synthetic group agreed with one another more than real respondents did, and rated everything more favorably. For B2B research — where the gating problem is usually distinguishing real intent from polite agreement — synthetic users amplify the exact bias you're trying to avoid.\n\n## Why Synthetic Users Fail in This Specific Way\n\nThe failure modes aren't random. They flow from how LLMs are trained.\n\n1. **RLHF tunes models toward agreement.** Reinforcement learning from human feedback rewards responses humans rate highly. Humans tend to rate friendly, supportive, agreeable responses higher. The model learns to please (a tendency you can probe directly; see the sketch after this list).\n2. **Training data is text, not behavior.** Models are trained on what people *say* online — not on what they *do* in product. They learn idealized behavior from blogs, reviews, and forums, which is systematically different from observed behavior.\n3. **No uncertainty or contradiction.** Real respondents say \"I don't know,\" contradict themselves between minute 5 and minute 25, change their minds, get confused, push back on the question. Models smooth all of this into coherent narrative.\n4. **No surprise.** Synthetic users can only recombine patterns from training data. They cannot tell you something the dataset didn't already contain. Real research's highest value is the *unknown unknown* — and synthetic users systematically cannot deliver it.\n\n
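As promised above, sycophancy is the most mechanical of these failures, which also makes it the easiest to check before you lean on a synthetic panel. Here is a minimal probe in Python, assuming a hypothetical `complete(prompt)` helper that wraps whichever LLM you use; the persona framing, rejection markers, and trial count are illustrative, not any vendor's API:\n\n```python\n# Sycophancy probe: check whether a synthetic persona ever pushes back.\n# `complete` stands in for whatever prompt -> response function your LLM\n# stack provides; the markers and trial count below are illustrative only.\nfrom typing import Callable\n\nREJECTION_MARKERS = [\n    \"wouldn't use\", \"would not use\", \"don't need\", \"not useful\",\n    \"too expensive\", \"concern\", \"skeptical\",\n]\n\ndef pushback_rate(complete: Callable[[str], str], persona: str,\n                  concept: str, n_trials: int = 20) -> float:\n    \"\"\"Fraction of trials in which the persona voices any objection.\"\"\"\n    hits = 0\n    for trial in range(n_trials):\n        prompt = (\n            f\"You are {persona}. (Trial {trial}.) React honestly to this \"\n            f\"product concept, including reasons you might reject it: {concept}\"\n        )\n        reply = complete(prompt).lower()\n        if any(marker in reply for marker in REJECTION_MARKERS):\n            hits += 1\n    return hits / n_trials\n```\n\nInterpretation: real panels reject weak concepts some of the time. If the rate stays near 0.0 even for a deliberately weak concept, the persona is sycophantic, and its concept-test ratings should not be trusted.\n\n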
## When Synthetic Users Are Legitimate\n\nThe critique above is about using synthetic users as a *replacement* for real research. Used as an *adjunct*, they have real value:\n\n1. **Vocabulary familiarization.** Before interviewing dental practice managers, prompt a synthetic dental practice manager and absorb the jargon. You walk into real interviews sounding like you belong there.\n2. **Hypothesis generation.** Generate 20 candidate concerns the persona might raise. Treat the list as **hypotheses to test**, not findings.\n3. **Pre-interview rehearsal.** Run your discussion guide through a synthetic respondent. Spot questions that go nowhere. Refine before you spend a real participant's time.\n4. **Stress-testing materials.** Have synthetic users react to a survey draft. Use the reactions to spot ambiguous wording — but don't use the response distribution as data.\n5. **Long-tail edge personas you can't recruit.** If your real population is genuinely impossible to reach (e.g., very rare medical condition), synthetic exploration can scaffold an early hypothesis — provided you label it that way.\n\nThe pattern: **synthetic users generate questions; real users generate answers.** Inverting that pattern is where teams get into trouble.\n\n## The Honest Decision Tree\n\nUse this when deciding whether a research question can be answered synthetically:\n\n- **Are you making a product decision based on the result?** → Real users. Period.\n- **Are you prioritizing features?** → Real users. Synthetic sycophancy ruins prioritization.\n- **Are you testing a concept's appeal?** → Real users. Synthetic positivity distorts every concept upward.\n- **Are you building a persona that downstream teams will treat as a real customer?** → Real users.\n- **Are you exploring vocabulary, generating hypotheses, or rehearsing materials?** → Synthetic is fine, labeled as such.\n- **Are you doing PR or marketing the research as \"user research\"?** → Real users — anything else is misrepresentation.\n\n## The Argument That Synthetic Users Used to Win — and No Longer Does\n\nThe original case for synthetic users rested on cost and speed. Real research is expensive and slow; synthetic research is cheap and fast. Therefore, the argument went, synthetic was a defensible compromise for early-stage work.\n\nThat argument was strongest in 2023, when running 30 real moderated interviews took 4-6 weeks and cost $5-15K in incentives and panel fees. Against that baseline, synthetic users at \"$0 marginal cost and 5 minutes\" felt revolutionary.\n\nThe baseline has moved.\n\n## How Koji Helps: Real Interviews at Synthetic-Speed Economics\n\nKoji is built around a simple thesis: **the original cost-speed argument for synthetic users assumed human-moderated research was the only way to get real respondent data. That assumption is no longer true.**\n\nA Koji study runs like this (a configuration sketch follows the list):\n\n- **Recruit a panel or use your own list.** Same as before.\n- **Configure questions.** Mix [structured questions](/docs/structured-questions-guide) (scale, ranking, single/multiple choice, yes/no) with open-ended prompts that an [AI moderator](/docs/ai-moderated-interviews) probes in real time.\n- **Launch.** Participants take the interview when they're ready, in voice or text mode ([Voice vs Text](/docs/voice-vs-text-interviews)). The AI moderator runs 24/7 — no scheduling.\n- **Get a thematic report.** Automatic [thematic analysis](/docs/research-synthesis-guide) groups responses into themes within hours of the last interview completing.\n\n
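For concreteness, here is a minimal sketch of what such a study configuration captures, written as plain Python dataclasses. The field names simply mirror the workflow steps above; they are illustrative stand-ins, not Koji's actual API, which this guide doesn't document:\n\n```python\n# Illustrative shape of an AI-moderated interview study configuration.\n# Field names are hypothetical stand-ins for the workflow steps above,\n# not Koji's actual API.\nfrom dataclasses import dataclass, field\n\n@dataclass\nclass Question:\n    kind: str                 # \"scale\", \"ranking\", \"single_choice\", \"multiple_choice\", \"yes_no\", \"open_ended\"\n    text: str\n    ai_probing: bool = False  # let the AI moderator ask follow-ups in real time\n\n@dataclass\nclass StudyConfig:\n    audience: str             # \"recruited_panel\" or \"own_list\"\n    mode: str                 # \"voice\" or \"text\"\n    target_n: int\n    questions: list[Question] = field(default_factory=list)\n\nstudy = StudyConfig(\n    audience=\"own_list\",\n    mode=\"voice\",\n    target_n=30,\n    questions=[\n        Question(\"scale\", \"How satisfied are you with your current tool?\"),\n        Question(\"open_ended\", \"Walk me through the last time it failed you.\",\n                 ai_probing=True),\n    ],\n)\n```\n\nThe point of the sketch is the division of labor: the researcher still fixes the questions and the sample up front, while scheduling, moderation, and transcription (the expensive parts of legacy research) are what the AI absorbs.\n\n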
The result: 30+ real respondent interviews delivered in 24-72 hours at a fraction of legacy moderated-research cost ([User Research Cost Calculator](/docs/user-research-cost-calculator-2026)). The \"synthetic users are faster\" argument breaks. **Koji gives you real respondent data on a synthetic-speed timeline.**\n\nThis matters because the cost-speed argument was the *only* serious argument for synthetic users on decision-grade research. With that argument removed, the validity gap is no longer worth the convenience.\n\n## A 2026 Hybrid Workflow\n\nThe most sophisticated research programs we see don't treat this as binary. They use both — but for different purposes:\n\n1. **Day 1 — Synthetic exploration.** Spin up 3-4 synthetic personas. Generate 50 candidate questions. Cluster into 5 hypotheses. **Output: a discussion guide and a feature ranking list.**\n2. **Day 2 — Real validation.** Launch a Koji study with the refined guide. Recruit 30-50 real respondents. AI-moderated interviews capture voice and text data 24/7.\n3. **Days 3-4 — Real synthesis.** Koji's thematic analysis groups themes. Compare against the synthetic hypotheses from Day 1. Most will be partially confirmed; many will be flipped (this is the point).\n4. **Day 5 — Decisions.** Make product decisions on the real data. The synthetic exploration shaped the questions; the real research provides the answers.\n\nIn this workflow, synthetic users earn their keep — they sharpen the questions. Real users earn theirs — they provide the evidence. Nothing critical hangs on a synthetic response.\n\n> \"AI can accelerate workflows and spark hypotheses, but it remains a poor substitute for human input, and synthetic insights should be treated as hypotheses, not validated truth.\" — a common conclusion across 2026 academic reviews\n\n## What to Watch For\n\nIf a vendor or colleague describes synthetic-user research, ask:\n\n- **What is being measured against ground truth?** If nothing is, the answers can't be wrong, which means they can't be right either.\n- **What's the tolerance for sign-flipping?** A one-in-three chance that a divergent effect points in the wrong direction is not acceptable for product decisions.\n- **Does the deliverable claim \"users said\" or \"personas would say\"?** The first is misrepresentation; the second is honest.\n- **What's the plan to validate against real users?** No plan = no research, just generated content.\n\n## Modern Approach with AI: Two Different Things\n\nThere are two very different \"AI in research\" stories in 2026, and they get conflated:\n\n- **Story A: AI replaces participants** (synthetic users). Documented validity problems. Use only as an adjunct.\n- **Story B: AI replaces the moderator and analyst** (Koji). Real participants, AI-moderated and AI-analyzed. Validity preserved; cost and speed match synthetic.\n\nThese are not the same thing. Story A asks a model to predict what humans would say. Story B asks humans what they say, while the model handles the operational overhead. The first introduces a fundamental validity question. 
The second doesn't.\n\nIf you find yourself reaching for synthetic users for cost reasons, the modern answer is to skip them and run real AI-moderated research instead.\n\n## Related Resources\n\n- [AI-Moderated Interviews: How Automated Research Works](/docs/ai-moderated-interviews)\n- [Voice vs Text Interview: When to Use Each Mode](/docs/voice-vs-text-interviews)\n- [Structured Questions Guide: 6 Question Types in Koji](/docs/structured-questions-guide)\n- [User Research for AI Products: A Practical Guide](/docs/user-research-for-ai-products)\n- [User Research Cost Calculator: AI Interviews vs Traditional](/docs/user-research-cost-calculator-2026)\n- [AI-Generated Customer Personas: From Real Interview Data to Persona](/docs/ai-generated-customer-personas)\n- [Research Bias Guide](/docs/research-bias-guide)","category":"Research Methods","lastModified":"2026-05-04T03:23:07.187155+00:00","metaTitle":"Synthetic Users in Research: Validity, Bias, and When to Use AI Personas (2026)","metaDescription":"A methodology guide to synthetic users — the documented bias and validity problems (NN/G, ACM, sign-flipping research), legitimate use cases, and why real AI-moderated interviews now match synthetic-speed economics.","keywords":["synthetic users","synthetic personas","ai user research","synthetic respondents","ai personas research","synthetic data research","ai generated user research","llm research validity","synthetic vs real users","user research methodology"],"aiSummary":"Methodology guide to synthetic users (AI-generated research participants). Documents the validity problems — NN/G's shallow-insights critique, ACM Interactions' synthetic persona fallacy, 48% coefficient mismatch and 32% sign-flipping in quantitative comparisons, and the sycophancy bias driven by RLHF training. Outlines legitimate adjunct uses (vocabulary familiarization, hypothesis generation, pre-interview rehearsal) and explains how AI-moderated interviews on real participants now match synthetic-speed economics, removing the original cost-speed argument for synthetic-as-substitute.","aiPrerequisites":["Familiarity with user research methods","Basic understanding of LLMs/generative AI","A research question or study you are scoping"],"aiLearningOutcomes":["Distinguish synthetic users from AI-moderated real interviews","Recognize the documented validity problems with synthetic respondents","Identify legitimate adjunct use cases for synthetic users","Apply a decision framework for when synthetic is and is not acceptable","Build a hybrid workflow that uses synthetic for hypothesis generation and real for validation"],"aiDifficulty":"intermediate","aiEstimatedTime":"14 min read"}],"pagination":{"total":1,"returned":1,"offset":0}}