Synthetic Users in Research: Validity, Bias, and When AI Personas Are (and Aren't) Trustworthy
A research methodology guide to synthetic users — what they are, the documented bias problems (sycophancy, sign-flipping, shallow insights), the legitimate use cases, and why real AI-moderated interviews are now fast enough that the synthetic-vs-real tradeoff has fundamentally shifted.
TL;DR (Answer-First): Synthetic users — AI-generated personas that respond to research prompts as if they were real participants — are useful for hypothesis generation, vocabulary familiarization, and pre-interview rehearsal. They are not valid substitutes for real respondents in concept testing, prioritization, persona creation, or any decision that depends on observed behavior. Documented bias problems include systematic sycophancy, sign-flipping on causal effects, and a tendency to enthusiastically endorse every concept. The 2026 best practice is to use synthetic users for early-stage exploration and route every actual decision through real participant data — which AI-moderated interview platforms like Koji now make possible at synthetic-speed economics.
What Synthetic Users Are
A synthetic user is a large language model (LLM) prompted to behave as if it were a real research participant — answering interview questions, reacting to concepts, completing surveys. Instead of recruiting humans, you instantiate a "persona" (e.g., "a 34-year-old urban parent shopping for car insurance") and let the model produce responses.
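Mechanically, "instantiating a persona" is usually nothing more than a system prompt. A minimal sketch, assuming the OpenAI Python SDK (the persona text, model choice, and question are illustrative, not any vendor's actual product):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# The entire "synthetic user" is this system prompt.
PERSONA = (
    "You are a 34-year-old urban parent shopping for car insurance. "
    "Answer interview questions in the first person, as this person would."
)

response = client.chat.completions.create(
    model="gpt-4o",  # any capable chat model works; illustrative choice
    messages=[
        {"role": "system", "content": PERSONA},
        {"role": "user", "content": "What frustrates you most when comparing quotes?"},
    ],
)
print(response.choices[0].message.content)
```

Everything in this article is downstream of that design: the "participant" is a text generator conditioned on a one-paragraph identity.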
The pitch is compelling: hundreds of "interviews" in minutes, no recruiting, no incentives, no scheduling, no transcription. Several startups (Synthetic Users, Userdoc, others) sell this as a research replacement.
The problem is that the responses look like research without actually being research.
What the Evidence Says (And Why Researchers Are Worried)
NN/G: "Too Shallow to Be Useful"
Nielsen Norman Group ran direct comparisons between synthetic users and real participants on the same studies. Their conclusion was unambiguous: "Synthetic-user responses for many research activities are too shallow to be useful" (NN/G, "Synthetic Users: If, When, and How to Use AI-Generated Research").
Key observations from NN/G's comparative work:
- Sycophancy at scale. Synthetic users praised every concept. Real users balanced interest with concerns and questioned feasibility. For concept testing — where the entire point is finding which ideas users reject — this makes synthetic responses worse than useless; they generate confident false positives.
- Behavioral mismatch. In one study, synthetic users enthusiastically described participating in user forums, while real users mostly avoided forums and called them "contrived." The AI was predicting idealized behavior from training data, not actual behavior.
- Uniform care. Synthetic users seemed to care about everything roughly equally — fatal for feature prioritization, where the point is to find what matters most.
ACM Interactions: The Synthetic Persona Fallacy
The January-February 2026 issue of ACM Interactions published "The Synthetic Persona Fallacy: How AI-Generated Research Undermines UX Research" (ACM Interactions), arguing that synthetic personas erode the discipline's evidentiary base. The core claim: a synthetic user cannot be falsified. Real users surprise you. Synthetic users return your assumptions in dialogue form.
The Sign-Flipping Problem
The most damaging empirical finding comes from quantitative comparisons. In a study comparing ChatGPT-generated survey responses to real human survey data, 48% of estimated coefficients were statistically significantly different from their human counterparts. Of those, the sign of the effect flipped 32% of the time (referenced via Radical Product Thinking, "Synthetic Users vs Real User Research").
In plain language: among the estimates that diverged from the human data, roughly one in three didn't just have the wrong magnitude; it pointed in the wrong direction entirely. Building a roadmap on that data is worse than building one on no data at all, because it gives you false confidence in claims that are actively misleading.
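To see how the two figures compound, a quick back-of-envelope check (the percentages come from the study cited above; the arithmetic is ours):

```python
# Fraction of all synthetic estimates that point in the wrong direction.
divergent = 0.48     # coefficients significantly different from human data
sign_flipped = 0.32  # of those divergent coefficients, share with reversed sign

wrong_direction = divergent * sign_flipped
print(f"{wrong_direction:.0%} of all estimates point the wrong way")  # -> 15%
```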
B2B Sycophancy
A comparative study by Emporia Research across three groups of B2B survey respondents found that synthetic users showed a strong positive bias and "herd mentality" compared with real B2B respondents. The synthetic group agreed with one another more than real respondents did, and rated everything more favorably. For B2B research, where the gating problem is usually distinguishing real intent from polite agreement, synthetic users amplify the exact bias you're trying to avoid.
Why Synthetic Users Fail in This Specific Way
The failure modes aren't random. They flow from how LLMs are trained.
- RLHF tunes models toward agreement. Reinforcement learning from human feedback rewards responses humans rate highly. Humans tend to rate friendly, supportive, agreeable responses higher. The model learns to please. (A simple probe for this bias is sketched after this list.)
- Training data is text, not behavior. Models are trained on what people say online — not on what they do in product. They learn idealized behavior from blogs, reviews, and forums, which is systematically different from observed behavior.
- No uncertainty or contradiction. Real respondents say "I don't know," contradict themselves between minute 5 and minute 25, change their minds, get confused, push back on the question. Models smooth all of this into coherent narrative.
- No surprise. Synthetic users can only recombine patterns from training data. They cannot tell you something the dataset didn't already contain. Real research's highest value is the unknown unknown — and synthetic users systematically cannot deliver it.
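These failure modes are cheap to demonstrate for yourself. Below is a minimal sycophancy probe, again assuming the OpenAI Python SDK; the persona, the concepts, and the 1-10 scale are all illustrative. Feed a synthetic persona a mix of plausible and deliberately bad concepts and look at the spread of its ratings.

```python
from statistics import mean, pstdev
from openai import OpenAI

client = OpenAI()
PERSONA = "You are a 42-year-old office manager evaluating B2B software."

# Deliberately mix a plausible concept with ones a real buyer would reject.
CONCEPTS = [
    "A dashboard that consolidates all vendor invoices in one place.",
    "A subscription that emails you a PDF of your own calendar every hour.",
    "An AI assistant that renames your files to random animal names.",
]

def rate(concept: str) -> int:
    """Ask the persona for a 1-10 purchase-intent rating, number only."""
    reply = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": PERSONA},
            {"role": "user", "content": (
                "Rate your purchase intent for this product from 1 to 10. "
                f"Reply with the number only.\n\n{concept}"
            )},
        ],
    )
    return int(reply.choices[0].message.content.strip())

scores = [rate(c) for c in CONCEPTS]
print(scores, "mean:", mean(scores), "spread:", pstdev(scores))
# Uniformly warm scores with a tiny spread is the sycophancy NN/g documented.
```

If the deliberately bad concepts score nearly as well as the plausible one, you have reproduced the NN/g finding on your own stack.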
When Synthetic Users Are Legitimate
The critique above is about using synthetic users as a replacement for real research. Used as an adjunct, they have real value:
- Vocabulary familiarization. Before interviewing dental practice managers, prompt a synthetic dental practice manager and absorb the jargon. You walk into real interviews sounding like you belong there.
- Hypothesis generation. Generate 20 candidate concerns the persona might raise. Treat the list as hypotheses to test, not findings.
- Pre-interview rehearsal. Run your discussion guide through a synthetic respondent. Spot questions that go nowhere. Refine before you spend a real participant's time (a sketch of this loop follows below).
- Stress-testing materials. Have synthetic users react to a survey draft. Use the reactions to spot ambiguous wording — but don't use the response distribution as data.
- Long-tail edge personas you can't recruit. If your real population is genuinely impossible to reach (e.g., very rare medical condition), synthetic exploration can scaffold an early hypothesis — provided you label it that way.
The pattern: synthetic users generate questions; real users generate answers. Inverting that pattern is where teams get into trouble.
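For the rehearsal use case, the workflow can be as simple as looping your guide through the persona and flagging thin answers. A sketch, assuming the OpenAI Python SDK; the persona, guide questions, and 25-word threshold are arbitrary illustrations. The heuristic's logic: a closed question that draws a one-liner from a chatty LLM will usually draw even less from a human.

```python
from openai import OpenAI

client = OpenAI()
PERSONA = "You are a dental practice manager at a three-dentist clinic."

GUIDE = [
    "Walk me through how you handled scheduling last Monday.",
    "Do you like your current practice-management software?",  # likely closed
    "What was the last task that kept you at the office late?",
]

for question in GUIDE:
    answer = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": PERSONA},
            {"role": "user", "content": question},
        ],
    ).choices[0].message.content
    # Crude dead-end signal: very short answers often mean a closed question.
    flag = "REVIEW" if len(answer.split()) < 25 else "ok"
    print(f"[{flag}] {question}")
```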
The Honest Decision Tree
Use this when deciding whether a research question can be answered synthetically:
- Are you making a product decision based on the result? → Real users. Period.
- Are you prioritizing features? → Real users. Synthetic sycophancy ruins prioritization.
- Are you testing a concept's appeal? → Real users. Synthetic positivity distorts every concept upward.
- Are you building a persona that downstream teams will treat as a real customer? → Real users.
- Are you exploring vocabulary, generating hypotheses, or rehearsing materials? → Synthetic is fine, labeled as such.
- Are you publicizing or marketing the work as "user research"? → Real users; anything else is misrepresentation.
The Argument That Synthetic Users Used to Win — and No Longer Does
The original case for synthetic users rested on cost and speed. Real research is expensive and slow; synthetic research is cheap and fast. Therefore, the argument went, synthetic was a defensible compromise for early-stage work.
That argument was strongest in 2023 when running 30 real moderated interviews took 4-6 weeks and cost $5-15K in incentives and panel fees. Against that baseline, synthetic users at "$0 marginal cost and 5 minutes" felt revolutionary.
The baseline has moved.
How Koji Helps: Real Interviews at Synthetic-Speed Economics
Koji is built around a simple thesis: the original cost-speed argument for synthetic users assumed human-moderated research was the only way to get real respondent data. That assumption is no longer true.
A Koji study runs like this:
- Recruit panel or use your own list. Same as before.
- Configure questions. Mix structured questions (scale, ranking, single/multiple choice, yes/no) with open-ended prompts that an AI moderator probes in real time (a hypothetical configuration sketch follows this list).
- Launch. Participants take the interview when they're ready, in voice or text mode (Voice vs Text). The AI moderator runs 24/7 — no scheduling.
- Get a thematic report. Automatic thematic analysis groups responses into themes within hours of the last interview completing.
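To make the question-mixing step concrete, here is roughly what a study definition looks like as data. Every field name below is hypothetical and chosen for illustration only; it is not Koji's actual configuration format.

```python
# Hypothetical study definition; field names are illustrative, not Koji's API.
study = {
    "name": "Car insurance switching drivers",
    "mode": "voice_or_text",  # participant picks a mode when they start
    "questions": [
        {"type": "scale", "text": "How likely are you to switch insurers this year?",
         "min": 1, "max": 7},
        {"type": "ranking",
         "text": "Rank these factors: price, coverage, claims experience, brand."},
        {"type": "open_ended",
         "text": "Tell me about the last time you compared insurance quotes.",
         "probe": True},  # the AI moderator asks follow-ups in real time
    ],
    "target_completes": 30,
}
```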
The result: 30+ real respondent interviews delivered in 24-72 hours at a fraction of legacy moderated-research cost (User Research Cost Calculator). The "synthetic users are faster" argument breaks. Koji gives you real respondent data on a synthetic-speed timeline.
This matters because the cost-speed argument was the only serious argument for synthetic users on decision-grade research. With that argument removed, the validity gap is no longer worth the convenience.
A 2026 Hybrid Workflow
The most sophisticated research programs we see don't treat this as binary. They use both — but for different purposes:
- Day 1 — Synthetic exploration. Spin up 3-4 synthetic personas. Generate 50 candidate questions. Cluster into 5 hypotheses. Output: a discussion guide and a feature ranking list.
- Day 2 — Real validation. Launch a Koji study with the refined guide. Recruit 30-50 real respondents. AI-moderated interviews capture voice and text data 24/7.
- Day 3-4 — Real synthesis. Koji's thematic analysis groups themes. Compare against the synthetic hypotheses from Day 1. Most will be partially confirmed; many will be flipped (this is the point; a reconciliation sketch follows below).
- Day 5 — Decisions. Make product decisions on the real data. The synthetic exploration shaped the questions; the real research provides the answers.
In this workflow, synthetic users earn their keep — they sharpen the questions. Real users earn theirs — they provide the evidence. Nothing critical hangs on a synthetic response.
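Reconciling Day 1 hypotheses against Day 3-4 themes is ultimately a human judgment call, but a crude lexical pairing gives you a starting worksheet. A sketch using only the Python standard library; the hypotheses and themes are illustrative:

```python
from difflib import SequenceMatcher

# Illustrative data: Day 1 synthetic hypotheses vs. Day 3-4 real themes.
hypotheses = [
    "Respondents switch insurers mainly over price.",
    "Respondents distrust fully online claims processes.",
]
real_themes = [
    "A bad claims experience, not price, is the main switching trigger.",
    "Online claims are preferred for small, simple incidents.",
]

# Pair each hypothesis with its lexically closest theme, then label each
# pair by hand: confirmed, partially confirmed, or flipped.
for h in hypotheses:
    best = max(real_themes,
               key=lambda t: SequenceMatcher(None, h.lower(), t.lower()).ratio())
    print(f"hypothesis: {h}\n  closest real theme: {best}\n")
```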
"AI can accelerate workflows and spark hypotheses, but it remains a poor substitute for human input, and synthetic insights should be treated as hypotheses, not validated truth." — common conclusion across 2026 academic reviews
What to Watch For
If a vendor or colleague describes synthetic-user research, ask:
- What is being measured against ground truth? If nothing is, the answers can't be wrong, which means they can't be right either.
- What's the tolerance for sign-flipping? A roughly one-in-three chance that a divergent effect points the wrong way is not acceptable for product decisions.
- Does the deliverable claim "users said" or "personas would say"? The first is misrepresentation; the second is honest.
- What's the plan to validate against real users? No plan = no research, just generated content.
Modern Approach with AI: Two Different Things
There are two very different "AI in research" stories in 2026, and they get conflated:
- Story A: AI replaces participants (synthetic users). Documented validity problems. Use only as adjunct.
- Story B: AI replaces the moderator and analyst (Koji). Real participants, AI-moderated and AI-analyzed. Validity preserved; cost and speed match synthetic.
These are not the same thing. Story A asks a model to predict what humans would say. Story B asks humans what they say, while the model handles the operational overhead. The first introduces a fundamental validity question. The second doesn't.
If you find yourself reaching for synthetic users for cost reasons, the modern answer is to skip them and run real AI-moderated research instead.
Related Resources
- AI-Moderated Interviews: How Automated Research Works
- Voice vs Text Interview: When to Use Each Mode
- Structured Questions Guide: 6 Question Types in Koji
- User Research for AI Products: A Practical Guide
- User Research Cost Calculator: AI Interviews vs Traditional
- AI-Generated Customer Personas: From Real Interview Data to Persona
- Research Bias Guide
Related Articles
AI-Generated Customer Personas: From Real Interview Data to Persona
Stop inventing personas in workshops. Learn how AI generates evidence-backed customer personas directly from interview transcripts — and why personas built from real conversations beat synthetic AI personas every time.
User Research Cost Calculator: AI Interviews vs Traditional (2026)
See exactly how much user research costs in 2026. Calculate per-interview spend across recruiting, moderation, and analysis — and compare AI interviews vs traditional methods side-by-side.
AI-Moderated Interviews: How Automated Research Works (And Why It Works Better)
Understand how AI-moderated interviews work, when to use them over human-moderated sessions, and how to get the most from automated qualitative research.
Voice vs Text Interview: When to Use Each Mode
Choosing between voice and text mode for your AI interview? This guide breaks down response depth, completion rate, audience fit, and cost — plus a decision matrix that tells you which mode wins for each research scenario.
Structured Questions in AI Interviews
Mix quantitative data collection — scales, ratings, multiple choice, ranking — with AI-powered conversational follow-up in a single interview.
Research Bias: The Complete Guide to Cognitive Biases That Corrupt User Research
A comprehensive guide to the 9 most damaging cognitive biases in user research — from confirmation bias to social desirability bias — with practical strategies to detect and eliminate them before they corrupt your findings.
The Complete Guide to AI-Powered Qualitative Research
Everything you need to know about using AI for qualitative research — from methodology selection to automated analysis. Learn how AI interviews, voice conversations, and automated theming are transforming how teams understand their customers.
User Research for AI Products: A Practical Guide for 2026
AI products break the assumptions traditional UX research is built on — outputs are non-deterministic, trust is the central UX problem, and prompts replace navigation. This guide covers the methods, question types, and study designs that actually work for teams shipping AI features.