The honest one-paragraph version
Simile, Quantilope, Listenlabs, Consumr.ai and other synthetic-respondent platforms generate AI personas that stand in for real consumers in surveys, concept tests, and qualitative research. The category raised significant capital in 2025-2026 (Simile alone: $100M in February 2026). For early-stage hypothesis testing and well-anchored attitudes, synthetic respondents can be useful, cheap, and fast.
Theia is the structured-intelligence alternative: it derives revealed preference from the continuous open-web signal, extracts in the native language, integrates across four pillars, and keeps every claim source-traceable. We don't generate synthetic personas. We read what real consumers are actually doing.
The two approaches answer different questions for different decisions.
Where synthetic respondents are useful
Three jobs where synthetic respondents have a defensible place:
- Early-stage hypothesis exploration — when you don't yet know which questions matter
- Well-anchored attitudes — established product categories where consumer preferences are stable and well-documented in training data
- Scenario simulation — counterfactual exploration ("what if pricing moved to $X?") that no real-data approach can answer
If your decision falls into one of these three buckets, a synthetic-respondent platform may be the right tool.
Where synthetic respondents structurally fail
Five failure modes raised in published methodology critiques (NIQ, Bellomy, VerianGroup, Perspective AI):
01 — Derivative intelligence
A synthetic respondent is only as good as the training data behind it. New products, cultural shifts, B2B niches, narrow regional preferences: anything outside the training distribution gets fabricated to look plausible. The model doesn't know what it doesn't know.
02 — Sycophancy and mode collapse
RLHF-trained models are structurally biased toward agreement with the user. Multi-turn synthetic interviews drift toward whatever the researcher seems to want. Mode collapse on minority preferences is a property of the architecture, not a bug.
03 — Western-context bias
Willingness to pay, pricing preferences, brand loyalty and category attitudes show strong cultural patterns. Synthetic respondents, trained largely on Western corpora, fail to reproduce those patterns for other markets. For multi-market consumer brands and B2B SaaS selling globally, this is a fatal limitation.
04 — Recursive model collapse
As synthetic-respondent outputs increasingly enter training data, the bias compounds. Synthetic respondents in 2028 will be partly trained on synthetic respondents from 2026. The signal collapses over time without anyone noticing.
05 — Empirical failure on validated tests
Where synthetic respondents have been validated against known outcomes, results have been weak. One published 2025 example: a predicted 83% electoral participation against an actual 49%, a 34-point overestimate. The errors are not symmetric or correctable; they are downstream of the architecture.
Where Theia is built differently
| Dimension | Synthetic respondents | Theia |
|---|---|---|
| Signal type | Simulated preference | Revealed preference |
| Source | LLM-generated personas | Real reviews, transcripts, articles, search behaviour, AI Overview citations |
| Coverage | Survey-scale questions | Open-ended category and competitive intelligence |
| Cross-language | Typically translated to English | Native-language extraction + harmonisation |
| Bias model | Inherited from training data | Auditable per source |
| Model collapse risk | High and compounding | None (real data, math for connections) |
| Reproducibility | Stochastic per run | Run-id reproducible |
| Source traceability | Synthetic — not traceable | Every claim links to real source URL |
| B2B / industrial coverage | Weak (training data sparse) | 8,000+ deep-web sources |
| EU AI Act readiness | Unclear (synthetic data governance debated) | Reproducible, source-cited, ready |
Where the genuine overlap is
Both produce consumer insight. Both use AI. Both promise to be faster and cheaper than traditional market research.
The substantive difference:
- Synthetic respondents simulate what consumers would say
- Theia measures what consumers actually do (search, buy, review, cite, watch)
For a CMO making a brand repositioning decision, the second is structurally more defensible. For a PE operating partner making a portfolio bet, the second is the only one a credible IC will accept. For a regulated services brand making a customer-facing claim, the second is the only one the compliance team will sign off on.
Should you have both?
Yes, occasionally:
- Synthetic for the early "what should we even be testing?" hypothesis exploration
- Theia for the structured-intelligence layer that monitors what's actually happening
For most consumer brands above mid-market scale, Theia alone covers more ground than synthetic respondents alone. For PE, regulated services, and B2B, synthetic respondents have no place in board-grade decisions.
Pricing comparison
- Simile and other synthetic-respondent platforms: typically $5k-30k/month depending on volume
- Theia Tier 3 (consumer brands): £6k/month for 4 countries, all four pillars, L1-L4 strategy chain
Comparable price tier (note the different currencies), different value. The right question isn't "which is cheaper?" but "which gives me a defensible answer?"
What we'd want a fair comparison to include
If you're evaluating Theia and a synthetic-respondent platform:
- Take one real decision from the last 12 months. Run both platforms on it. Compare the answers against the actual outcome.
- Take one query where you have known sales data. Ask each platform what segment is growing. Validate against the sales data.
- Take one B2B category your synthetic respondents don't know well. See what each platform produces.
We're happy to participate. The honest test isn't a feature comparison — it's whether the platform's output corresponds to reality.
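The first two checks above can be scored mechanically once each platform's answer is reduced to a number. The following is a minimal sketch, not a prescribed method: the `Backtest` record, the platform labels `platform_a` and `platform_b`, and every figure in it are hypothetical placeholders to be replaced with your own decisions and sales data.

```python
from dataclasses import dataclass


@dataclass
class Backtest:
    """One past decision with a known numeric outcome (illustrative only)."""
    name: str
    actual: float                  # observed outcome, e.g. segment growth in %
    predictions: dict[str, float]  # platform label -> predicted value


def mean_absolute_error(tests: list[Backtest], platform: str) -> float:
    """Average size of the miss, in the same unit as the outcome."""
    errors = [abs(t.predictions[platform] - t.actual) for t in tests]
    return sum(errors) / len(errors)


def direction_hit_rate(tests: list[Backtest], platform: str) -> float:
    """Share of cases where the platform called growth vs. decline correctly."""
    hits = [(t.predictions[platform] > 0) == (t.actual > 0) for t in tests]
    return sum(hits) / len(hits)


# Hypothetical numbers, invented purely to show the mechanics.
tests = [
    Backtest("segment A growth", 4.0, {"platform_a": 6.0, "platform_b": 3.0}),
    Backtest("segment B growth", -2.0, {"platform_a": 1.0, "platform_b": -1.0}),
    Backtest("segment C growth", 10.0, {"platform_a": 5.0, "platform_b": 12.0}),
]

for platform in ("platform_a", "platform_b"):
    mae = mean_absolute_error(tests, platform)
    hit = direction_hit_rate(tests, platform)
    print(f"{platform}: MAE={mae:.2f} points, direction hit rate={hit:.0%}")
```

Reporting both numbers is a deliberate choice: a platform can be close on average while still calling the direction wrong on the decisions that matter, and the reverse. Three cases are far too few to conclude anything; the sketch only shows the shape of the comparison.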