The platform
One continuous engine. Four pillars. Five stages. Ship-ready strategy at the end.
Theia is a single integrated intelligence engine — not a dashboard, not a research wave. Every piece runs continuously. Every output is queryable. Every claim is source-traceable.
The 5-stage pipeline.
Each stage has a single responsibility and a stable output schema. Downstream stages never re-compute upstream work.
01
Collect
DataForSEO for SERP, AI Overview citations, Labs keyword expansion. Oxylabs for deep-scrape of reviews, YouTube transcripts, web articles, retailer pages. JungleScout / Stackline for marketplace sales. GfK and Canon 1P feeds where licensed. 8,000+ classified deep-web sources for B2B/industrial.
02
Enrich
Claude Haiku extracts features, benefits, use cases, comparisons and sentiment from every source, in the source language. Each batch is budget-guarded with a $50 cap, and distinctive-keyword scoping prevents runaway cost.
03
Structure
Bipartite keyword × product graphs. Leiden Surprise clustering finds market segments. HHI-weighted distinctiveness names them. Cross-language harmonisation maps raw labels to canonical properties. The result: a stable, queryable graph.
04
Strategise
Four agents in sequence: L1 Category Brief (what the market values), L2 Perception Report (how products perform), L3 Situation Analysis (what to do), L4 Content Generation (ship-ready listings, PDPs, briefs).
05
Converse
MCP-exposed intelligence. Connect Claude Desktop, Cursor or a custom agent. Ask natural-language questions against the full repository. Sourced answers in seconds.
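To make the Structure stage concrete, here is a minimal sketch of how an HHI-weighted distinctiveness score could name segments. It is one plausible reading, not Theia's actual implementation: the function names, the scoring rule (a keyword's weight in a segment multiplied by the Herfindahl-Hirschman index of its spread across segments) and the sample edges are all assumptions made for illustration.

```python
from collections import defaultdict

def hhi(shares):
    """Herfindahl-Hirschman index: sum of squared shares (1/n up to 1.0)."""
    return sum(s * s for s in shares)

def distinctive_keywords(edges, top_n=2):
    """edges: iterable of (keyword, segment, weight) tuples.

    Hypothetical scoring rule: a keyword's weight inside a segment times the
    HHI of its weight distribution over all segments. Keywords concentrated
    in one segment score high; keywords spread evenly score low.
    """
    by_keyword = defaultdict(lambda: defaultdict(float))
    for keyword, segment, weight in edges:
        by_keyword[keyword][segment] += weight

    scores = defaultdict(list)
    for keyword, seg_weights in by_keyword.items():
        total = sum(seg_weights.values())
        concentration = hhi(w / total for w in seg_weights.values())
        for segment, weight in seg_weights.items():
            scores[segment].append((weight * concentration, keyword))

    return {
        segment: [kw for _, kw in sorted(ranked, reverse=True)[:top_n]]
        for segment, ranked in scores.items()
    }

# Made-up keyword x segment edges, for illustration only.
edges = [
    ("camera", "vlogging", 50), ("camera", "wildlife", 50),
    ("flip screen", "vlogging", 30),
    ("telephoto reach", "wildlife", 40),
    ("autofocus", "vlogging", 20), ("autofocus", "wildlife", 25),
]
labels = distinctive_keywords(edges)
```

In this toy data "camera" carries the most raw weight in both segments, yet "flip screen" and "telephoto reach" rank first because their weight is concentrated in a single segment.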
The four pillars.
Mapped to the customer journey: Demand → Visibility → Sales → Perception (and back). Most platforms cover one. Theia connects all four into the same graph.
Demand
Search impressions & estimated volume
Google (GSC, Trends, Keyword Planner), Amazon search volume, distinctive keywords per segment
Visibility
Click-weighted share of voice
Google rankings, Amazon rankings, AI Overview citation share, CTR-curve weighted
Sales
Units, revenue, share — daily/weekly
1P: Vendor Central + Canon internal. 3P: Stackline (Amazon), GfK (all channels)
Perception
Feature sentiment & trajectory
Reviews, YouTube, Reddit, web articles, Bazaarvoice, AI Overviews. Multi-source, multi-language.
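As an illustration of the Visibility pillar, click-weighted share of voice can be sketched as search volume multiplied by an expected click-through rate for each ranking position, summed per brand. The CTR values, brand names and volumes below are placeholders chosen for the example, not measured data, and the function name is hypothetical.

```python
# Illustrative CTR by organic position; placeholder values, not measured data.
CTR_CURVE = {1: 0.30, 2: 0.15, 3: 0.10, 4: 0.07, 5: 0.05}

def share_of_voice(rankings, volumes, ctr_curve=CTR_CURVE):
    """rankings: {keyword: {brand: position}}; volumes: {keyword: searches}.

    Returns {brand: share of estimated clicks} among the tracked brands.
    Positions missing from the curve are treated as earning no clicks.
    """
    clicks = {}
    for keyword, positions in rankings.items():
        volume = volumes.get(keyword, 0)
        for brand, position in positions.items():
            estimated = volume * ctr_curve.get(position, 0.0)
            clicks[brand] = clicks.get(brand, 0.0) + estimated
    total = sum(clicks.values())
    return {b: (c / total if total else 0.0) for b, c in clicks.items()}

# Hypothetical rankings and volumes.
rankings = {
    "mirrorless camera": {"brand_a": 1, "brand_b": 3},
    "vlogging camera": {"brand_a": 4, "brand_b": 1},
}
volumes = {"mirrorless camera": 10_000, "vlogging camera": 2_000}
sov = share_of_voice(rankings, volumes)
```

Each brand holds one top position here, but brand_a ends up with the larger share because its first-place ranking sits on the higher-volume keyword. That is the difference between counting rankings and weighting them by clicks.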
From data to deck: the L1-L4 strategy chain.
The strategy agent chain is the delivery surface. Each agent reads from pre-computed structured tables — never from raw text — which makes them fast, cheap, and reproducible.
Category Brief
What the market values: pain points, growth levers, audience segments, defining properties.
Perception Report
How your products perform: feature sentiment, trajectories, competitive leaderboard.
Situation Analysis
What to do: priorities per product, brand vs market gaps, recommended actions with evidence.
Content Generation
Ship: Amazon listings, brand PDPs, retailer-specific PDPs, content briefs, marketing copy.
Read the deep-dive: strategy agent chain.
Conversational, by design.
Theia exposes the intelligence repository through MCP (Model Context Protocol). Connect Claude Desktop, Cursor, or your own agent. Ask in natural language. Get sourced answers in seconds.
// example query via MCP
"Which 5 vendors are most cited
for GigE Vision compliance
in EU machine-vision forums
over the last 6 months?"
→ sourced answer in 4.2s
Eight principles behind the engine.
The opinions baked into the architecture. Each one is a choice that distinguishes Theia from another dashboard or another research wave.
01
Search engines have already segmented your market.
Continuous, planetary-scale query-to-result matching IS market segmentation. Reading it beats inventing your own.
02
Four pillars, never three.
Demand, Visibility, Sales, Perception. Any platform claiming to do market intelligence with fewer is selling you one slice.
03
Continuous beats quarterly.
Weekly refresh is the new floor. Quarterly waves discard 90% of the signal that actually moves brands.
04
Native language, then harmonise.
German extraction stays in German until the canonical mapping step. Translation-first pipelines lose 80% of the signal.
05
LLMs for extraction. Math for connections.
Features and sentiment need language understanding. Graph edges need cosine similarity and Leiden clustering — not LLM judgement.
06
Fixed entities, not LLM-discovered ontology.
The schema is curated by domain experts. LLMs do the extraction they are good at. This is what makes the graph stable.
07
Trajectory matters more than level.
A sentiment score of 0.69 on its own is meaningless. Sentiment improving from 0.41 to 0.69 over 12 months is a strategy.
08
Strategy ships from the same engine as the data.
L1-L4 agents read the structured intelligence layer. The deck and the SQL come from the same place.
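Principle 07 can be made concrete with a small calculation. The sketch below fits a least-squares slope to equally spaced monthly sentiment scores. The two series are synthetic, built only to match the 0.41 to 0.69 example above, and the function name is an assumption rather than part of Theia.

```python
def trend_per_month(scores):
    """Least-squares slope of equally spaced monthly scores."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two observations")
    mean_x = (n - 1) / 2
    mean_y = sum(scores) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(scores))
    var = sum((x - mean_x) ** 2 for x in range(n))
    return cov / var

# 13 monthly readings rising linearly from 0.41 to 0.69 (synthetic).
improving = [0.41 + (0.69 - 0.41) * m / 12 for m in range(13)]
# A product that sat at 0.69 the whole time (synthetic).
flat = [0.69] * 13

slope_up = trend_per_month(improving)
slope_flat = trend_per_month(flat)
```

Both series end at the same level of 0.69, but only the first has a positive slope (0.28 points over 12 months). The slope, not the level, is what separates the two products.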
See the engine on your category.
A 30-minute walkthrough on your market with Pascal. Or a one-pack pilot that you scale from there.