Glossary
The grammar of structured market intelligence.
Every term, framework, and method Theia uses, defined so that your team has a shared vocabulary and AI engines have a citable source.
methodology
How Theia maps raw extracted labels from any source language to canonical properties. 'Battery life' / 'akkulaufzeit' / 'autonomie batterie' / 'autonomia' all resolve to BATTERY_LIFE — but extraction happens in the source language first.
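A minimal sketch of the final mapping step, assuming the label has already been extracted in its source language. The lookup table `CANONICAL_MAP` and the helpers `normalise` and `harmonise` are hypothetical. A curated table stands in here for whatever mapping mechanism Theia actually uses.

```python
import unicodedata
from typing import Optional

# Hypothetical curated table: normalised source-language label -> canonical property.
CANONICAL_MAP = {
    "battery life": "BATTERY_LIFE",        # English
    "akkulaufzeit": "BATTERY_LIFE",        # German
    "autonomie batterie": "BATTERY_LIFE",  # French
    "autonomia": "BATTERY_LIFE",           # Italian; Spanish 'autonomía' after accent stripping
}


def normalise(label: str) -> str:
    """Lower-case, strip diacritics and collapse whitespace,
    so that 'Autonomía' and 'autonomia' compare equal."""
    decomposed = unicodedata.normalize("NFKD", label)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(stripped.lower().split())


def harmonise(extracted_label: str) -> Optional[str]:
    """Map a label extracted in its source language to a canonical property.
    Returns None for labels the ontology does not cover yet."""
    return CANONICAL_MAP.get(normalise(extracted_label))
```

Returning `None` for unknown labels, rather than guessing, keeps the ontology curated: unmapped labels can be queued for a domain expert to review.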
The position-to-click-through-rate (CTR) mapping that powers visibility and share-of-voice calculations. Position 1 on Google captures ~28% of clicks; position 10 captures ~2%. Amazon and AI Overview surfaces have their own curves.
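The entry gives only two anchor points for Google, so the sketch below fits a power-law decay through them. The power-law shape, the derived exponent, and the name `google_ctr` are assumptions for illustration. Theia's actual curve and the separate Amazon and AI Overview curves are not reproduced here.

```python
import math

# Approximate anchor points stated in the glossary.
CTR_AT_POS1 = 0.28
CTR_AT_POS10 = 0.02

# Assumed shape: ctr(p) = CTR_AT_POS1 * p ** -k, with k chosen so the curve
# passes through both anchors: k = log(0.28 / 0.02) / log(10), about 1.146.
DECAY = math.log(CTR_AT_POS1 / CTR_AT_POS10) / math.log(10)


def google_ctr(position: int) -> float:
    """Estimated share of clicks for an organic result at a 1-based position."""
    if position < 1:
        raise ValueError("position must be 1 or greater")
    return CTR_AT_POS1 * position ** -DECAY


if __name__ == "__main__":
    for p in (1, 2, 3, 5, 10):
        print(f"position {p:>2}: {google_ctr(p):.1%}")
```

A measured curve would normally replace the formula with a per-surface lookup table. The power law is used here only because two points are all the entry provides.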
The set of guards, run trackers and budget assertions that wrap every Theia pipeline step. Without it, AI-driven research becomes unreproducible, unbounded in cost, and untrustworthy at the deck stage. With it, the same input always produces the same output, on the same budget.
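A minimal sketch of one such guard, assuming each model call reports its cost in advance. `StepGuard` and `BudgetExceeded` are hypothetical names. The key point is that the budget is asserted before a call is recorded, so a step fails fast instead of overspending silently.

```python
class BudgetExceeded(RuntimeError):
    """Raised when a pipeline step would breach its call or cost budget."""


class StepGuard:
    """Hypothetical guard around one pipeline step. It counts model calls
    and checks, before each call is recorded, that the step stays within
    a fixed cost budget and a fixed call limit."""

    def __init__(self, step: str, budget_usd: float, max_calls: int):
        self.step = step
        self.budget_usd = budget_usd
        self.max_calls = max_calls
        self.calls = 0
        self.spent_usd = 0.0

    def charge(self, cost_usd: float) -> None:
        """Record one model call, or raise without recording it if the call
        would exceed either limit."""
        if self.calls >= self.max_calls:
            raise BudgetExceeded(f"{self.step}: call limit of {self.max_calls} reached")
        if self.spent_usd + cost_usd > self.budget_usd:
            raise BudgetExceeded(
                f"{self.step}: ${self.spent_usd + cost_usd:.2f} would exceed "
                f"the ${self.budget_usd:.2f} budget"
            )
        self.calls += 1
        self.spent_usd += cost_usd
```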
A prompt-engineering finding from the Canon B2B deployment. Adding an explicit citation instruction to the German LLM scraping prompt lifted source capture from 0.69 to 4.06 sources per item, a roughly 6× increase (4.06 / 0.69 ≈ 5.9). It saved more than 8 months of 'DE is blind' reporting.
How Theia ranks which keywords define which market segments. Borrows the Herfindahl-Hirschman concentration index from antitrust economics and applies it to keyword × cluster traffic distributions.
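A minimal sketch of the idea, assuming a keyword × cluster traffic table is already available. For each keyword, HHI is the sum of its squared traffic shares across clusters. A value of 1.0 means all of its traffic falls in one cluster, and 1/n means an even spread over n clusters. Antitrust practice squares percentage shares (giving 0–10,000), but shares are kept as fractions here. Assigning each keyword to its highest-traffic cluster, ranking by HHI, and breaking ties on total traffic are all assumptions, as are the function names.

```python
from collections import defaultdict
from typing import Dict, List


def hhi(traffic_by_cluster: Dict[str, float]) -> float:
    """Herfindahl-Hirschman index of one keyword's traffic across clusters,
    on a 0-1 scale (sum of squared fractional shares)."""
    total = sum(traffic_by_cluster.values())
    if total <= 0:
        return 0.0
    return sum((t / total) ** 2 for t in traffic_by_cluster.values())


def defining_keywords(
    matrix: Dict[str, Dict[str, float]], top_n: int = 3
) -> Dict[str, List[str]]:
    """For each cluster, return the keywords most concentrated in it.

    `matrix` maps keyword -> {cluster: traffic}. Each keyword is assigned to
    its highest-traffic cluster, then ranked by HHI (ties broken by traffic).
    """
    candidates = defaultdict(list)
    for keyword, dist in matrix.items():
        total = sum(dist.values())
        if total <= 0:
            continue
        dominant = max(dist, key=dist.get)
        candidates[dominant].append((hhi(dist), total, keyword))
    return {
        cluster: [kw for _, _, kw in sorted(rows, reverse=True)[:top_n]]
        for cluster, rows in candidates.items()
    }
```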
Grouping individual search queries into demand pockets that share intent. 'Best mirrorless camera 2025', 'top mirrorless for beginners', and 'spiegellose kamera test' are different keywords but the same demand pocket.
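One common way to group keywords by shared intent is SERP overlap: two queries belong together if search engines answer them with many of the same results. This fits the search-led segmentation described elsewhere in this glossary. The sketch below assumes that technique, a Jaccard threshold of 0.3, and single-linkage grouping. None of these choices is confirmed as Theia's method, and the URLs in any input are placeholders.

```python
from collections import defaultdict
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap of two result sets: |A ∩ B| / |A ∪ B|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def demand_pockets(serp_urls: Dict[str, Set[str]], threshold: float = 0.3) -> List[Set[str]]:
    """Group keywords whose top results overlap by at least `threshold`.

    Single-linkage grouping via union-find: if A~B and B~C, then A, B and C
    share a pocket even if A and C fall below the threshold themselves.
    """
    parent = {kw: kw for kw in serp_urls}

    def find(kw: str) -> str:
        while parent[kw] != kw:
            parent[kw] = parent[parent[kw]]  # path halving keeps trees shallow
            kw = parent[kw]
        return kw

    keywords = list(serp_urls)
    for i, a in enumerate(keywords):
        for b in keywords[i + 1:]:
            if jaccard(serp_urls[a], serp_urls[b]) >= threshold:
                parent[find(a)] = find(b)

    pockets = defaultdict(set)
    for kw in keywords:
        pockets[find(kw)].add(kw)
    return list(pockets.values())
```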
The clustering algorithm Theia uses to find market segments in keyword × product graphs. Leiden is the successor to Louvain and, unlike Louvain, guarantees that the communities it returns are internally connected. Theia runs it with the Surprise quality function to obtain well-separated communities.
Theia's core engineering doctrine. Use LLMs where language understanding is genuinely required — extraction, sentiment, multilingual harmonisation. Use deterministic math — cosine, Leiden, HHI, TF-IDF — for every graph connection. This is what makes the intelligence layer reproducible at scale.
Five different ways Theia connects products into a competitive graph: use-case similarity, feature similarity, benefit similarity, keyword similarity, and co-occurrence. Each captures a different competitive signal.
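A minimal sketch of typed edge construction that follows the deterministic-math doctrine: every connection is a plain cosine similarity or a normalised count, with no LLM in the loop. It assumes each product already has sparse term-weight vectors for use cases, features, benefits, and keywords (for example, produced upstream by extraction), plus a count of sources that mention both products. Normalising co-occurrence by the largest pair count, the shared 0.5 threshold, and all names are assumptions.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple

SIGNALS = ("use_case", "feature", "benefit", "keyword")  # vector-based edge types


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine similarity of two sparse term-weight vectors."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def competitive_edges(
    products: Dict[str, Dict[str, Dict[str, float]]],
    cooccurrence: Dict[Tuple[str, str], int],
    min_weight: float = 0.5,
) -> List[Tuple[str, str, str, float]]:
    """Return (product_a, product_b, edge_type, weight) for every signal
    whose similarity reaches `min_weight`. One pair can have several edges."""
    edges = []
    max_co = max(cooccurrence.values(), default=0)
    for a, b in combinations(sorted(products), 2):
        for signal in SIGNALS:
            w = cosine(products[a].get(signal, {}), products[b].get(signal, {}))
            if w >= min_weight:
                edges.append((a, b, signal, round(w, 4)))
        co = cooccurrence.get((a, b), cooccurrence.get((b, a), 0))
        if max_co and co / max_co >= min_weight:
            edges.append((a, b, "co_occurrence", round(co / max_co, 4)))
    return edges
```

Keeping the edge types separate, rather than summing them into one weight, preserves the distinct competitive signal each type carries.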
The property that the same inputs always produce the same outputs, that every output is source-traceable, and that every run is replayable. Reproducibility is the precondition for AI research that a board, a regulator, or a sceptical analyst can sign off on.
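A minimal sketch of the replay half of this property, assuming every step's inputs and configuration are JSON-serialisable. Each run is keyed by a hash of its step name, inputs, and config, so an identical request maps to the same key and returns the recorded output. `run_key` and `RunLog` are hypothetical. Source traceability would additionally require each output to carry its source references.

```python
import hashlib
import json
from typing import Any, Callable, Dict, Tuple


def run_key(step: str, inputs: Dict[str, Any], config: Dict[str, Any]) -> str:
    """Deterministic run identifier. Canonical JSON (sorted keys, fixed
    separators) ensures dict ordering never changes the hash."""
    payload = json.dumps(
        {"step": step, "inputs": inputs, "config": config},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class RunLog:
    """Hypothetical in-memory replay store. The first execution of a key is
    recorded, and any later request with the same key replays that output."""

    def __init__(self) -> None:
        self._outputs: Dict[str, Any] = {}

    def run(
        self,
        step: str,
        inputs: Dict[str, Any],
        config: Dict[str, Any],
        fn: Callable[[Dict[str, Any], Dict[str, Any]], Any],
    ) -> Tuple[str, Any]:
        key = run_key(step, inputs, config)
        if key not in self._outputs:
            self._outputs[key] = fn(inputs, config)
        return key, self._outputs[key]
```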
How feature-level sentiment changes over time, computed per product × property × period. The single most actionable perception metric — because trajectory matters more than level.
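A minimal sketch, assuming each snippet carries a product, a canonical property, a period label that sorts chronologically (such as '2025-Q1'), and a sentiment score in [-1, 1]. Here, trajectory is taken as the least-squares slope of mean sentiment across evenly spaced periods with no gaps. The slope definition and the names are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def slope(values: List[float]) -> float:
    """Ordinary least-squares slope of values against x = 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        return 0.0
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    den = sum((x - x_mean) ** 2 for x in range(n))
    return num / den


def sentiment_trajectories(
    snippets: Iterable[Tuple[str, str, str, float]]
) -> Dict[Tuple[str, str], float]:
    """Map (product, property) to the slope of its mean sentiment per period.

    Each snippet is (product, property, period, score).
    """
    cells: Dict[Tuple[str, str], Dict[str, List[float]]] = defaultdict(lambda: defaultdict(list))
    for product, prop, period, score in snippets:
        cells[(product, prop)][period].append(score)
    trajectories = {}
    for key, by_period in cells.items():
        means = [sum(s) / len(s) for _, s in sorted(by_period.items())]
        trajectories[key] = slope(means)
    return trajectories
```

With this definition, a product with a high but falling score gets a negative trajectory, which a level-only view would hide.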
concept
The proportion of LLM-synthesised answers (Google AI Overviews, ChatGPT, Perplexity, Claude) that cite your brand as a source. The newest competitive surface — and the one most brands have never measured.
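A minimal sketch, assuming answers have already been sampled from each engine for a fixed prompt set and that their cited domains have been extracted. The answer format, exact-domain matching (no subdomain handling), and the function name are assumptions.

```python
from collections import Counter
from typing import Dict, Iterable, Set


def ai_citation_share(answers: Iterable[Dict], brand_domains: Set[str]) -> Dict[str, float]:
    """Fraction of each engine's sampled answers that cite at least one brand domain.

    Each answer is assumed to look like
    {"engine": "perplexity", "cited_domains": ["example.com", ...]}.
    """
    totals: Counter = Counter()
    citing: Counter = Counter()
    for answer in answers:
        engine = answer["engine"]
        totals[engine] += 1
        if brand_domains & set(answer["cited_domains"]):
            citing[engine] += 1
    return {engine: citing[engine] / totals[engine] for engine in totals}
```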
The percentage of buyers in one category who also buy in a complementary category. A single number that reveals whether a market is structurally underdeveloped or saturated.
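Assuming buyer-level purchase data with a shared customer identifier, the metric reduces to a set intersection. The function name is hypothetical, and the thresholds that separate "underdeveloped" from "saturated" are not specified here.

```python
from typing import Hashable, Set


def cross_category_penetration(
    primary_buyers: Set[Hashable], complement_buyers: Set[Hashable]
) -> float:
    """Percentage of buyers in the primary category who also buy in the complement."""
    if not primary_buyers:
        return 0.0
    return 100.0 * len(primary_buyers & complement_buyers) / len(primary_buyers)
```

The metric is directional: penetration from category A into category B generally differs from penetration from B into A, because the denominators differ.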
The shift from quarterly research waves to weekly refresh of the same intelligence layer. The single biggest change in how brands measure markets in the 2020s — and the one most research firms haven't adapted to yet.
The 8,000+ specialist sources — engineer forums, standards bodies, niche subreddits, trade press, OSS repositories — that drive specification decisions in B2B and industrial markets. Where standard social listening doesn't go.
The observation that consumer markets, when read from search data, consistently emerge as 8–15 segments, not the 3–5 that legacy research methodology produces. The Bose Germany headphone market resolved into exactly 11.
The principle that the market ontology should be defined by domain experts, not discovered by an LLM. Canonical properties, segments, and product taxonomies are curated. LLMs handle extraction, math handles connections.
The four measurement layers every consumer brand needs to track continuously: Demand (what the market wants), Visibility (where you show up), Sales (how it converts), and Perception (what the market thinks).
The split between search traffic from category keywords ('noise cancelling headphones') and brand keywords ('bose qc45'). Premium brands routinely convert generic traffic 10× better than mid-market brands, but capture 13× less of it.
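A minimal sketch of the split, assuming keyword-level traffic estimates and a hand-maintained set of lower-case brand and product-line tokens. Single-token matching is a simplification, since multi-word brand names would need phrase matching. The names are hypothetical.

```python
from typing import Dict, Set, Tuple


def traffic_split(keyword_traffic: Dict[str, float], brand_tokens: Set[str]) -> Tuple[float, float]:
    """Return (generic_share, brand_share) of total keyword traffic.

    A keyword counts as a brand keyword if any of its whitespace-separated,
    lower-cased tokens is in `brand_tokens`; everything else is generic.
    """
    brand = generic = 0.0
    for keyword, visits in keyword_traffic.items():
        if set(keyword.lower().split()) & brand_tokens:
            brand += visits
        else:
            generic += visits
    total = brand + generic
    if total == 0:
        return 0.0, 0.0
    return generic / total, brand / total
```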
The foundational observation behind Theia: search engines have already segmented every market via the continuous, planetary-scale matching of queries to results. Reading that segmentation beats inventing your own.
The proportion of organic search clicks a brand captures in a category. Computed properly via CTR-weighted ranking position — not mention counts, not raw impressions, not unweighted SERP appearances.
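A minimal sketch of CTR weighting, in which a #1 ranking counts for far more than a #10 ranking instead of each appearance counting once. It assumes a power-law CTR curve through the Google figures quoted in this glossary (position 1 ≈ 28%, position 10 ≈ 2%) and takes the denominator as the total estimated clicks across the brands tracked. The curve, the denominator choice, and the names are all assumptions.

```python
from collections import defaultdict
from typing import Dict


def ctr(position: int) -> float:
    """Assumed click-through curve: a power law through position 1 ≈ 28%
    and position 10 ≈ 2%. A stand-in for a measured curve."""
    return 0.28 * position ** -1.146


def share_of_search(
    serps: Dict[str, Dict[str, int]], volumes: Dict[str, float]
) -> Dict[str, float]:
    """CTR-weighted share of estimated organic clicks per brand.

    `serps` maps keyword -> {brand: best organic position for that keyword};
    `volumes` maps keyword -> monthly search volume.
    """
    clicks: Dict[str, float] = defaultdict(float)
    for keyword, ranking in serps.items():
        for brand, position in ranking.items():
            clicks[brand] += volumes[keyword] * ctr(position)
    total = sum(clicks.values())
    return {brand: c / total for brand, c in clicks.items()} if total else {}
```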
An always-on, queryable repository that maps every product, every customer voice, every sales number, and every search behaviour in a market — as one connected graph, refreshed continuously.
tool
Theia's Model Context Protocol server exposes the intelligence repository to any LLM client (Claude Desktop, Cursor, custom agents). Ask 'which 5 vendors lead GenICam compatibility in EU machine-vision forums' — get a sourced answer in seconds.
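MCP clients invoke server tools with a JSON-RPC 2.0 `tools/call` request whose `params` carry the tool `name` and its `arguments`. The sketch below builds such a request. The tool name `query_repository` and its `question` argument are hypothetical, since the tools Theia's server actually exposes are not listed here.

```python
import json
from typing import Any, Dict


def tools_call_request(request_id: int, tool: str, arguments: Dict[str, Any]) -> str:
    """Serialise an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })


# Hypothetical tool name and argument shape.
request = tools_call_request(
    1,
    "query_repository",
    {"question": "Which 5 vendors lead GenICam compatibility in EU machine-vision forums?"},
)
```

In practice, an MCP client such as Claude Desktop handles the transport and the `initialize` handshake that precedes tool calls. The raw message is shown only to make the protocol shape concrete.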
The primary storage unit of Theia's intelligence layer. Every extracted feature, benefit, use-case, comparison, and sentiment from every source ends up as a snippet — atomically queryable, source-traceable, harmonised.
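A minimal sketch of what an atomic, source-traceable snippet could look like. The field names, the `SNIPPET_KINDS` set, and the `query` helper are hypothetical. The kinds mirror the categories listed in the entry.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional

SNIPPET_KINDS = {"feature", "benefit", "use_case", "comparison", "sentiment"}


@dataclass(frozen=True)
class Snippet:
    """Hypothetical shape of one atomic, source-traceable snippet."""
    kind: str                 # one of SNIPPET_KINDS
    product: str
    canonical_property: str   # harmonised label, e.g. "BATTERY_LIFE"
    source_language: str      # language the text was extracted in, e.g. "de"
    source_text: str          # verbatim extract, kept for traceability
    source_url: str           # where the extract came from
    sentiment: Optional[float] = None

    def __post_init__(self) -> None:
        if self.kind not in SNIPPET_KINDS:
            raise ValueError(f"unknown snippet kind: {self.kind!r}")


def query(snippets: Iterable[Snippet], **criteria: object) -> List[Snippet]:
    """Return snippets whose fields equal every given criterion,
    e.g. query(snippets, product="QC45", canonical_property="BATTERY_LIFE")."""
    return [s for s in snippets if all(getattr(s, k) == v for k, v in criteria.items())]
```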
strategy
The four-quadrant comparison of what a brand says about its product versus what the market says. Each comparison falls into one of four quadrants (validates, oversells, ignores, or untapped), and each quadrant calls for a different action.
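A minimal sketch of the classification, assuming each property has a yes/no flag for whether the brand claims it and a net market sentiment score. The mapping of the four labels onto the two axes (brand claims it vs. market endorses it) and the zero threshold are assumptions made for illustration.

```python
def quadrant(brand_claims: bool, market_sentiment: float, threshold: float = 0.0) -> str:
    """Place one property in the say/hear grid.

    Assumed mapping:
      brand claims it, market endorses it          -> "validates"
      brand claims it, market does not endorse it  -> "oversells"
      brand silent,    market endorses it          -> "untapped"
      brand silent,    market does not endorse it  -> "ignores"
    """
    market_endorses = market_sentiment > threshold
    if brand_claims:
        return "validates" if market_endorses else "oversells"
    return "untapped" if market_endorses else "ignores"
```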
The first agent in the L1→L4 strategy chain. Reads the full intelligence repository for a category and produces a stable, structured brief: pain points, growth levers, audience segments, defining properties, top use cases.
Theia's four-stage strategy generation pipeline. L1 frames the category, L2 measures perception, L3 turns measurement into priorities, L4 ships content. Every agent reads from pre-computed tables — never recomputes from raw snippets.
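A minimal sketch of the constraint that agents read pre-computed tables rather than raw snippets. `run_strategy_chain` and the `raw_snippets` table name are hypothetical. Real stages would be LLM agents rather than the small functions a caller passes in, so the sketch shows only ordering and (shallow) read-only access.

```python
from types import MappingProxyType
from typing import Any, Callable, Dict, List, Mapping, Tuple

Agent = Callable[[Mapping[str, Any], Mapping[str, Any]], Any]

# Hypothetical name of the raw-snippet store that strategy agents must not read.
RAW_TABLES = {"raw_snippets"}


def run_strategy_chain(tables: Dict[str, Any], stages: List[Tuple[str, Agent]]) -> Dict[str, Any]:
    """Run agents in order (e.g. L1 -> L4). Each agent receives a read-only view
    of the pre-computed tables and a read-only view of earlier stages' outputs."""
    if RAW_TABLES.intersection(tables):
        raise ValueError("strategy agents read pre-computed tables, not raw snippets")
    frozen_tables = MappingProxyType(dict(tables))
    outputs: Dict[str, Any] = {}
    for name, agent in stages:
        outputs[name] = agent(frozen_tables, MappingProxyType(dict(outputs)))
    return outputs
```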
The three-role architecture Theia uses to ship AI-generated strategy that a board will sign off on. Writer agents draft, Reviewer agents check evidence and consistency, the Senior Analyst (human) sets scope and signs the deck. Most AI research tools ship the Writer alone — and look like demos as a result.