What are product similarity edges?

Five different ways Theia connects products into a competitive graph: use-case similarity, feature similarity, benefit similarity, keyword similarity, and co-occurrence. Each captures a different competitive signal.

Product similarity edges

What it is

For every pair of products in a category, Theia computes up to five competitive proximity scores:

Edge type	What it captures	Best for
Use-case similarity	Cosine on use-case mention vectors	Broad categories where use cases differentiate (vlogging vs cinema)
Feature similarity	Cosine on feature mention vectors	Technical capability overlap
Benefit similarity	Cosine on benefit mention vectors	Perceived advantage overlap
Keyword similarity	Cosine on keyword traffic vectors	Search-intent overlap
Co-occurrence	Joint mention in same document	Narrow categories where consumers explicitly compare

All five are stored in a unified product_edges table. None is the "right" answer alone — each captures a different competitive signal.

Why five edges, not one

A single "competitive proximity" number hides the texture of competition.

Use-case similarity is the strongest differentiator in broad categories. For mid-range mirrorless, vlogging-oriented cameras (Sony ZV-E1) separate cleanly from cinema-oriented bodies (Canon C50, Sony FX3), with pairwise cosines spread across 0.2–0.8.

But use-case similarity saturates in narrow categories. For pro photo printers, all pairs score > 0.90 because they all "print photos at quality". Use case isn't the differentiator there.

Co-occurrence is most informative in narrow categories where consumers and reviewers explicitly compare specific pairs. Canon PRO-200 ↔ Canon PRO-310 are mentioned together in 10 documents; Epson P700 ↔ P900 in 9. These are the actual decision pairs.
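As a rough illustration of the co-occurrence edge, the sketch below counts, for each unordered product pair, how many documents mention both. The function name `cooccurrence_counts`, the input shape (one list of product names per document), and the sample documents are assumptions made for this example; the source does not say how mentions are detected or whether the count is scaled afterwards.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(doc_mentions):
    """Count, for every unordered product pair, the documents mentioning both.

    doc_mentions: iterable of lists, one list of product names per document
    (hypothetical input format).
    """
    pair_counts = Counter()
    for products in doc_mentions:
        # Deduplicate within a document and sort so (A, B) and (B, A) collapse
        # into a single key.
        for pair in combinations(sorted(set(products)), 2):
            pair_counts[pair] += 1
    return pair_counts


# Made-up documents, for illustration only.
docs = [
    ["Canon PRO-200", "Canon PRO-310"],
    ["Canon PRO-310", "Canon PRO-200", "Epson P700"],
    ["Epson P700", "Epson P900"],
    ["Epson P900"],  # a single mention creates no pair
]
edges = cooccurrence_counts(docs)
```

Counting documents rather than individual mentions keeps one long comparison review from dominating the edge.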

Keyword similarity captures search intent overlap — useful for identifying products competing for the same traffic, complementary to perception-based edges.

How it's calculated

All edges use cosine similarity on raw counts, not normalised shares. Each product's vector is built only from that product's own counts, which makes the edges set-independent: the edge between products A and B does not change when product C is added or removed.

Validated empirically: A-B cosine is identical with {A, B} and {A, B, C} product sets.

For each product, Theia builds a sparse vector indexed by canonical property (or keyword). The cosine between two products' vectors becomes the edge score.
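A minimal sketch of this calculation, assuming each sparse vector is a plain dict from canonical property to raw mention count. The counts are invented, and `column_shares` is only one possible reading of "normalised shares" (each property's count divided by its total across the product set), included to show why that kind of normalisation would make the A-B edge depend on which other products are present.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[k] for k, w in u.items() if k in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty vector has no defined direction
    return dot / (norm_u * norm_v)


def column_shares(counts, products):
    """Divide each property's count by its total across the given product set
    (an assumed form of set-dependent normalisation, for contrast only)."""
    totals = {}
    for p in products:
        for k, c in counts[p].items():
            totals[k] = totals.get(k, 0) + c
    return {p: {k: c / totals[k] for k, c in counts[p].items()} for p in products}


# Invented raw mention counts for three products.
counts = {
    "A": {"vlogging": 8, "travel": 2},
    "B": {"vlogging": 3, "travel": 6},
    "C": {"vlogging": 9, "cinema": 12},
}

# Raw counts: only A's and B's own vectors are used, so C cannot affect this.
raw_ab = cosine(counts["A"], counts["B"])

# Shares: the A-B cosine shifts once C joins the set and changes the totals.
shares_2 = column_shares(counts, ["A", "B"])
shares_3 = column_shares(counts, ["A", "B", "C"])
share_ab_2 = cosine(shares_2["A"], shares_2["B"])
share_ab_3 = cosine(shares_3["A"], shares_3["B"])
```

With raw counts the A-B score is a function of two vectors only, so edges can be computed pair by pair and never need recomputing when the category's product list changes.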

How strategy uses it

L3 Situation Analysis uses the edges to identify the competitive set for each product:

1. Pull all edges where the focal product appears.
2. Rank by blended score (weights vary by category breadth).
3. Take the top 3–5 as the head-to-head competitors for the deck.
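The steps above can be sketched as follows. Everything specific here is an assumption for illustration: the `Edge` fields, the blend being a weighted sum, the weight values, the co-occurrence score being scaled to the same range as the cosines, and the placeholder products P, Q, R and S. The source says only that weights vary by category breadth.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    """One row of a hypothetical product_edges table."""
    product_a: str
    product_b: str
    edge_type: str
    score: float


def top_competitors(edges, focal, weights, k=5):
    """Blend edge scores per competitor and return the k highest."""
    totals = {}
    for e in edges:
        # Step 1: keep only edges that touch the focal product.
        if focal not in (e.product_a, e.product_b):
            continue
        other = e.product_b if e.product_a == focal else e.product_a
        # Step 2: weighted sum across edge types (assumed blending rule).
        totals[other] = totals.get(other, 0.0) + weights.get(e.edge_type, 0.0) * e.score
    # Step 3: rank and truncate.
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:k]


# Placeholder edges with invented scores.
edges = [
    Edge("P", "Q", "use_case", 0.9),
    Edge("P", "Q", "co_occurrence", 0.2),
    Edge("R", "P", "use_case", 0.4),
    Edge("R", "P", "co_occurrence", 0.8),
    Edge("Q", "R", "use_case", 0.7),  # does not involve P, so it is ignored
    Edge("P", "S", "use_case", 0.1),
]

# Hypothetical weightings for a broad and a narrow category.
broad = {"use_case": 0.7, "co_occurrence": 0.3}
narrow = {"use_case": 0.2, "co_occurrence": 0.8}

broad_ranking = top_competitors(edges, "P", broad, k=2)
narrow_ranking = top_competitors(edges, "P", narrow, k=2)
```

In this toy data the broad weighting ranks Q ahead of R and the narrow weighting reverses the order, which is the kind of shift the category-dependent weights are meant to produce.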

For Canon EOS R8, the top competitors by blended score are: Nikon Z5 II, Sony A7C II, Sony A7 III, Fujifilm X-T5, Canon EOS R6 II. This list can differ from "what the brand thinks the competition is" because it is driven by what consumers actually compare.

Why this matters

Brand teams routinely identify the wrong competitors. They benchmark against the brand they used to compete with, or the brand their sales force complains about — neither of which is what consumers compare them against.

Product edges put the actual decision pair on the table, with evidence. That's a different conversation.
