The finding
In the early Canon B2B deployment, German source extraction was returning 0.69 sources per item vs ~2.4 in English. Same pipeline, same models, same source registry. The German data was structurally weaker.
Initial hypothesis: German engineering forums are smaller and harder to crawl. Plausible but wrong.
Real cause: the LLM was producing answers in German but failing to cite the sources it had used. The English prompt asked "what do experts say about X, with citations" and got citations. The naïve German translation of that same prompt got synthesised summaries without source URLs.
The fix
Adding an explicit, separate sentence to the German prompt:
"Liste am Ende ALLE URLs auf, die du verwendet hast. Eine URL pro Zeile. Keine Zusammenfassungen, nur die URLs."
("At the end, list ALL URLs you used. One URL per line. No summaries, just the URLs.")
This lifted the German citation rate from 0.69 → 4.06 sources per item. That is a roughly 6× lift (4.06 / 0.69 ≈ 5.9), with no other change.
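A minimal sketch of how the fix and the measurement might look in code, assuming answers arrive as plain text and that a source is counted as one distinct URL per answer. The function names and the URL pattern are illustrative assumptions, not the pipeline's actual implementation; only the German sentence is taken from the text above.

```python
import re

# The forcing sentence quoted above, verbatim.
GERMAN_URL_FORCING = (
    "Liste am Ende ALLE URLs auf, die du verwendet hast. "
    "Eine URL pro Zeile. Keine Zusammenfassungen, nur die URLs."
)

# Deliberately simple URL matcher (an assumption, not the real parser).
URL_PATTERN = re.compile(r"https?://\S+")


def add_forcing_sentence(base_prompt: str, forcing: str = GERMAN_URL_FORCING) -> str:
    """Append the forcing instruction as its own separate paragraph."""
    return f"{base_prompt.rstrip()}\n\n{forcing}"


def extract_sources(answer: str) -> list[str]:
    """Return the distinct URLs found in an answer, in first-seen order."""
    urls: list[str] = []
    for match in URL_PATTERN.findall(answer):
        url = match.rstrip(".,;:)]\"'")  # drop trailing punctuation
        if url not in urls:
            urls.append(url)
    return urls


def sources_per_item(answers: list[str]) -> float:
    """Average number of distinct sources cited per answer."""
    if not answers:
        return 0.0
    return sum(len(extract_sources(a)) for a in answers) / len(answers)
```

Running sources_per_item over the same items before and after adding the sentence gives the two numbers being compared. Counting distinct URLs rather than raw matches avoids rewarding an answer that repeats one source several times.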
Why this happens
A few hypotheses, none individually proven but all plausible:
01 — Training data imbalance. LLM training data is English-heavy. The instruction-following pattern "include citations" is strongly trained in English contexts. The same instruction in German is softer, more advisory, and the model treats it as optional.
02 — Citation cultural norms. English-language technical writing has stronger inline-citation conventions than German technical writing, which often defers citations to a footnote or bibliography. The LLM may be reflecting this cultural pattern.
03 — Translation ambiguity. The English phrase "with citations" is unambiguous. The German equivalent has multiple translations ("mit Quellen", "mit Belegen", "mit Zitaten") with different connotations — some imply academic citation, others imply informal source mention. The LLM may have collapsed this ambiguity in unpredictable ways.
What this generalised to
After the German fix, we tested Japanese, Korean, French, and Italian with the same explicit citation-forcing addition. All four showed lifts: Japanese 4×, Korean 3×, and French and Italian a smaller 1.5-2×, since their baselines were already higher.
The lesson is general: when running LLM extraction in non-English markets, never assume the English prompt's instruction-following will translate. Test citation rate explicitly, and add language-specific forcing instructions where needed.
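Testing citation rate explicitly comes down to comparing two averages per language: sources per item with the translated prompt alone, and with the forcing instruction added. The sketch below shows that comparison. The class name and fields are hypothetical; the only figures used are the German ones reported above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CitationComparison:
    """Before/after citation rates for one language (hypothetical structure)."""

    language: str
    baseline: float  # sources per item, translated prompt only
    forced: float    # sources per item, with the forcing instruction added

    @property
    def lift(self) -> float:
        """Multiplicative lift from adding the forcing instruction."""
        if self.baseline <= 0:
            raise ValueError("baseline must be positive to compute a lift")
        return self.forced / self.baseline

    @property
    def share_missing_at_baseline(self) -> float:
        """Fraction of recoverable sources the baseline prompt failed to return."""
        if self.forced <= 0:
            raise ValueError("forced rate must be positive")
        return max(0.0, 1.0 - self.baseline / self.forced)


# German figures as reported in the text.
german = CitationComparison(language="de", baseline=0.69, forced=4.06)
print(f"{german.language}: lift {german.lift:.2f}x, "
      f"{german.share_missing_at_baseline:.0%} of sources missing at baseline")
```

For the German figures this gives a lift of about 5.9× and roughly 83% of sources missing at baseline. The same comparison can be run for each new language before launch.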
Why this matters strategically
For a research firm pitching multilingual coverage, this is the kind of tradecraft that determines whether the deliverable is real or theatre.
A firm running "multi-market intelligence" with a translated English prompt and no citation-rate validation is shipping German output unaware that roughly 80% of the signal is missing (0.69 sources per item against the 4.06 that an explicit instruction recovers). Their German answers will sound fluent and be wrong.
Theia ran the validation, found the gap, and fixed it. The 6× lift is documented and reproducible.
Operational implication
The Canon B2B pipeline now includes a citation-rate monitor per language. If German drops below 3.0 sources/item, the pipeline alerts and refuses to publish until investigated. This is the kind of post-deployment quality control that production market intelligence demands.
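A monitor of this kind could be sketched as a gate in front of the publish step. The 3.0 threshold for German comes from the text; everything else here (the names, the exception type, using the logging module as the alert channel) is an assumption for illustration, not a description of the production system.

```python
import logging
from typing import Callable

logger = logging.getLogger("citation_monitor")

# German threshold from the text; other languages would need their own values.
MIN_SOURCES_PER_ITEM: dict[str, float] = {"de": 3.0}


class CitationRateTooLow(RuntimeError):
    """Raised when a language's citation rate falls below its threshold."""


def check_citation_rate(language: str, source_counts: list[int]) -> float:
    """Return the average sources per item, or raise if it is below threshold."""
    if not source_counts:
        raise ValueError("no items to evaluate")
    rate = sum(source_counts) / len(source_counts)
    minimum = MIN_SOURCES_PER_ITEM.get(language)
    if minimum is not None and rate < minimum:
        raise CitationRateTooLow(
            f"{language}: {rate:.2f} sources/item is below the minimum {minimum:.2f}"
        )
    return rate


def publish_if_healthy(
    language: str, source_counts: list[int], publish: Callable[[], None]
) -> bool:
    """Run publish() only when the citation rate passes; otherwise alert."""
    try:
        check_citation_rate(language, source_counts)
    except CitationRateTooLow as exc:
        logger.error("Publishing blocked: %s", exc)  # stand-in for a real alert
        return False
    publish()
    return True
```

Raising an exception rather than returning a flag makes the refusal to publish the default: a caller has to handle the failure deliberately to get past it.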