Methodology

Last updated: 19 May 2026

This page describes how Cited Brands measures AI search citations for the brands featured across the site. We publish the methodology so any reader can replicate the work, dispute a number, or evaluate the rigor before relying on a finding.

SA-AEO-Bench v1 was pre-registered on the Open Science Framework at osf.io/w4az2 before any LLM was queried. Hypotheses H1–H7, the prompt set, the analysis code, the scoring rubric, and the budget ceiling were all locked at submission time, which removes the researcher degrees of freedom that make p-hacking possible and that undermine most marketing-vendor “research”. Five public documents (form fields · protocol · interim brief · pre-completion report · insights deliverable) ship under /research.

Question set

Each brand is queried with a structured set of natural-language questions across five categories: organic awareness ("what is {brand}"), brand authority ("is {brand} legitimate / safe / regulated"), competitive comparison ("{brand} vs {competitor}"), complaint discovery ("why are people unhappy with {brand}"), and editorial framing ("recent news about {brand}"). The May 2026 pilot used 115 questions across 20 brands; the production Index Report uses 100 questions per brand × 100 brands × 5 engines = 50,000 query-engine pairs per quarter.
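As a rough illustration of how the question templates expand into concrete prompts, here is a minimal Python sketch. The template wording echoes the examples in this section, but the dictionary keys, the function names, and the single template per category are assumptions made for illustration, not the production prompt set.

```python
# Hypothetical templates, one per category. The real prompt set is larger.
TEMPLATES = {
    "organic_awareness": "what is {brand}",
    "brand_authority": "is {brand} legitimate",
    "competitive_comparison": "{brand} vs {competitor}",
    "complaint_discovery": "why are people unhappy with {brand}",
    "editorial_framing": "recent news about {brand}",
}


def build_questions(brand, competitors):
    """Return (category, question) pairs for one brand."""
    questions = []
    for category, template in TEMPLATES.items():
        if "{competitor}" in template:
            # Comparison templates expand once per named competitor.
            for competitor in competitors:
                text = template.format(brand=brand, competitor=competitor)
                questions.append((category, text))
        else:
            questions.append((category, template.format(brand=brand)))
    return questions


def planned_pairs(questions_per_brand, brands, engines):
    """Number of query-engine pairs in one collection cycle."""
    return questions_per_brand * brands * engines


if __name__ == "__main__":
    for category, text in build_questions("Acme", ["Bolt", "Cargo"]):
        print(category, "|", text)
    print(planned_pairs(100, 100, 5))
```

The last line reproduces the production volume arithmetic: 100 × 100 × 5 query-engine pairs per quarter.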

LLM coverage

The full Index covers five engines:

  • OpenAI GPT-5 (chat + web-search tool)
  • Anthropic Claude Sonnet 4.5 (with web tool)
  • Google Gemini 2.5 Pro (with grounding)
  • Perplexity Sonar Large
  • Google AI Overviews (SERP capture via headless browser)

Each engine is queried independently. Citations are extracted from the model's response, deduplicated by canonical URL, and attributed to a source domain.
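The deduplication and attribution step could look something like the sketch below. The canonicalisation rules used here (forcing https, dropping a leading "www.", trailing slashes, fragments, and common tracking parameters) are assumptions chosen for illustration; the rules actually used in production are not specified on this page.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed tracking parameters to ignore when comparing URLs.
TRACKING_PREFIXES = ("utm_",)
TRACKING_KEYS = {"gclid", "fbclid"}


def canonical_url(url):
    """Normalise a URL so trivially different spellings compare equal."""
    parts = urlsplit(url.strip())
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path.rstrip("/") or "/"
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith(TRACKING_PREFIXES)
        and key.lower() not in TRACKING_KEYS
    ]
    # Assumption: http and https variants count as the same page.
    return urlunsplit(("https", host, path, urlencode(sorted(query)), ""))


def dedupe_citations(urls):
    """Map each canonical URL to its source domain, in first-seen order."""
    seen = {}
    for url in urls:
        canon = canonical_url(url)
        seen.setdefault(canon, urlsplit(canon).hostname)
    return seen


if __name__ == "__main__":
    cited = [
        "https://www.mybroadband.co.za/news/?utm_source=x",
        "http://mybroadband.co.za/news",
        "https://businesstech.co.za/a#top",
    ]
    for canon, domain in dedupe_citations(cited).items():
        print(domain, canon)
```

In this sketch the first two URLs collapse into one citation, so the example yields two entries rather than three.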

Latin Square debiasing on comparison queries

Comparison queries ("{A} vs {B}") show measurable position bias: the order of the brand names in the prompt changes which sources the model retrieves. We mitigate this by running every comparison in both orders ("A vs B" and "B vs A"), averaging the citation results across the two runs, and reporting the Jaccard overlap between the two citation sets (shared citations divided by all distinct citations, on a scale from 0 to 1) as a methodology check. May 2026 pilot Jaccard values: GPT-5 0.33, Claude 0.54, Gemini 0.23. We do not publish single-direction comparison data.
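The two-order check can be sketched in a few lines of Python. The Jaccard function follows the standard definition; the averaging function shows only one plausible reading of "averaging" (the mean of each domain's citation share across the two runs) and is an assumption, as are all function names and the example domains.

```python
from collections import Counter


def jaccard(run_ab, run_ba):
    """Jaccard overlap of two citation sets: |A ∩ B| / |A ∪ B|."""
    a, b = set(run_ab), set(run_ba)
    union = a | b
    if not union:
        return 1.0  # convention: two empty runs count as identical
    return len(a & b) / len(union)


def domain_shares(domains):
    """Share of citations per domain within one run."""
    counts = Counter(domains)
    total = sum(counts.values())
    return {d: c / total for d, c in counts.items()} if total else {}


def averaged_shares(run_ab, run_ba):
    """Assumed averaging: mean per-domain share across both prompt orders."""
    s1, s2 = domain_shares(run_ab), domain_shares(run_ba)
    return {d: (s1.get(d, 0.0) + s2.get(d, 0.0)) / 2 for d in set(s1) | set(s2)}


if __name__ == "__main__":
    ab = ["a.example", "b.example"]  # domains cited for "A vs B"
    ba = ["b.example", "c.example"]  # domains cited for "B vs A"
    print(round(jaccard(ab, ba), 2))
    print(averaged_shares(ab, ba))
```

A low Jaccard value means the two prompt orders retrieved largely different sources, which is why a single-direction result on its own would be misleading.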

Citation parsing accuracy

URLs in model responses are extracted by structured-output parsing where available and by regex extraction with TLD-aware deduplication elsewhere. The pilot's unparseable rate was 12 of 6,204 citations (0.2%); our production target is <1%.
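For the regex fallback path, a minimal sketch might look like this. The pattern, the trailing-punctuation trimming, and the tiny hardcoded suffix set are all assumptions for illustration; a real TLD-aware implementation would consult the full Public Suffix List rather than three entries.

```python
import re
from urllib.parse import urlsplit

# Assumed pattern: stop at whitespace, quotes, angle brackets and closers.
URL_RE = re.compile(r"https?://[^\s<>\"')\]]+", re.IGNORECASE)

# Illustrative subset only; the Public Suffix List is the real source.
MULTI_LABEL_SUFFIXES = {"co.za", "org.za", "co.uk"}


def extract_urls(text):
    """Find URLs in free text and trim trailing sentence punctuation."""
    return [match.rstrip(".,;:!?") for match in URL_RE.findall(text)]


def registrable_domain(url):
    """Return e.g. 'moneyweb.co.za' for any of its subdomains, else None."""
    try:
        host = (urlsplit(url).hostname or "").lower()
    except ValueError:
        return None  # counted as unparseable
    labels = host.split(".")
    if len(labels) < 2:
        return None
    if ".".join(labels[-2:]) in MULTI_LABEL_SUFFIXES:
        return ".".join(labels[-3:]) if len(labels) >= 3 else None
    return ".".join(labels[-2:])


def unparseable_rate(unparseable, total):
    """Fraction of citations that could not be parsed."""
    return unparseable / total


if __name__ == "__main__":
    text = "See https://www.moneyweb.co.za/news/x. Also (http://g2.com/products/y)"
    for url in extract_urls(text):
        print(registrable_domain(url), url)
    print(f"{unparseable_rate(12, 6204):.1%}")
```

The final line recomputes the pilot figure from the counts reported above (12 of 6,204).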

Source classification

Each cited domain is classified into one of five categories:

  • own: the brand's own primary domain or registered subdomains
  • complaint: consumer-complaint aggregators (PissedConsumer, HelloPeter, Trustpilot, Complaintsboard) and brand-specific complaint subdomains
  • editorial: independent SA media (BusinessTech, MyBroadband, Daily Maverick, IOL, MoneyWeb), international media that cover SA business, and named industry analysts
  • review: general-purpose review aggregators (G2, Capterra) and industry-specific review surfaces
  • competitor: domains of named competitors to the focus brand

The classifier is rule-based with manual overrides for edge cases. The full classifier rules + overrides ship as JSON in the open-source packages/aeo-score repository (link forthcoming when published).
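Until the repository is published, the following Python sketch gives a sense of what a rule-based classifier with manual overrides can look like. The domain lists are short illustrative samples based on the sites named above, and the rule precedence, the function signature, and the None result for unmatched domains are assumptions rather than the shipped rules.

```python
# Illustrative samples only; the published JSON rules are the reference.
COMPLAINT_DOMAINS = {
    "pissedconsumer.com", "hellopeter.com", "trustpilot.com", "complaintsboard.com",
}
EDITORIAL_DOMAINS = {
    "businesstech.co.za", "mybroadband.co.za", "dailymaverick.co.za",
    "iol.co.za", "moneyweb.co.za",
}
REVIEW_DOMAINS = {"g2.com", "capterra.com"}


def classify(domain, brand_domains, competitor_domains, overrides=None):
    """Classify one cited domain relative to a focus brand."""
    domain = domain.lower()
    # Manual overrides win over every rule.
    if overrides and domain in overrides:
        return overrides[domain]

    def matches(candidates):
        # Exact match, or any subdomain of a listed domain.
        return any(domain == c or domain.endswith("." + c) for c in candidates)

    # Assumed precedence: own, competitor, complaint, review, editorial.
    if matches(brand_domains):
        return "own"
    if matches(competitor_domains):
        return "competitor"
    if matches(COMPLAINT_DOMAINS):
        return "complaint"
    if matches(REVIEW_DOMAINS):
        return "review"
    if matches(EDITORIAL_DOMAINS):
        return "editorial"
    return None  # no rule matched; a candidate for manual review


if __name__ == "__main__":
    brand, rivals = {"acme.example"}, {"rival.example"}
    for d in ["help.acme.example", "hellopeter.com", "iol.co.za", "rival.example"]:
        print(d, "->", classify(d, brand, rivals))
```

Keeping overrides as a plain mapping from domain to category makes each edge-case decision a single auditable entry, which suits a JSON release.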

Net narrative control

Net narrative control is our headline metric. It is defined as own_pct − complaint_pct: the share of a brand's citations that come from its own domain minus the share that come from complaint sources, both expressed as percentages. The result ranges from −100 to +100. It is a descriptive observation of the citation pattern, not a judgment of the brand itself.
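As a worked example, the metric can be computed from a list of per-citation category labels. This sketch assumes the denominator is every classified citation for the brand; the function name and the sample counts are illustrative.

```python
from collections import Counter


def net_narrative_control(categories):
    """own_pct - complaint_pct, in percentage points (-100 to +100)."""
    total = len(categories)
    if total == 0:
        raise ValueError("no citations to score")
    counts = Counter(categories)
    own_pct = 100 * counts["own"] / total
    complaint_pct = 100 * counts["complaint"] / total
    return own_pct - complaint_pct


if __name__ == "__main__":
    # 10 citations: 6 own, 3 complaint, 1 editorial.
    sample = ["own"] * 6 + ["complaint"] * 3 + ["editorial"]
    print(net_narrative_control(sample))  # 60% own - 30% complaint = 30.0
```

A brand cited only from its own domain scores +100, and one cited only from complaint sources scores −100; citations in the other three categories pull the score toward zero by enlarging the denominator.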

Reproducibility + provenance

Every dataset release is committed to Git with a tag (e.g., data: refresh Q4 2026), so any historical scorecard can be reproduced from the exact JSON that shipped. The per-query raw responses are archived in Azure Blob Storage and available to academic and regulatory parties on request (subject to LLM-provider terms of service).

What this methodology is NOT

  • Not a sentiment analysis — we measure where AI engines pull citations from, not whether the cited content is positive or negative.
  • Not a brand health score — net narrative control describes citation patterns, not commercial outcomes or consumer satisfaction.
  • Not exhaustive — 100 questions per brand cannot cover every possible query. Findings reflect a structured sample, not the full distribution of consumer queries about a brand.

Corrections + disputes

Each brand page carries a correction form that writes directly to our review queue. We respond within 5 business days. For larger disputes, contact legal@citedbrands.co.za — we'd rather correct a number than litigate one.