SA-AEO-Bench v1 · research

All documents from SA-AEO-Bench v1 — pre-registered, reproducible, audit-ready.

188,877 citations across 100 brands in 10 industries and 3 frontier LLMs. Snapshot at 99% data completion, 2026-05-19.

All five documents below were published as the study ran. The pre-registration (osf.io/w4az2) and the protocol PDF were submitted before any LLM was queried. The interim brief, the pre-completion report, and the insights deliverable followed in order as the run progressed.

Documents

Download. Read. Reproduce.

01
OSF pre-registration
SA-AEO-Bench v1 · Open Science Framework form fields
Hypotheses H1–H7, prompt set, scoring rubric, analysis plan, budget ceiling. Submitted before any LLM was queried.
docs/research/sa-aeo-bench-v1-osf-formfields.md
View on OSF →
02
Protocol
SA-AEO-Bench v1 · Pre-registration protocol
Full methodology: brand sample, query construction, Latin Square debiasing, Bradley-Terry strength estimation, sycophancy correction.
docs/research/sa-aeo-bench-v1-osf-protocol.pdf
Download PDF →
03
Interim brief
SA-AEO-Bench v1 · Interim status brief
Mid-run progress and early signals. For stakeholders following the run live.
docs/research/sa-aeo-bench-v1-interim-brief.pdf
Download PDF →
04
Pre-completion · methodology
SA-AEO-Bench v1 · Pre-completion report (formal)
Run status, cost, per-model summary, and verdicts on all seven hypotheses (H1–H7). Audit-grade.
docs/research/sa-aeo-bench-v1-precompletion-report.pdf
Download PDF →
05
Pre-completion · insights
SA-AEO-Bench v1 · The Actual Insights
Per-brand findings, industry deep-dives, leaderboards. Written for stakeholders. The headline document.
docs/research/sa-aeo-bench-v1-insights.pdf
Download PDF →

Replicate

The protocol is public. The data is reproducible.

Three things make any AI-search-citation benchmark trustworthy: pre-registration of hypotheses, public methodology, and reproducible raw data. SA-AEO-Bench v1 ships all three. To replicate:

Read the protocol PDF (link above). Reproduce the prompt set and the scoring rubric.
Run against your own API keys for GPT-5, Claude Sonnet 4.5, and Gemini 2.5 Pro. Budget ≈ R25,000 for full coverage.
Apply Latin Square debiasing on comparison queries and Bradley-Terry strength estimation on the per-brand wins. Both are in the protocol.
Compare your results to the pre-completion report. Diverging numbers are themselves a finding — they isolate methodology variance from underlying signal.
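
The debiasing and strength-estimation step can be sketched in a few lines. This is a minimal illustration, not the protocol's implementation: it assumes a cyclic Latin square for rotating brand order in comparison prompts and the standard minorization-maximization update for fitting Bradley-Terry strengths. The function names and the `wins` input layout are hypothetical; the protocol PDF is the authority on the actual design.

```python
from collections import defaultdict


def latin_square_orders(brands):
    """Cyclic Latin square: row i is the brand list rotated left by i.

    Across the n returned orderings, every brand occupies every
    position exactly once, which balances out position effects.
    """
    n = len(brands)
    return [[brands[(i + j) % n] for j in range(n)] for i in range(n)]


def bradley_terry(wins, n_iter=200, tol=1e-9):
    """Fit Bradley-Terry strengths with the MM update.

    wins[(a, b)] is the number of comparisons in which a beat b
    (assumed input layout). Returns a dict of strengths that sum to 1.
    """
    items = sorted({x for pair in wins for x in pair})
    total_wins = {i: 0.0 for i in items}
    games = defaultdict(float)  # unordered pair -> comparisons played
    for (a, b), w in wins.items():
        total_wins[a] += w
        games[frozenset((a, b))] += w

    p = {i: 1.0 / len(items) for i in items}
    for _ in range(n_iter):
        new_p = {}
        for i in items:
            denom = sum(
                games[frozenset((i, j))] / (p[i] + p[j])
                for j in items
                if j != i and frozenset((i, j)) in games
            )
            new_p[i] = total_wins[i] / denom if denom > 0 else p[i]
        scale = sum(new_p.values())
        new_p = {i: v / scale for i, v in new_p.items()}
        delta = max(abs(new_p[i] - p[i]) for i in items)
        p = new_p
        if delta < tol:
            break
    return p


if __name__ == "__main__":
    # Placeholder brand names, not brands from the study.
    for order in latin_square_orders(["BrandA", "BrandB", "BrandC"]):
        print(order)
    print(bradley_terry({("BrandA", "BrandB"): 3, ("BrandB", "BrandA"): 1}))
```

In this sketch, each comparison query would be issued once per row of the square, and the per-pair wins aggregated over all rows feed `bradley_terry`. A brand with zero wins ends up with a strength of zero under plain maximum likelihood, so a real run may need a prior or a smoothing rule; whether the protocol uses one is not stated here.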

Email research@citedbrands.co.za if you’d like the raw JSONL records (69 MB, ~190k citation rows) for academic replication. The records are free to use with attribution.
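
For a first sanity check on the raw records, a streaming pass over the JSONL file is usually enough. The sketch below is only an assumption about how one might do it: the field name `model` and the file name `citations.jsonl` are hypothetical, since the record schema is not described on this page.

```python
import json
from collections import Counter


def count_by_field(lines, field="model"):
    """Count JSONL records per value of `field` (hypothetical field name).

    `lines` is any iterable of strings, such as an open file handle,
    so the file is never loaded into memory all at once.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        counts[record.get(field, "<missing>")] += 1
    return counts


# Usage (file name is a placeholder):
# with open("citations.jsonl", encoding="utf-8") as fh:
#     print(count_by_field(fh))
```

Comparing the per-model totals against the per-model summary in the pre-completion report is a quick way to confirm the file you received is complete.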

Run the next one with us

Subscribe and get every quarterly bench drop the day it ships.

Subscribe →
