Live Engine Benchmarks
Computed from anonymized platform scan data. Each row represents aggregate performance across all completed scans on the platform.
| Engine | Avg AIS Score | Mention Rate | Avg Citations/Response | Total Results |
|---|---|---|---|---|
Data source: AISearchStackHub proprietary scan database. Methodology: AIS Index (V×0.40 + A×0.30 + S×0.20 + Ad×0.10). Mention rate = fraction of scans where engine mentioned the brand. Data refreshed every hour.
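For clarity, the mention-rate definition amounts to the following, with an assumed record shape (the actual scan schema is not published):

```ts
// Sketch of the mention-rate definition above: the fraction of completed
// scans in which a given engine mentioned the brand.
// The ScanResult shape is an assumption for illustration only.
interface ScanResult {
  engine: string;
  brandMentioned: boolean;
}

function mentionRate(scans: ScanResult[], engine: string): number {
  const forEngine = scans.filter((s) => s.engine === engine);
  if (forEngine.length === 0) return 0;
  return forEngine.filter((s) => s.brandMentioned).length / forEngine.length;
}
```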
Research Context: What Drives Citation Rate Differences?
The four major LLMs behave fundamentally differently when producing brand-adjacent responses. These differences stem from their underlying architectures, training data, and retrieval mechanisms — not from differences in brand quality or marketing investment.
Understanding which engine is most likely to cite your brand — and why — determines where to focus your AEO investment.
Perplexity executes a real-time web search before generating every response. This produces the highest citation rate of any LLM — 87% of responses include named source citations in our research baseline. Brands with fresh, structured, crawlable content on high-authority domains benefit most.
Public source: Perplexity reported processing 10+ million daily queries as of early 2026 (TechCrunch, Feb 2026).
ChatGPT with browsing uses Bing-backed web retrieval selectively. It cites sources in 68% of brand-adjacent responses but applies more editorial filtering than Perplexity — it tends to cite well-known review platforms (G2, Capterra, Trustpilot), official documentation, and news sources over generic blog content.
Public source: OpenAI reported 100M+ weekly active users as of late 2025 (OpenAI blog, Nov 2025).
Claude exhibits the highest sentiment accuracy in our study, producing the most nuanced descriptions of brand positioning. Its citation rate (61%) is lower than Perplexity's, but when it does cite a brand, it represents that brand's positioning more consistently and accurately. Brands with published research, studies, or expert commentary score disproportionately well.
Public source: Anthropic reported Claude enterprise adoption across 50K+ companies as of Q1 2026 (Anthropic press release, Mar 2026).
Gemini has the lowest citation rate in our study — 30% of responses include named sources — despite access to Google's search index. Gemini tends to synthesize information without attributing specific sources. Brands that are highly represented in pre-training data (Wikipedia, news coverage, Google Knowledge Graph) score better here than brands relying primarily on owned media.
Public source: Google reported Gemini integration across 1B+ devices as of Google I/O 2025.
Research Baseline: 527-Brand Study Comparison
The following table represents the research baseline from our January-April 2026 curated study of 527 brands across 4 verticals. This is distinct from the live platform data above — the study used a controlled, representative sample; the live data represents all platform users who have run scans.
| Engine | Study Avg Sub-Score | Citation Rate | Retrieval Method | Best Content Type |
|---|---|---|---|---|
| Perplexity Pro | 44/100 | 87% | Real-time web search | Fresh structured content |
| ChatGPT (GPT-4o) | 35/100 | 68% | Selective Bing browsing | Review sites + docs |
| Claude 3.5 Sonnet | 31/100 | 61% | Web search (selective) | Research + authority data |
| Gemini 1.5 Pro | 28/100 | 30% | Google Search integration | Wikipedia + KG presence |
Source: AISearchStackHub curated research study, Jan-Apr 2026. 527 brands, 24 queries/brand, 4 LLM engines, 3 repetitions per query per engine (527 × 24 × 4 × 3 = 151,776 scored responses). See full methodology at state-of-ai-search-report.
Methodology
Scoring: Each brand is issued 24 standardized queries across 4 categories (category, problem, comparison, and authority queries). Responses are scored on the AIS Index: Visibility × 0.40 + Authority × 0.30 + Sentiment × 0.20 + Advantage × 0.10.
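As a concrete illustration, here is a minimal sketch of that weighting in TypeScript, assuming each component is scored 0-100; the field names are ours, and only the weights come from the formula above:

```ts
// Minimal sketch of the AIS Index weighting. The SubScores field names
// and the 0-100 scale per component are assumptions for illustration;
// only the weights (0.40 / 0.30 / 0.20 / 0.10) come from the methodology.
interface SubScores {
  visibility: number; // 0-100
  authority: number;  // 0-100
  sentiment: number;  // 0-100
  advantage: number;  // 0-100
}

function aisIndex(s: SubScores): number {
  return s.visibility * 0.4 + s.authority * 0.3 + s.sentiment * 0.2 + s.advantage * 0.1;
}

// Example: sub-scores of 50/40/60/30 yield 20 + 12 + 12 + 3 = 47.
console.log(aisIndex({ visibility: 50, authority: 40, sentiment: 60, advantage: 30 }));
```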
Per-engine scores: Each engine produces a sub-score on the same 0-100 scale. The overall AIS Index is the average of the 4 engine sub-scores, optionally weighted. The benchmarks on this page show per-engine averages across all scans.
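A sketch of the averaging step under the same assumptions; the engine keys and the default equal weighting are illustrative, since the page only states that the overall index averages the 4 sub-scores, optionally weighted:

```ts
// Overall AIS Index as an (optionally weighted) average of the 4
// per-engine sub-scores. Engine names and equal default weights are
// assumptions for illustration.
type EngineScores = Record<"perplexity" | "chatgpt" | "claude" | "gemini", number>;

function overallAis(scores: EngineScores, weights?: Partial<EngineScores>): number {
  const engines = Object.keys(scores) as (keyof EngineScores)[];
  const w = engines.map((e) => weights?.[e] ?? 1); // default: equal weighting
  const totalWeight = w.reduce((a, b) => a + b, 0);
  return engines.reduce((sum, e, i) => sum + scores[e] * w[i], 0) / totalWeight;
}

// Unweighted mean of the study sub-scores: (44 + 35 + 31 + 28) / 4 = 34.5.
console.log(overallAis({ perplexity: 44, chatgpt: 35, claude: 31, gemini: 28 }));
```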
Live data: The live platform data (top of this page) is computed from all completed scans by AISearchStackHub platform users. It is anonymized — no brand names, domains, or email addresses are included in aggregated outputs. It refreshes hourly via server-side caching.
Research baseline: The 527-brand curated study (second table) was collected January-April 2026 specifically for research purposes, using the same query protocol but with a controlled, diverse sample rather than organic platform users.
API access: The live engine benchmark data is available as a JSON endpoint at /api/research/engine-benchmarks. No API key required. Rate limit: 60 requests/hour per IP.
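A minimal fetch sketch; the hostname and the untyped response shape are assumptions, while the endpoint path and rate limit are as documented above:

```ts
// Fetch the live engine benchmarks from the public JSON endpoint.
// The path and rate limit come from this page; the base URL and the
// response shape are assumptions for illustration.
const BASE = "https://aisearchstackhub.ai"; // assumed host

async function fetchEngineBenchmarks(): Promise<unknown> {
  const res = await fetch(`${BASE}/api/research/engine-benchmarks`);
  if (!res.ok) throw new Error(`Benchmark request failed: ${res.status}`);
  return res.json(); // no API key required; limited to 60 requests/hour per IP
}

fetchEngineBenchmarks().then((data) => console.log(data));
```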
About This Dataset
This dataset is produced by AISearchStackHub's automated LLM visibility scanning infrastructure. The live data is an anonymized aggregate from the platform's scan database. The research baseline is from a curated study conducted January-April 2026.
Both datasets are made available under CC BY 4.0. You may use, share, and adapt this data with attribution to AISearchStackHub (aisearchstackhub.ai).
For methodology questions or custom benchmark analysis, run a scan at aisearchstackhub.ai/scan, or upgrade to the Scale plan for detailed per-engine reporting.