Live Engine Benchmarks
Computed from anonymized platform scan data. Each row represents aggregate performance across all completed scans on the platform.
| Engine | Avg AIS Score | Mention Rate | Avg Citations/Response | Total Results |
|---|---|---|---|---|
Data source: AISearchStackHub proprietary scan database. Methodology: AIS Index (V×0.40 + A×0.30 + S×0.20 + Ad×0.10). Mention rate = fraction of scans where engine mentioned the brand. Data refreshed every hour.
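For clarity, the mention-rate definition amounts to the following, with an assumed record shape (the actual scan schema is not published):

```ts
// Sketch of the mention-rate definition above: the fraction of completed
// scans in which a given engine mentioned the brand.
// The ScanResult shape is an assumption for illustration only.
interface ScanResult {
  engine: string;
  brandMentioned: boolean;
}

function mentionRate(scans: ScanResult[], engine: string): number {
  const forEngine = scans.filter((s) => s.engine === engine);
  if (forEngine.length === 0) return 0;
  return forEngine.filter((s) => s.brandMentioned).length / forEngine.length;
}
```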
Research Context: What Drives Citation Rate Differences?
The four major LLMs behave fundamentally differently when producing brand-adjacent responses. These differences stem from their underlying architectures, training data, and retrieval mechanisms — not from differences in brand quality or marketing investment.
Understanding which engine is most likely to cite your brand — and why — determines where to focus your AEO investment.
Perplexity executes a real-time web search before generating every response. This produces the highest citation rate of any LLM — 87% of responses include named source citations in our research baseline. Brands with fresh, structured, crawlable content on high-authority domains benefit most.
Public source: Perplexity reported processing 10+ million daily queries as of early 2026 (TechCrunch, Feb 2026).
ChatGPT with browsing uses Bing-backed web retrieval selectively. It cites sources in 68% of brand-adjacent responses but applies more editorial filtering than Perplexity — it tends to cite well-known review platforms (G2, Capterra, Trustpilot), official documentation, and news sources over generic blog content.
Public source: OpenAI reported 100M+ weekly active users as of late 2025 (OpenAI blog, Nov 2025).
Claude exhibits the highest sentiment accuracy in our study, producing the most nuanced descriptions of brand positioning. Its citation rate (61%) is lower than Perplexity's, but when it does cite a brand, it represents that brand's positioning more consistently and accurately. Brands with published research, studies, or expert commentary score disproportionately well.
Public source: Anthropic reported Claude enterprise adoption across 50K+ companies as of Q1 2026 (Anthropic press release, Mar 2026).
Gemini has the lowest citation rate in our study — 30% of responses include named sources — despite access to Google's search index. Gemini tends to synthesize information without attributing specific sources. Brands that are highly represented in pre-training data (Wikipedia, news coverage, Google Knowledge Graph) score better here than brands relying primarily on owned media.
Public source: Google reported Gemini integration across 1B+ devices as of Google I/O 2025.
Research Baseline: 527-Brand Study Comparison
The following table represents the research baseline from our January-April 2026 curated study of 527 brands across 4 verticals. This is distinct from the live platform data above — the study used a controlled, representative sample; the live data represents all platform users who have run scans.
| Engine | Study Avg Sub-Score | Citation Rate | Retrieval Method | Best Content Type |
|---|---|---|---|---|
| Perplexity Pro | 44/100 | 87% | Real-time web search | Fresh structured content |
| ChatGPT (GPT-4o) | 35/100 | 68% | Selective Bing browsing | Review sites + docs |
| Claude 3.5 Sonnet | 31/100 | 61% | Web search (selective) | Research + authority data |
| Gemini 1.5 Pro | 28/100 | 30% | Google Search integration | Wikipedia + KG presence |
Source: AISearchStackHub curated research study, Jan-Apr 2026. 527 brands, 24 queries/brand, 4 LLM engines, 3 repetitions per query per engine (527 × 24 × 4 × 3 = 151,776 scored responses). See full methodology at state-of-ai-search-report.
Methodology
Scoring: Each brand is issued 24 standardized queries across 4 categories (category, problem, comparison, and authority queries). Responses are scored on the AIS Index: Visibility × 0.40 + Authority × 0.30 + Sentiment × 0.20 + Advantage × 0.10.
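As a concrete illustration, here is a minimal sketch of that weighting in TypeScript, assuming each component is scored 0-100; the field names are ours, and only the weights come from the formula above:

```ts
// Minimal sketch of the AIS Index weighting. The SubScores field names
// and the 0-100 scale per component are assumptions for illustration;
// only the weights (0.40 / 0.30 / 0.20 / 0.10) come from the methodology.
interface SubScores {
  visibility: number; // 0-100
  authority: number;  // 0-100
  sentiment: number;  // 0-100
  advantage: number;  // 0-100
}

function aisIndex(s: SubScores): number {
  return s.visibility * 0.4 + s.authority * 0.3 + s.sentiment * 0.2 + s.advantage * 0.1;
}

// Example: sub-scores of 50/40/60/30 yield 20 + 12 + 12 + 3 = 47.
console.log(aisIndex({ visibility: 50, authority: 40, sentiment: 60, advantage: 30 }));
```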
Per-engine scores: Each engine produces a sub-score on the same 0-100 scale. The overall AIS Index is the average of the 4 engine sub-scores, optionally weighted. The benchmarks on this page show per-engine averages across all scans.
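A sketch of the averaging step under the same assumptions; the engine keys and the default equal weighting are illustrative, since the page only states that the overall index averages the 4 sub-scores, optionally weighted:

```ts
// Overall AIS Index as an (optionally weighted) average of the 4
// per-engine sub-scores. Engine names and equal default weights are
// assumptions for illustration.
type EngineScores = Record<"perplexity" | "chatgpt" | "claude" | "gemini", number>;

function overallAis(scores: EngineScores, weights?: Partial<EngineScores>): number {
  const engines = Object.keys(scores) as (keyof EngineScores)[];
  const w = engines.map((e) => weights?.[e] ?? 1); // default: equal weighting
  const totalWeight = w.reduce((a, b) => a + b, 0);
  return engines.reduce((sum, e, i) => sum + scores[e] * w[i], 0) / totalWeight;
}

// Unweighted mean of the study sub-scores: (44 + 35 + 31 + 28) / 4 = 34.5.
console.log(overallAis({ perplexity: 44, chatgpt: 35, claude: 31, gemini: 28 }));
```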
Live data: The live platform data (top of this page) is computed from all completed scans by AISearchStackHub platform users. It is anonymized — no brand names, domains, or email addresses are included in aggregated outputs. It refreshes hourly via server-side caching.
Research baseline: The 527-brand curated study (second table) was collected January-April 2026 specifically for research purposes, using the same query protocol but with a controlled, diverse sample rather than organic platform users.
API access: The live engine benchmark data is available as a JSON endpoint at /api/research/engine-benchmarks. No API key required. Rate limit: 60 requests/hour per IP.
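A minimal fetch sketch; the hostname and the untyped response shape are assumptions, while the endpoint path and rate limit are as documented above:

```ts
// Fetch the live engine benchmarks from the public JSON endpoint.
// The path and rate limit come from this page; the base URL and the
// response shape are assumptions for illustration.
const BASE = "https://aisearchstackhub.ai"; // assumed host

async function fetchEngineBenchmarks(): Promise<unknown> {
  const res = await fetch(`${BASE}/api/research/engine-benchmarks`);
  if (!res.ok) throw new Error(`Benchmark request failed: ${res.status}`);
  return res.json(); // no API key required; limited to 60 requests/hour per IP
}

fetchEngineBenchmarks().then((data) => console.log(data));
```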
About This Dataset
This dataset is produced by AISearchStackHub's automated LLM visibility scanning infrastructure. The live data is an anonymized aggregate from the platform's scan database. The research baseline is from a curated study conducted January-April 2026.
Both datasets are made available under CC BY 4.0. You may use, share, and adapt this data with attribution to AISearchStackHub (aisearchstackhub.ai).
For methodology questions or custom benchmark analysis, run a scan at aisearchstackhub.ai/scan, or upgrade to the Scale plan for detailed per-engine reporting.