Technical Guide
A practical, technical guide to implementing Schema.org markup that improves how LLMs extract, understand, and cite your content, with copy-paste JSON-LD examples.
Updated May 2026 · Technical · 12 min read
Why LLMs Use Structured Data
Large language models process web content at scale during training and at retrieval time. When a model encounters a page, it attempts to extract: the page's primary topic, the entities involved (who, what, where), the relationships between those entities, and the specific claims or facts the page asserts. Unstructured prose makes this extraction computationally expensive and error-prone. Schema.org markup makes it explicit, machine-readable, and unambiguous.
Cleaner Entity Extraction
Schema explicitly identifies who you are, what you make, and your relationships, eliminating ambiguity that causes misattribution in LLM responses.
Clearer Relationships
Schema properties like sameAs, parentOrganization, and brand establish the graph of relationships that helps LLMs build accurate entity models of your brand.
Higher Extraction Fidelity
RAG systems that retrieve your content pass it as LLM context. Schema-marked content is extracted with greater accuracy, reducing the chance of hallucinated or incorrect citations.
A practical example: a product page with no schema might lead Claude to describe your product incorrectly because the extraction missed a key feature buried in the fourth paragraph. The same page with Product schema including a clear description, category, and offers ensures the model always has structured, accurate context when generating any response that involves your product.
JSON-LD vs Microdata: Always Use JSON-LD
Schema.org can be implemented in three formats: JSON-LD, Microdata, and RDFa. For LLM-targeted structured data, JSON-LD is the only format you should use, and here is why:
| Format | Where it lives | LLM parsing | Recommendation |
| --- | --- | --- | --- |
| JSON-LD | Separate <script> block in <head> | Excellent: clean, parseable, separate from rendering | Use this |
| Microdata | Inline HTML attributes | Poor: interleaved with HTML, extraction-heavy | Avoid |
| RDFa | Inline HTML attributes | Poor: verbose, complex, rarely processed by LLMs | Avoid |
JSON-LD sits in a self-contained <script type="application/ld+json"> block in your page <head>. It requires no changes to your HTML markup, is easy to add and update, and is parsed cleanly by both search engines and LLM retrieval systems.
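As a rough illustration of how such a block could be produced from a server-side template, here is a minimal Python sketch. The helper name `jsonld_script_tag` is hypothetical, not part of any library. The `</` escaping is an extra precaution assumed here so that a string value containing `</script>` cannot close the tag early.

```python
import json

def jsonld_script_tag(data: dict) -> str:
    """Serialize a dict as a JSON-LD <script> block (hypothetical helper)."""
    payload = json.dumps(data, ensure_ascii=False, indent=2)
    # "\/" is a valid JSON escape for "/", so the payload still parses as JSON
    # while no literal "</script>" can appear inside the block.
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'

org = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Your Company Name",
    "url": "https://yourcompany.com",
}
tag = jsonld_script_tag(org)
print(tag)
```

The returned string can be dropped into the page template as-is, which keeps the structured data separate from the visible HTML.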
Organization Schema
Organization schema is the most foundational schema type for AEO. It establishes your brand as a named entity in the LLM's world model, answering "who are you, what do you do, and how do I verify this?" Every domain should have Organization schema on its homepage and in the site-wide footer or header template.
Critical properties for LLM entity recognition:
name: your official brand name exactly as it should appear in AI answers
url: canonical homepage URL
description: 1-2 sentence description of what you do
sameAs: array of authoritative external profiles (LinkedIn, Twitter, Crunchbase, Wikipedia if applicable)
foundingDate: helps models place you in temporal context
numberOfEmployees: entity disambiguation signal
JSON-LD: Organization
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Your Company Name",
  "url": "https://yourcompany.com",
  "logo": "https://yourcompany.com/logo.png",
  "description": "We build software that helps marketing teams track AI search visibility.",
  "foundingDate": "2023",
  "numberOfEmployees": {
    "@type": "QuantitativeValue",
    "value": 25
  },
  "address": {
    "@type": "PostalAddress",
    "addressLocality": "San Francisco",
    "addressRegion": "CA",
    "addressCountry": "US"
  },
  "sameAs": [
    "https://linkedin.com/company/your-company",
    "https://twitter.com/yourcompany",
    "https://www.crunchbase.com/organization/your-company",
    "https://g2.com/products/your-product"
  ],
  "contactPoint": {
    "@type": "ContactPoint",
    "contactType": "customer support",
    "email": "support@yourcompany.com"
  }
}
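Before publishing an Organization block, it can help to check it against the critical properties listed above. The following Python sketch does that under stated assumptions: the function names are hypothetical, and the `https://` check on sameAs entries is an extra convention assumed here, not a Schema.org requirement.

```python
# Properties this guide lists as critical for LLM entity recognition.
CRITICAL_ORG_PROPERTIES = (
    "name", "url", "description", "sameAs", "foundingDate", "numberOfEmployees",
)

def missing_org_properties(org: dict) -> list:
    """Return the critical properties that are absent or empty in the markup."""
    return [key for key in CRITICAL_ORG_PROPERTIES if not org.get(key)]

def insecure_same_as(org: dict) -> list:
    """Return sameAs entries that are not absolute https:// URLs (assumed convention)."""
    same_as = org.get("sameAs") or []
    if isinstance(same_as, str):  # a single URL is also allowed
        same_as = [same_as]
    return [url for url in same_as if not url.startswith("https://")]

draft = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Your Company Name",
    "url": "https://yourcompany.com",
    "sameAs": ["https://linkedin.com/company/your-company", "twitter.com/yourcompany"],
}
print(missing_org_properties(draft))
print(insecure_same_as(draft))
```

A check like this could run in a build step so that an incomplete Organization block is caught before deployment.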
Article Schema
Article schema should be on every blog post, guide, and informational page. It signals that this page is a piece of substantive content with an author and publication date, two key signals for LLM citation authority. The dateModified property is particularly critical for RAG-based engines like Perplexity that weight freshness.
JSON-LD: Article
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "Your Article Title Here",
  "description": "A concise 1-2 sentence summary of the article's content.",
  "image": "https://yourcompany.com/images/article-og.png",
  "datePublished": "2026-01-15",
  "dateModified": "2026-05-01",
  "author": {
    "@type": "Person",
    "name": "Author Name",
    "url": "https://yourcompany.com/team/author-name"
  },
  "publisher": {
    "@type": "Organization",
    "name": "Your Company Name",
    "logo": {
      "@type": "ImageObject",
      "url": "https://yourcompany.com/logo.png"
    }
  },
  "mainEntityOfPage": {
    "@type": "WebPage",
    "@id": "https://yourcompany.com/blog/article-slug"
  },
  "wordCount": 1800,
  "keywords": ["keyword one", "keyword two", "keyword three"]
}
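Because the date properties carry so much weight, a small sanity check on them can be useful. The sketch below is one possible approach in Python. The function name `article_date_problems` is hypothetical, and it only inspects the leading YYYY-MM-DD portion of each value.

```python
from datetime import date
from typing import Optional

def article_date_problems(article: dict, today: Optional[date] = None) -> list:
    """Return human-readable problems with datePublished / dateModified."""
    today = today or date.today()
    problems = []
    parsed = {}
    for key in ("datePublished", "dateModified"):
        raw = article.get(key)
        if raw is None:
            problems.append(f"{key} is missing")
            continue
        try:
            # Only the first 10 characters (YYYY-MM-DD) are checked, so full
            # ISO 8601 timestamps with a time component also pass.
            parsed[key] = date.fromisoformat(str(raw)[:10])
        except ValueError:
            problems.append(f"{key} is not an ISO 8601 date: {raw!r}")
    if len(parsed) == 2 and parsed["dateModified"] < parsed["datePublished"]:
        problems.append("dateModified is earlier than datePublished")
    if any(value > today for value in parsed.values()):
        problems.append("a date lies in the future")
    return problems

article = {"datePublished": "2026-01-15", "dateModified": "2026-05-01"}
print(article_date_problems(article, today=date(2026, 5, 2)))
```

Passing `today` explicitly keeps the check deterministic in tests; in production it can be omitted.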
FAQ Schema
FAQ schema is the single highest-impact schema type for LLM citation optimization. When you mark up your FAQ content with structured question-answer pairs, LLMs can extract and synthesize those answers with very high fidelity: they effectively have pre-formatted answer snippets ready to incorporate into a response. Perplexity in particular reproduces FAQ schema content almost verbatim when answering matching user queries.
Implementation rules:
- Only mark up questions that appear as visible HTML content on the page
- Keep answers concise but complete (50-200 words per answer is ideal)
- Use question phrasing that matches how users actually ask (conversational, not keyword-stuffed)
- Include 4-8 questions per page; more dilutes signal, fewer wastes opportunity
JSON-LD: FAQPage
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "What is answer engine optimization?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Answer engine optimization (AEO) is the practice of structuring and optimizing content so it is cited and surfaced in AI-generated answers from systems like ChatGPT, Claude, Perplexity, and Gemini. Unlike traditional SEO which targets ranked links, AEO targets the generated answer itself."
      }
    },
    {
      "@type": "Question",
      "name": "How long does AEO take to show results?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "For retrieval-augmented engines like Perplexity, you can see citation rate improvements within 4-8 weeks of publishing optimized content. For parametric model responses from ChatGPT and Claude, improvements depend on training update cycles and typically take 3-12 months."
      }
    }
  ]
}
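If the FAQ content already lives in a CMS or a list of question-answer pairs, the markup can be generated, not hand-written. The Python sketch below builds a FAQPage object and flags pairs that fall outside the implementation rules above. The function names and the thresholds passed as defaults are illustrative assumptions taken from this guide's recommendations, not from any specification.

```python
import json

def build_faq_page(pairs):
    """Build FAQPage JSON-LD from (question, answer) tuples shown on the page."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }

def faq_warnings(pairs, min_q=4, max_q=8, min_words=50, max_words=200):
    """Return warnings for pairs outside this guide's suggested ranges."""
    warnings = []
    if not min_q <= len(pairs) <= max_q:
        warnings.append(f"{len(pairs)} questions; suggested range is {min_q}-{max_q}")
    for question, answer in pairs:
        words = len(answer.split())
        if not min_words <= words <= max_words:
            warnings.append(f"{question!r}: {words} words; suggested range is {min_words}-{max_words}")
    return warnings

pairs = [("What is answer engine optimization?", "AEO is a practice.")]
print(json.dumps(build_faq_page(pairs), indent=2))
print(faq_warnings(pairs))
```

The warnings are advisory only; the generated markup is the same whether or not a pair is flagged.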
HowTo Schema
HowTo schema is ideal for step-by-step instructional content. It explicitly communicates the procedural nature of the content, the tools required, and each discrete step, making it easy for LLMs to summarize the procedure accurately when answering "how do I..." queries. HowTo schema pairs naturally with tutorial pages, setup guides, and process documentation.
JSON-LD: HowTo
{
  "@context": "https://schema.org",
  "@type": "HowTo",
  "name": "How to Improve Your AI Search Visibility",
  "description": "A step-by-step guide to improving your AIS Index score across ChatGPT, Claude, Perplexity, and Gemini.",
  "totalTime": "PT2H",
  "estimatedCost": {
    "@type": "MonetaryAmount",
    "currency": "USD",
    "value": "0"
  },
  "tool": [
    { "@type": "HowToTool", "name": "AISearchStackHub scanner" },
    { "@type": "HowToTool", "name": "Google Search Console" },
    { "@type": "HowToTool", "name": "Schema.org Validator" }
  ],
  "step": [
    {
      "@type": "HowToStep",
      "name": "Run a baseline AIS scan",
      "text": "Visit aisearchstackhub.ai/scan and enter your domain. The scanner will measure your current citation rate across 4 LLM engines and return an AIS Index score.",
      "url": "https://aisearchstackhub.ai/scan"
    },
    {
      "@type": "HowToStep",
      "name": "Identify your top citation gaps",
      "text": "Review the gap analysis in your scan report. Note the top 3 topic areas where you are not being cited but should be."
    },
    {
      "@type": "HowToStep",
      "name": "Add FAQ schema to key pages",
      "text": "Add FAQPage JSON-LD to your 5 most important informational pages. Include 4-6 specific questions with complete answers."
    },
    {
      "@type": "HowToStep",
      "name": "Rescan and measure improvement",
      "text": "Re-run your AIS scan after 4-6 weeks to measure the citation rate improvement from your schema additions."
    }
  ]
}
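The totalTime value in the example above uses the ISO 8601 duration format (PT2H means two hours). As a sketch of how a HowTo block could be assembled from an ordered list of steps, the Python below formats the duration and numbers each step. The helper names are hypothetical, and adding an explicit `position` to each HowToStep is an optional choice made here, not something the example above requires.

```python
def iso8601_duration(minutes: int) -> str:
    """Format whole minutes as an ISO 8601 duration, e.g. 120 becomes 'PT2H'."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    hours, mins = divmod(minutes, 60)
    return "PT" + (f"{hours}H" if hours else "") + (f"{mins}M" if mins else "")

def build_howto(name, description, minutes, steps):
    """Build HowTo JSON-LD from (step_name, step_text) tuples in order."""
    return {
        "@context": "https://schema.org",
        "@type": "HowTo",
        "name": name,
        "description": description,
        "totalTime": iso8601_duration(minutes),
        "step": [
            {"@type": "HowToStep", "position": index, "name": step_name, "text": step_text}
            for index, (step_name, step_text) in enumerate(steps, start=1)
        ],
    }

howto = build_howto(
    "How to Improve Your AI Search Visibility",
    "A step-by-step guide.",
    120,
    [
        ("Run a baseline AIS scan", "Enter your domain in the scanner."),
        ("Identify your top citation gaps", "Review the gap analysis."),
    ],
)
print(howto["totalTime"], len(howto["step"]))
```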
Product Schema
Product schema is essential for any e-commerce or SaaS product page. It provides LLMs with structured data about what you sell, its features, pricing, and reviews, ensuring that when an LLM is asked to recommend products in your category, it has accurate, complete information about yours. The aggregateRating and offers properties are particularly important for product comparison queries.
JSON-LD: Product
{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "Your Product Name",
  "image": "https://yourcompany.com/product-image.png",
  "description": "Clear, factual description of what your product does and who it is for.",
  "brand": {
    "@type": "Brand",
    "name": "Your Company Name"
  },
  "category": "Software",
  "offers": {
    "@type": "Offer",
    "price": "299",
    "priceCurrency": "USD",
    "priceValidUntil": "2026-12-31",
    "availability": "https://schema.org/InStock",
    "url": "https://yourcompany.com/pricing"
  },
  "aggregateRating": {
    "@type": "AggregateRating",
    "ratingValue": "4.7",
    "reviewCount": "142",
    "bestRating": "5"
  },
  "additionalProperty": [
    { "@type": "PropertyValue", "name": "Feature", "value": "Multi-engine LLM scanning" },
    { "@type": "PropertyValue", "name": "Feature", "value": "AIS Index scoring" },
    { "@type": "PropertyValue", "name": "Feature", "value": "Citation asset generation" },
    { "@type": "PropertyValue", "name": "Feature", "value": "Monthly tracking" }
  ]
}
Dataset Schema
Dataset schema is an underused but high-impact schema type for AEO. LLMs are heavily queried for statistical data and research findings. When you publish original research, industry benchmarks, or data collections, Dataset schema signals to LLMs that this page is a primary source of quantitative data, significantly increasing the likelihood it gets cited for data-related queries. This is one of the highest-leverage schema investments for companies with access to original data.
JSON-LD: Dataset
{
  "@context": "https://schema.org",
  "@type": "Dataset",
  "name": "AI Search Citation Rate Benchmarks 2026",
  "description": "Citation rate data across ChatGPT, Claude, Perplexity, and Gemini from analysis of 50,000 queries in Q1 2026.",
  "url": "https://yourcompany.com/research/ai-citation-benchmarks-2026",
  "creator": {
    "@type": "Organization",
    "name": "Your Company Research Team",
    "url": "https://yourcompany.com"
  },
  "datePublished": "2026-03-01",
  "dateModified": "2026-05-01",
  "license": "https://creativecommons.org/licenses/by/4.0/",
  "measurementTechnique": "Direct API querying of LLM engines with structured query sampling",
  "variableMeasured": "Citation rate per engine per query category",
  "temporalCoverage": "2026-01-01/2026-03-31",
  "spatialCoverage": "Global",
  "distribution": [
    {
      "@type": "DataDownload",
      "encodingFormat": "text/csv",
      "contentUrl": "https://yourcompany.com/data/citation-benchmarks-2026.csv"
    }
  ]
}
Testing and Validating Your Schema
Implementing schema without validation is error-prone: a single malformed JSON property can invalidate the entire block. Use these tools to validate before deploying:
Google Rich Results Test
The primary validation tool for Schema.org implementation. Enter any URL or paste your JSON-LD directly. It will show you which schema types were detected, whether they're valid for Rich Results, and any errors or warnings.
search.google.com/test/rich-results
Schema.org Validator
The official Schema.org validator checks your markup against the full schema.org vocabulary, including properties not covered by Google's Rich Results Test. Use this for Dataset, HowTo, and other schema types that may not have Rich Results eligibility.
validator.schema.org
Manual LLM Accuracy Test
After implementing schema, query Perplexity directly: "What does [your company] do?" and "Tell me about [your product]." Compare the response to your Organization and Product schema: are the descriptions accurate? Are attributes correct? If not, review your schema descriptions for clarity and specificity.
Common mistake: Implementing schema in a JavaScript-rendered component that is not present in the initial HTML response. LLM crawlers often do not execute JavaScript โ your schema must be in the server-rendered HTML. Verify this by viewing your page's source HTML (Ctrl+U) and confirming the JSON-LD script block is present, not just visible in the browser's DOM inspector.
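The view-source check can also be scripted. The Python sketch below uses only the standard library to fetch the raw HTML response (no JavaScript is executed), pull out every JSON-LD script block, and try to parse each one. The class and function names and the User-Agent string are hypothetical. This only confirms that the blocks exist and are well-formed JSON; it does not replace the validators above.

```python
import json
from html.parser import HTMLParser
from urllib.request import Request, urlopen

class JsonLdExtractor(HTMLParser):
    """Collect the text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False

def check_jsonld(html: str) -> list:
    """Return one report entry per JSON-LD block found in the raw HTML."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    report = []
    for raw in parser.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError as exc:
            report.append({"ok": False, "error": str(exc)})
            continue
        items = data if isinstance(data, list) else [data]
        types = [item.get("@type") for item in items if isinstance(item, dict)]
        report.append({"ok": True, "types": types})
    return report

def fetch_raw_html(url: str) -> str:
    """Fetch the server response as text, without running any JavaScript."""
    request = Request(url, headers={"User-Agent": "jsonld-check/0.1"})
    with urlopen(request, timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")

sample = (
    '<html><head>'
    '<script type="application/ld+json">{"@type": "Organization", "name": "X"}</script>'
    '<script type="application/ld+json">{"@type": "Article",}</script>'
    '</head><body></body></html>'
)
print(check_jsonld(sample))
```

An empty report for `check_jsonld(fetch_raw_html(url))` on a page that shows schema in the browser's DOM inspector suggests the markup is being injected client-side.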
Which Schemas Get Cited Most in LLMs
Based on analysis of citation patterns across AIS scans, here is the ranked impact of schema types on LLM citation rates:
FAQPage
Highest citation rate. FAQ content is reproduced verbatim in LLM answers at very high frequency.
~+35% citation lift
Organization
Entity recognition anchor. Without it, your brand may be conflated with similarly-named entities or described incorrectly.
Entity accuracy
Article + dateModified
Critical for Perplexity's recency ranking. Pages without clear date signals lose citation priority over time.
~+20% citation lift
HowTo
Step-by-step queries are a major query category for AI search. HowTo schema ensures your procedure is extracted and presented accurately.
~+18% citation lift
Dataset
Signals primary data source status. Underused but high-impact for organizations with original research.
~+15% for data queries
Product
Improves accuracy of product descriptions and pricing in LLM responses. Important for e-commerce and SaaS.
Accuracy improvement
Note: Citation lift percentages are estimates based on before/after AIS scan comparisons across a sample of domains. Individual results vary depending on domain authority, query category, and content quality.
Frequently Asked Questions
Does schema guarantee citation in LLMs?
No. Schema improves extraction accuracy and citation probability, but doesn't guarantee citation. LLMs select citations based on relevance to the query, domain authority, content quality, and recency. Schema removes friction in the extraction process and makes your content more citable, but the content itself must still be the best answer to the query.
Can I have multiple schema types on one page?
Yes, and you should. A blog post should have both Article schema (for content metadata) and FAQPage schema (for any FAQ section). A product page might have Product schema, FAQPage for common questions, and HowTo for setup instructions. Use separate <script type="application/ld+json"> blocks for each type.
Does Google Rich Results Test cover all schema types?
No. Google's Rich Results Test only validates schema types that are eligible for Google's rich result features (Product, Recipe, Article, etc.), and that list changes over time: Google retired HowTo rich results in 2023 and restricted FAQ rich results to a small set of government and health sites. Schema types without a current rich result feature, including custom types, may not show in the Rich Results Test even when correctly implemented. Always cross-validate with Schema.org Validator for full coverage.
How often should I update my schema?
Organization schema should be updated whenever your company information changes (team size, funding, new products). Article schema should update dateModified whenever you make substantive content updates. Product schema should reflect current pricing and availability. FAQ schema should expand as you identify new user questions from search data.
Does schema help with both Perplexity and ChatGPT?
Yes, but through different mechanisms. For Perplexity (RAG-based), schema directly improves real-time extraction quality when your page is retrieved as a source. For ChatGPT's parametric responses, schema can improve how your content is ingested during training data processing. The impact on Perplexity is faster and more measurable; the impact on ChatGPT is slower but persistent across model versions.