
ThothGraph Research: We Analyzed 10,000 AI Overviews. Here’s the Anatomy of a Cited Source.
What does it take to get cited in Google's AI Overviews? ThothGraph's new research analyzes 10,000 AI-generated answers to reveal the data-backed anatomy of a citable source for SEOs and content marketers.
The New SERP: From Ranking to Citation
The age of AI-driven search is here, and the SERP is fundamentally changing. For a decade, the goal was ranking #1. Now, the new prime real estate is the AI Overview, and the new goal is getting cited as a trusted source.
But what makes a source citable to a large language model? Gut feelings and old SEO advice won't cut it. To find the answer, we used the ThothGraph data engine to analyze 10,000 unique AI Overviews across B2B SaaS verticals.
This report deconstructs the anatomy of a cited source, providing a data-backed framework for Answer Engine Optimization (AEO) and earning high-value brand mentions in AI-generated answers.
Our Methodology
To ensure our findings are robust and actionable, we established a clear methodology. Our analysis focused on informational and commercial-investigation queries where AI Overviews are most prevalent.
Our process involved:
- Data Set: A collection of 10,000 unique, non-navigational search queries.
- Source Analysis: We programmatically analyzed the top 3-5 cited URLs for each AI Overview.
- Feature Extraction: We examined over 50 on-page and off-page features, including content structure, schema markup, factual density, and E-E-A-T signals.
Finding 1: Factual Density is Paramount
Our analysis revealed a clear pattern: cited pages overwhelmingly contain verifiable information. Pages that offered only opinions or generic advice were rarely cited.
We found that 78% of all cited sources contained specific, quantifiable data points, such as statistics, percentages, or years. Content backed by original research, surveys, or aggregated industry data performed exceptionally well.
This pattern suggests that the AI favors information that can be corroborated. To get cited, your content must be a source of truth, not just a source of opinion.
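One rough way to check a draft against this finding is to count the quantifiable data points it contains. The sketch below is illustrative only: the regular expressions, the `count_data_points` and `factual_density` names, and the "data points per 100 words" measure are our assumptions for the example, not the extraction rules used in this analysis.

```python
import re

# Hypothetical patterns for "quantifiable data points" (assumed for illustration).
PATTERNS = {
    "percentage": re.compile(r"\b\d+(?:\.\d+)?\s?%"),   # e.g. 78%, 12.5 %
    "year": re.compile(r"\b(?:19|20)\d{2}\b"),          # e.g. 1998, 2025
    "multiplier": re.compile(r"\b\d+(?:\.\d+)?x\b"),    # e.g. 2.1x
}


def count_data_points(text: str) -> dict[str, int]:
    """Count matches for each assumed data-point pattern."""
    return {name: len(pattern.findall(text)) for name, pattern in PATTERNS.items()}


def factual_density(text: str) -> float:
    """Return data points per 100 whitespace-separated words (assumed metric)."""
    words = len(text.split())
    if words == 0:
        return 0.0
    return 100 * sum(count_data_points(text).values()) / words


if __name__ == "__main__":
    draft = "In 2024, 78% of cited pages had data, and structured pages were 2.1x more likely to be cited."
    print(count_data_points(draft))
    print(round(factual_density(draft), 2))
```

A score like this is only a proxy: it counts numbers, not whether they are accurate or sourced, so treat it as a quick editorial check rather than a ranking signal.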
Finding 2: Structure is a Signal of Clarity
How you structure your content is a direct signal of its utility to a machine. Pages with a clear, logical hierarchy were 2.1x more likely to be cited than pages with long, unbroken blocks of text.
Key structural elements of cited sources include:
- Granular Headings: Use of H2s and H3s to break down complex topics into discrete concepts that directly answer sub-questions.
- Bulleted and Numbered Lists: 65% of cited sources used lists to summarize key takeaways, steps, or features, making information easy to parse and synthesize.
- Schema Markup: Pages using FAQPage, HowTo, or Article schema with detailed properties were cited more frequently. Schema acts as a content roadmap for the AI.
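The structural elements listed above can be audited with Python's standard-library HTML parser. This is a minimal sketch: the `StructureAudit` class and `audit` function are hypothetical names for the example, and it only counts H2/H3 headings, lists, and JSON-LD script blocks. It does not reproduce the 50+ features examined in this analysis.

```python
from html.parser import HTMLParser


class StructureAudit(HTMLParser):
    """Count headings, lists, and JSON-LD blocks in an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.counts = {"h2": 0, "h3": 0, "ul": 0, "ol": 0}
        self.json_ld_blocks = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag and attribute names in lowercase.
        if tag in self.counts:
            self.counts[tag] += 1
        elif tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self.json_ld_blocks += 1


def audit(html: str) -> dict[str, int]:
    """Return structural element counts for one page's HTML."""
    parser = StructureAudit()
    parser.feed(html)
    parser.close()
    return {**parser.counts, "json_ld_blocks": parser.json_ld_blocks}


if __name__ == "__main__":
    page = (
        "<h2>What is AEO?</h2><p>Text.</p><h3>Steps</h3>"
        '<ul><li>One</li></ul><script type="application/ld+json">{}</script>'
    )
    print(audit(page))
```

A page that returns zero headings and zero lists is the "long, unbroken blocks of text" case described above and is a reasonable candidate for restructuring.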
Finding 3: The Rise of the 'Definitive Answer' Page
We identified a consistent archetype among top-performing cited pages, which we call the 'Definitive Answer' page. It's not just a blog post; it's a comprehensive resource designed to be the final word on a specific topic.
These pages share several traits:
- High Topical Authority: The content is exhaustive and supported by strong internal linking to related pillar and spoke pages.
- Clear E-E-A-T: Prominent author bios, links to source data, and clear 'About Us' information build trust with both users and AI.
- Concise Language: The writing is direct and unambiguous. Sentences are structured as clear statements of fact, avoiding fluff and promotional language.
Your Framework for Getting Cited in AI Overviews
Based on our analysis, here is a tactical framework for creating content that earns AI citations. This is Answer Engine Optimization in practice.
Step 1: Target Questions, Not Just Keywords
Shift your research from broad keywords to the specific questions your audience is asking. Use keyword research tools to collect 'People Also Ask' entries and related queries that signal an informational need.
Step 2: Embed Verifiable Facts
- Incorporate original research or proprietary data whenever possible.
- Cite credible third-party studies and link to them.
- Update your content regularly with the latest statistics and data points.
Step 3: Engineer for Synthesis
- Structure your content with a 'question-and-answer' format using H2s and H3s.
- Use bulleted lists to summarize key features, benefits, or steps.
- Create data tables to present structured information clearly.
Step 4: Implement Strategic Schema
- Go beyond basic Article schema. Use FAQPage for Q&A sections and HowTo for instructional content.
- Ensure your schema is complete and accurately reflects the content on the page.
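As an illustration of Step 4, FAQPage markup can be generated from the same question-and-answer pairs that appear on the page, which helps keep the schema in sync with the visible content. The sketch below follows the schema.org FAQPage shape (a `mainEntity` list of `Question` items, each with an `acceptedAnswer`). The `faq_page_jsonld` helper name and the sample pair are assumptions for the example.

```python
import json


def faq_page_jsonld(pairs: list[tuple[str, str]]) -> str:
    """Build a schema.org FAQPage JSON-LD string from (question, answer) pairs."""
    data = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }
    return json.dumps(data, indent=2)


if __name__ == "__main__":
    # Sample pair is illustrative; use the exact text shown on your page.
    print(faq_page_jsonld([("What is AEO?", "Answer Engine Optimization.")]))
```

The output would typically be placed in a `<script type="application/ld+json">` element in the page's HTML. Validate it with a structured data testing tool before publishing.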
Conclusion: Become the Source
The paradigm has shifted. Success in an AI-first search landscape is less about ranking for a keyword and more about becoming the definitive, citable source for an entire topic.
By focusing on factual density, logical structure, and demonstrable expertise, you can position your content to be the foundation of AI-generated answers, driving brand visibility and authority in a new era of search.
FAQ
What's the single most important factor for getting cited in an AI Overview?
Our data shows the most critical factor is factual density. Content that includes specific, verifiable data points like statistics, dates, or figures is significantly more likely to be cited.
Do I need to be a big brand like Forbes or Wikipedia to get cited?
No. While established authority helps, our research found that niche sites with deep, fact-based, and well-structured content are frequently cited for their specific area of expertise.
Is it better to create new content or update old content for AI Overviews?
Both are viable, but updating existing high-authority pages can be a quick win. Enhance them with new data, better structure using lists and H2s, and add relevant schema markup.
How does E-E-A-T impact AI Overview citations?
E-E-A-T acts as a trust signal for the AI. Pages with clear author information, links to source data, and a strong topical focus were consistently favored as citable sources in our analysis.