
ThothGraph Research: We Analyzed 10,000 AI Overviews. Here’s the Anatomy of a Cited Source.

What does it take to get cited in Google's AI Overviews? ThothGraph's new research analyzes 10,000 AI-generated answers to reveal the data-backed anatomy of a citable source for SEOs and content marketers.

The ThothGraph Research Team Updated 8 min read
Tags: SEO, SGE, AI Overviews, Content Strategy, Data Study

The New SERP: From Ranking to Citation

AI-driven search is fundamentally changing the SERP. For years, the goal was ranking #1. Now the prime real estate is the AI Overview, and the goal is to be cited in it as a trusted source.

But what makes a source citable to a large language model? Gut feelings and old SEO advice won't cut it. To find the answer, we leveraged the ThothGraph data engine to analyze 10,000 unique AI Overviews across B2B SaaS verticals.

This report deconstructs the anatomy of a cited source, providing a data-backed framework for Answer Engine Optimization (AEO) and earning high-value brand mentions in AI-generated answers.

Our Methodology

Our analysis focused on informational and commercial-investigation queries, where AI Overviews are most prevalent.

Our process involved:

  • Data Set: A collection of 10,000 unique, non-navigational search queries.
  • Source Analysis: We programmatically analyzed the top 3-5 cited URLs for each AI Overview.
  • Feature Extraction: We examined over 50 on-page and off-page features, including content structure, schema markup, factual density, and E-E-A-T signals.

Finding 1: Factual Density is Paramount

Our analysis revealed a clear pattern: cited pages tend to contain verifiable information. Pages that offered only opinions or generic advice were rarely cited.

We found that 78% of all cited sources contained specific, quantifiable data points, such as statistics, percentages, or years. Content backed by original research, surveys, or aggregated industry data performed exceptionally well.

This suggests that AI Overviews favor information that can be corroborated, although a correlation alone cannot show how the underlying model was trained. To get cited, your content must be a source of truth, not just a source of opinion.
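One rough way to audit a draft for factual density is to count numeric data points per 100 words. The sketch below is only an illustration: the regular expressions, the `factual_density` name, and the per-100-words scale are assumptions, not the feature definitions used in the study.

```python
import re

# Assumed patterns for "quantifiable data points" (not the study's definitions).
PERCENT = re.compile(r"\b\d+(?:\.\d+)?\s?%")        # e.g. 78%
YEAR = re.compile(r"\b(?:19|20)\d{2}\b")            # e.g. 2023
NUMBER = re.compile(r"\b\d[\d,]*(?:\.\d+)?\b")      # any numeral, e.g. 10,000


def factual_density(text: str) -> dict:
    """Count numeric data points and scale them per 100 words."""
    words = text.split()
    numbers = NUMBER.findall(text)  # includes the numerals in years and percentages
    per_100 = 100 * len(numbers) / len(words) if words else 0.0
    return {
        "words": len(words),
        "data_points": len(numbers),
        "percentages": len(PERCENT.findall(text)),
        "years": len(YEAR.findall(text)),
        "data_points_per_100_words": round(per_100, 2),
    }


if __name__ == "__main__":
    factual = "In 2023, 78% of cited pages used data from 10,000 queries."
    opinion = "We think good content matters a lot."
    print(factual_density(factual))
    print(factual_density(opinion))
```

A score like this only flags the presence of numbers. It cannot tell whether a figure is accurate or sourced, so treat it as a prompt for editorial review rather than a target.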

Finding 2: Structure is a Signal of Clarity

How you structure your content is a direct signal of its utility to a machine. Pages with a clear, logical hierarchy were 2.1x more likely to be cited than pages with long, unbroken blocks of text.
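For readers who want to reproduce a "times more likely" comparison on their own pages, a ratio like this is usually the citation rate of one group divided by the rate of another. The sketch below assumes that definition; the `citation_lift` name and the counts in the usage example are made up for illustration and are not figures from the study.

```python
def citation_lift(cited_a: int, total_a: int, cited_b: int, total_b: int) -> float:
    """Ratio of group A's citation rate to group B's citation rate."""
    if total_a <= 0 or total_b <= 0:
        raise ValueError("each group needs at least one page")
    rate_a = cited_a / total_a
    rate_b = cited_b / total_b
    if rate_b == 0:
        raise ValueError("group B has no citations, so the ratio is undefined")
    return rate_a / rate_b


if __name__ == "__main__":
    # Hypothetical counts: 30 of 100 structured pages cited vs. 15 of 100 unstructured.
    print(citation_lift(30, 100, 15, 100))
```

A ratio of this kind describes an association between structure and citation in a sample. It does not by itself show that adding headings causes a page to be cited.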

Key structural elements of cited sources include:

  • Granular Headings: Use of H2s and H3s to break down complex topics into discrete concepts that directly answer sub-questions.
  • Bulleted and Numbered Lists: 65% of cited sources used lists to summarize key takeaways, steps, or features, making information easy to parse and synthesize.
  • Schema Markup: Pages using FAQPage, HowTo, or Article schema with detailed properties were cited more frequently. Schema acts as a content roadmap for the AI.
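The structural elements above can be checked on a page with a small audit script. This is a minimal sketch using Python's standard-library HTML parser; the `StructureAudit` class and `audit_structure` function are hypothetical names, and the choice of tags to count is an assumption rather than the study's feature set.

```python
import json
from html.parser import HTMLParser


class StructureAudit(HTMLParser):
    """Counts headings, lists, and tables, and collects JSON-LD @type values."""

    def __init__(self):
        super().__init__()
        self.counts = {"h2": 0, "h3": 0, "ul": 0, "ol": 0, "table": 0}
        self.schema_types = []
        self._in_jsonld = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in self.counts:
            self.counts[tag] += 1
        elif tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag != "script" or not self._in_jsonld:
            return
        self._in_jsonld = False
        try:
            doc = json.loads("".join(self._buffer))
        except json.JSONDecodeError:
            return  # skip malformed JSON-LD instead of failing the audit
        for item in doc if isinstance(doc, list) else [doc]:
            schema_type = item.get("@type") if isinstance(item, dict) else None
            if isinstance(schema_type, str):
                self.schema_types.append(schema_type)
            elif isinstance(schema_type, list):
                self.schema_types.extend(schema_type)


def audit_structure(html: str) -> dict:
    parser = StructureAudit()
    parser.feed(html)
    parser.close()
    return {**parser.counts, "schema_types": parser.schema_types}


if __name__ == "__main__":
    sample = """<h2>What is AEO?</h2><p>Text</p><h3>Why it matters</h3>
    <ul><li>One</li><li>Two</li></ul>
    <script type="application/ld+json">
    {"@context": "https://schema.org", "@type": "FAQPage"}
    </script>"""
    print(audit_structure(sample))
```

The audit reports what markup is present, not whether it is used well. A page can have many H2s and still bury the answer.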

Finding 3: The Rise of the 'Definitive Answer' Page

We identified a consistent archetype among top-performing cited pages, which we call the 'Definitive Answer' page. It's not just a blog post; it's a comprehensive resource designed to be the final word on a specific topic.

These pages share several traits:

  • High Topical Authority: The content is exhaustive and supported by strong internal linking to related pillar and spoke pages.
  • Clear E-E-A-T: Prominent author bios, links to source data, and clear 'About Us' information build trust with both users and AI.
  • Concise Language: The writing is direct and unambiguous. Sentences are structured as clear statements of fact, avoiding fluff and promotional language.

Your Framework for Getting Cited in AI Overviews

Based on our analysis, here is a tactical framework for creating content that earns AI citations. This is Answer Engine Optimization in practice.

Step 1: Target Questions, Not Just Keywords

Shift your research from broad keywords to the specific questions your audience is asking. Use tools to find 'People Also Ask' and related queries that signal an informational need.
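As a starting point, an exported keyword list can be filtered down to question-style queries. The sketch below is a simple heuristic, not a classifier: the `QUESTION_STARTERS` word list and the function names are assumptions chosen for illustration, and the sample queries are invented.

```python
# Assumed list of words that typically open a question-style query.
QUESTION_STARTERS = (
    "what", "why", "how", "when", "where", "which", "who",
    "can", "do", "does", "is", "are", "should",
)


def is_question(query: str) -> bool:
    """True if the query ends with '?' or opens with a question word."""
    q = query.strip().lower()
    if not q:
        return False
    return q.endswith("?") or q.split()[0] in QUESTION_STARTERS


def question_queries(queries: list[str]) -> list[str]:
    """Keep only the queries that look like questions, preserving order."""
    return [q for q in queries if is_question(q)]


if __name__ == "__main__":
    exported = [
        "crm software",
        "how to choose a crm",
        "what is lead scoring",
        "crm pricing comparison",
        "worth switching crm?",
    ]
    print(question_queries(exported))
```

The heuristic will miss implicit questions such as "crm pricing comparison", so it works best as a first pass before manual review.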

Step 2: Embed Verifiable Facts

  • Incorporate original research or proprietary data whenever possible.
  • Cite credible third-party studies and link to them.
  • Update your content regularly with the latest statistics and data points.

Step 3: Engineer for Synthesis

  • Structure your content with a 'question-and-answer' format using H2s and H3s.
  • Use bulleted lists to summarize key features, benefits, or steps.
  • Create data tables to present structured information clearly.

Step 4: Implement Strategic Schema

  • Go beyond basic Article schema. Use FAQPage for Q&A sections and HowTo for instructional content.
  • Ensure your schema is complete and accurately reflects the content on the page.
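To make the schema step concrete, the sketch below builds FAQPage JSON-LD from question-and-answer pairs, following the schema.org structure (`FAQPage` with `mainEntity` items of type `Question`, each with an `acceptedAnswer`). The helper names `faq_jsonld` and `missing_from_page` are hypothetical, and the verbatim substring check is a deliberately crude assumption about what "accurately reflects the content" means.

```python
import json


def faq_jsonld(pairs: list[tuple[str, str]]) -> dict:
    """Build a schema.org FAQPage object from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }


def missing_from_page(schema: dict, page_text: str) -> list[str]:
    """Return schema questions that do not appear verbatim in the visible page text."""
    return [
        item["name"]
        for item in schema["mainEntity"]
        if item["name"] not in page_text
    ]


if __name__ == "__main__":
    pairs = [
        (
            "Do I need to be a big brand like Forbes or Wikipedia to get cited?",
            "No. Niche sites with deep, fact-based, well-structured content are frequently cited.",
        ),
    ]
    schema = faq_jsonld(pairs)
    # Paste the printed JSON inside a <script type="application/ld+json"> tag.
    print(json.dumps(schema, indent=2))
    print(missing_from_page(schema, "page text without the question"))
```

Running the second check before publishing helps catch markup that describes questions the page no longer contains.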

Conclusion: Become the Source

The paradigm has shifted. Success in an AI-first search landscape is less about ranking for a keyword and more about becoming the definitive, citable source for an entire topic.

By focusing on factual density, logical structure, and demonstrable expertise, you can position your content to be the foundation of AI-generated answers, driving brand visibility and authority in a new era of search.

FAQ

What's the single most important factor for getting cited in an AI Overview?

Our data shows the most critical factor is factual density. Content that includes specific, verifiable data points like statistics, dates, or figures is significantly more likely to be cited.

Do I need to be a big brand like Forbes or Wikipedia to get cited?

No. While established authority helps, our research found that niche sites with deep, fact-based, and well-structured content are frequently cited for their specific area of expertise.

Is it better to create new content or update old content for AI Overviews?

Both are viable, but updating existing high-authority pages can be a quick win. Enhance them with new data, a clearer structure built on lists and H2s, and relevant schema markup.

How does E-E-A-T impact AI Overview citations?

E-E-A-T acts as a trust signal for the AI. Pages with clear author information, links to source data, and a strong topical focus were consistently favored as citable sources in our analysis.

