Reference · 107 terms

The AEO Glossary

A plain-language encyclopedia of answer-engine optimization. Every term is defined answer-first — the one sentence you could quote — then expanded with an example and linked to the relevant Canon pillar. Search or jump by letter.

A B C D E F G H I J K L M N O P Q R S T V Z


A

Agentic Search: Agentic search is when an AI system autonomously plans and runs multiple search and reasoning steps to answer a complex question, rather than retrieving once and replying.
AI Referral Traffic: AI referral traffic is the stream of visitors who reach your site by clicking a citation in an AI answer, a small but high-intent source you can measure in analytics by its referring domains.
Answer Box: An answer box is a search result that displays a direct answer to a query at the top of the page, an early form of the zero-click answer that AI Overviews have since expanded.
Answer Engine: An answer engine is a search system that responds to a question with a direct, synthesized answer instead of a list of links, usually citing the sources it drew from.
Applebot-Extended: Applebot-Extended is a robots.txt control that lets you opt out of having your content used to train Apple's AI models, without affecting Siri or Spotlight indexing by the regular Applebot.
Approximate Nearest Neighbor (ANN): Approximate nearest neighbor is a family of algorithms that quickly find the embeddings most similar to a query without checking every item, making large-scale semantic search fast enough to be practical.
Article Schema: Article schema is structured data that identifies a page as an article and specifies its headline, author, and publish and update dates, helping engines attribute and date your content correctly.
Attention: Attention is the mechanism that lets a language model weigh which words in the input matter most for understanding each part, and it tends to concentrate on prominent, early, and clearly related text.

B

Bingbot: Bingbot is Microsoft's web crawler that builds the Bing search index, which also powers Microsoft Copilot, so blocking it removes you from both Bing and Copilot answers.
BM25: BM25 is a classic ranking function that scores how well a document matches a query based on term frequency, term rarity, and document length, and it remains a strong, widely used retrieval baseline.
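A minimal BM25 scorer shows how those ingredients combine. This sketch assumes naive lowercase whitespace tokenization, the Lucene-style IDF ln(1 + (N - n + 0.5) / (n + 0.5)), and k1 = 1.5, b = 0.75. Here k1 controls how quickly repeated terms stop adding score and b controls how strongly long documents are penalized; libraries choose slightly different defaults.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with Okapi BM25.

    Tokenization is lowercase whitespace splitting (an assumption).
    The Lucene-style IDF stays non-negative even for very common terms.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: how many documents contain each term.
    df = Counter(term for t in tokenized for term in set(t))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation, normalized by document length.
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "answer engines cite sources",
    "bm25 ranks documents by term frequency",
    "vector search ranks documents by meaning",
]
print(bm25_scores("bm25 term frequency", docs))
```

Only the second document contains the query terms, so it is the only one with a non-zero score.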
Brand Authority: Brand authority is the overall trust and recognition your brand has earned across the web, often reported as one of the strongest correlates of AI visibility and a signal engines lean on when choosing whom to cite.
Branded Mention: A branded mention is any reference to your brand name across the web, linked or not, that helps AI systems recognize you as a known entity and weigh how often and how favorably you're discussed.
Bytespider: Bytespider is ByteDance's web crawler, associated with gathering training data for its AI models, and is often noted for aggressive crawling that some sites rate-limit or block.

C

CCBot: CCBot is the crawler operated by Common Crawl, whose open dataset is a foundational training source for many AI models, so allowing or blocking it shapes how widely your content trains AI.
ChatGPT: ChatGPT is OpenAI's conversational AI assistant that, with search enabled, answers questions by retrieving and citing live web sources, making it one of the most important answer engines for AEO.
Chunking: Chunking is how a retrieval system splits your page into smaller passages before indexing it, so AI engines retrieve and cite chunks of a page rather than the whole document.
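Engines do not publish their chunkers, so this is only an illustration of one common strategy: fixed-size word windows that overlap, so text near a boundary appears in two neighboring chunks. The function name, window size, and overlap are arbitrary assumptions.

```python
def chunk_words(text, size=50, overlap=10):
    """Split text into windows of `size` words; each window repeats the last
    `overlap` words of the previous one so boundary context is not lost."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reaches the end of the text
    return chunks


page = " ".join(f"w{i}" for i in range(12))
for chunk in chunk_words(page, size=5, overlap=2):
    print(chunk)
```

Smaller windows make each chunk more focused but strip away surrounding context, which is one reason self-contained paragraphs tend to survive chunking well.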
Citation: A citation in AI search is when an answer engine credits your page as a source for its response, usually as a linked reference, making it the main remaining path to your site when the answer itself is zero-click.
Citation Share: Citation share is the percentage of AI answers to your target questions in which your site is specifically cited as a source, the strictest measure of whether you're winning the citation, not just being mentioned.
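Citation share and the looser mention rate can both be computed from logged runs of a prompt set. This sketch assumes a hypothetical record format with one entry per AI answer and two yes/no fields; how you detect a mention or a citation (string matching, manual review) is left open.

```python
from dataclasses import dataclass


@dataclass
class AnswerRecord:
    """One AI answer to one prompt in your prompt set (hypothetical schema)."""
    prompt: str
    mentioned: bool  # your brand appears anywhere in the answer text
    cited: bool      # your site is linked as a source


def citation_share(records):
    """Percentage of answers that cite your site as a source."""
    if not records:
        return 0.0
    return 100 * sum(r.cited for r in records) / len(records)


def mention_rate(records):
    """Percentage of answers that mention your brand, cited or not."""
    if not records:
        return 0.0
    return 100 * sum(r.mentioned for r in records) / len(records)


runs = [
    AnswerRecord("what is aeo", mentioned=True, cited=True),
    AnswerRecord("best aeo tools", mentioned=True, cited=False),
    AnswerRecord("how do ai overviews pick sources", mentioned=False, cited=False),
    AnswerRecord("aeo vs seo", mentioned=True, cited=True),
]
print(citation_share(runs), mention_rate(runs))  # 50.0 75.0
```

The gap between the two numbers (here, one answer that names the brand without linking to it) is exactly what citation share is designed to expose.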
Citation Volatility: Citation volatility is the tendency of AI citations to change frequently over time and between runs, so the sources cited for a question shift often even when nothing about your page changed.
Claude: Claude is Anthropic's family of AI assistants that answer conversationally and, with web access enabled, can retrieve and cite live sources in their responses.
ClaudeBot: ClaudeBot is Anthropic's web crawler that collects content used to train its Claude models, identified by the ClaudeBot user-agent and controllable via robots.txt.
Client-Side Rendering (CSR): Client-side rendering is when a page ships minimal HTML and builds its content in the browser with JavaScript, which can hide that content from AI crawlers that don't execute scripts.
Common Crawl: Common Crawl is a nonprofit that publishes a free, massive archive of crawled web pages, which has served as a foundational training dataset for many large language models.
Content Freshness: Content freshness is how recently and actively your content has been updated, a signal AI engines weigh because they favor current information and rotate stale sources out of answers.
Context Window: The context window is the maximum amount of text an AI model can consider at once, including the question, any retrieved sources, and its own answer, measured in tokens.
Core Web Vitals: Core Web Vitals are Google's set of user-experience metrics for loading, interactivity, and visual stability, a measurable proxy for the page speed and quality that support AI visibility.
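The three current Core Web Vitals are Largest Contentful Paint (loading), Interaction to Next Paint (interactivity; it replaced First Input Delay in March 2024), and Cumulative Layout Shift (visual stability). Google rates each at the 75th percentile of real page loads. This sketch hard-codes Google's published boundaries (verify them against current documentation) and uses a simple nearest-rank percentile, which is an assumption; field tools compute it differently.

```python
import math

# (good up to and including, poor above) for each Core Web Vital.
THRESHOLDS = {
    "LCP": (2.5, 4.0),   # seconds
    "INP": (200, 500),   # milliseconds
    "CLS": (0.1, 0.25),  # unitless layout-shift score
}


def percentile_75(samples):
    """Nearest-rank 75th percentile of a non-empty list of measurements."""
    ordered = sorted(samples)
    return ordered[math.ceil(0.75 * len(ordered)) - 1]


def rate(metric, samples):
    """Classify a metric as good / needs improvement / poor from field samples."""
    good, poor = THRESHOLDS[metric]
    value = percentile_75(samples)
    if value <= good:
        return "good"
    if value <= poor:
        return "needs improvement"
    return "poor"


print(rate("LCP", [1.8, 2.2, 2.4, 3.1]))      # p75 = 2.4 s
print(rate("INP", [120, 180, 260, 640]))      # p75 = 260 ms
print(rate("CLS", [0.02, 0.05, 0.31, 0.40]))  # p75 = 0.31
```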
Corroboration: Corroboration is when multiple independent, reputable sources agree on a claim about you, giving AI systems the confidence to treat it as fact and repeat it in answers.
Cosine Similarity: Cosine similarity is a measure of how alike two embeddings are based on the angle between them, and is a common way retrieval systems score how relevant a passage is to a query.
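The formula is cos θ = (a · b) / (|a| |b|): the dot product divided by the product of the vector lengths, so only direction matters, not magnitude. A minimal sketch with toy 3-dimensional vectors standing in for real embeddings, which typically have hundreds or thousands of dimensions:

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b: dot(a, b) / (|a| * |b|).

    Returns a value in [-1, 1]; 1 means the vectors point the same way.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Toy "embeddings" (hypothetical values for illustration).
query = [0.9, 0.1, 0.0]
passage_on_topic = [0.8, 0.2, 0.1]
passage_off_topic = [0.0, 0.1, 0.9]
print(round(cosine_similarity(query, passage_on_topic), 3))
print(round(cosine_similarity(query, passage_off_topic), 3))
```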
Crawlability: Crawlability is whether automated crawlers can reach and read your pages, the absolute prerequisite for being indexed and cited, since content a crawler can't fetch is invisible to AI.
Cross-Encoder: A cross-encoder is a model that judges relevance by reading a query and a passage together, used in the reranking step to precisely reorder retrieved candidates before the answer is written.
Cumulative Layout Shift (CLS): Cumulative Layout Shift measures how much a page's content unexpectedly moves around as it loads, a Core Web Vital capturing visual stability.

D

Dense Retrieval: Dense retrieval finds relevant passages by comparing the meaning-based vector embeddings of a query and your content, matching on semantics rather than exact words.
Disambiguation: Disambiguation is how AI systems decide which specific entity a name refers to when several share it, and giving clear context helps ensure they attribute mentions to you rather than to a namesake.

E

E-E-A-T: E-E-A-T stands for Experience, Expertise, Authoritativeness, and Trustworthiness — Google's framework for judging content quality, and a useful proxy for the credibility signals AI engines reward.
Embeddings: Embeddings are numerical representations of text that capture its meaning, letting AI systems find passages that are semantically related to a query even when they share no exact keywords.
Entity: An entity is a distinct, identifiable thing — a person, company, product, or place — that AI systems recognize and reason about as a single, consistent node rather than as loose strings of text.
Entity Salience: Entity salience is how central and prominent an entity is within a piece of content, signaling to AI systems what the content is really about and which entities it most concerns.

F

FAQPage Schema: FAQPage schema is structured data that marks a list of questions and answers on a page, making your Q&A content explicit and machine-readable for search and AI systems.
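FAQPage markup nests each question as a schema.org Question with an acceptedAnswer of type Answer. A minimal sketch that builds the JSON-LD from (question, answer) pairs; the sample text is hypothetical, and it assumes the same questions and answers also appear as visible text on the page, which Google's guidelines require for FAQ markup.

```python
import json


def faq_page_jsonld(pairs):
    """Build FAQPage structured data from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }


data = faq_page_jsonld([
    ("What is AEO?", "AEO is the practice of optimizing content to be cited in AI answers."),
    ("Is AEO different from SEO?", "It builds on SEO but targets citations in AI answers."),
])
print('<script type="application/ld+json">')
print(json.dumps(data, indent=2))
print("</script>")
```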
Featured Snippet: A featured snippet is a short answer Google extracts from a ranking page and displays at the top of search results, and the skill of winning one transfers directly to AI citation.
Fine-Tuning: Fine-tuning is the process of further training a pre-trained AI model on a narrower dataset to specialize its behavior, distinct from the retrieval that grounds live answers.
First Contentful Paint (FCP): First Contentful Paint is a performance metric measuring how long after navigation the browser renders the first piece of page content, used as a proxy for how quickly a page becomes useful.

G

Gemini: Gemini is Google's family of multimodal AI models and its consumer assistant, able to reason over text, images, and more, and to ground answers in retrieved sources.
Generative Engine Optimization (GEO): Generative engine optimization is the practice of optimizing content to be cited in AI-generated answers, an alternative name for AEO that emphasizes the generative engines producing the responses.
Google AI Mode: Google AI Mode is a conversational search experience that answers complex questions by breaking them into many sub-queries, retrieving across all of them, and synthesizing a single cited response.
Google AI Overviews: Google AI Overviews are AI-generated answer summaries shown at the top of Google search results, with links to cited sources, that often answer the query without a click.
Google-Extended: Google-Extended is a robots.txt control that lets you opt out of having your content used to train Google's Gemini and Vertex AI models, without affecting Google Search or AI Overviews.
GPTBot: GPTBot is OpenAI's web crawler that gathers content to train its models, identified by the GPTBot user-agent and controllable through your robots.txt file.
Grounding: Grounding is the practice of tying an AI model's answer to specific retrieved sources, so the response reflects real documents rather than the model's unverified internal memory.

H

Hallucination: A hallucination is when an AI model states something false or fabricated as if it were fact, usually because it generated from memory instead of grounding its answer in real sources.
Hybrid Search: Hybrid search combines keyword-based and meaning-based retrieval so a system can match both exact terms and semantic intent, and is a common approach in modern AI search.
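One common way to merge the two result lists is reciprocal rank fusion (RRF), which combines ranks rather than raw scores, so BM25 scores and cosine similarities never need to share a scale. This is a sketch of that one technique, not a description of any particular engine; the document IDs are hypothetical, and k = 60 is a commonly used default.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document IDs into one.

    Each document earns 1 / (k + rank) from every list it appears in
    (rank starts at 1), and documents are sorted by their summed score.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


keyword_hits = ["pricing-page", "faq", "blog-post"]  # e.g. from BM25
vector_hits = ["faq", "guide", "pricing-page"]       # e.g. from embeddings
print(reciprocal_rank_fusion([keyword_hits, vector_hits]))
# ['faq', 'pricing-page', 'guide', 'blog-post']
```

Documents found by both methods rise to the top, which is the point of hybrid search: a page that matches the exact terms and the underlying meaning beats one that matches only one of them.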
Hydration: Hydration is the process where JavaScript attaches interactivity to server-rendered HTML in the browser, letting a page be both crawlable as text and interactive for users.

I

Indexability: Indexability is whether a crawled page is eligible to be stored in a search index, since a page can be crawlable yet still excluded from the index that retrieval and AI answers draw from.
Inverted Pyramid: The inverted pyramid is a writing structure, borrowed from journalism, that puts the most important information first and supporting detail after, making each passage answer-first and easy for AI to lift.

J

JSON-LD: JSON-LD is the format Google recommends for adding schema markup to a page, embedding structured data as a separate JSON block so machines can read your content's meaning without touching the visible layout.
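A JSON-LD block is just JSON inside a script tag of type application/ld+json. This sketch serializes Article structured data (headline, author, and publish and update dates, as described under Article Schema); the helper name, article, and author are hypothetical. Escaping "<" as \u003c is a defensive assumption so a value containing "</script>" cannot end the tag early.

```python
import json


def jsonld_script(data):
    """Serialize structured data into a JSON-LD <script> block."""
    payload = json.dumps(data, indent=2, ensure_ascii=False)
    # Escape "<" so a string value can never close the script tag early.
    payload = payload.replace("<", "\\u003c")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


article = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "What Is Answer-Engine Optimization?",
    "author": {"@type": "Person", "name": "Jane Example"},  # hypothetical author
    "datePublished": "2024-05-01",
    "dateModified": "2024-06-15",
}
print(jsonld_script(article))
```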

K

Keyword Search: Keyword search finds content by matching the literal words in a query against the words in documents, the traditional approach now complemented by meaning-based vector search.
Knowledge Cutoff: A knowledge cutoff is the date beyond which an AI model's training data ends, so without live retrieval the model has no built-in knowledge of anything that happened after it.
Knowledge Distillation: Knowledge distillation is a technique for training a smaller, faster AI model to mimic a larger one, transferring much of its capability into a cheaper-to-run model.
Knowledge Graph: A knowledge graph is a structured network of entities and the relationships between them, which search and AI systems use to understand facts about the world and about your brand.

L

Large Language Model (LLM): A large language model is an AI system trained on vast amounts of text to predict and generate language, and is the engine that writes the answers in AI search.
Largest Contentful Paint (LCP): Largest Contentful Paint measures how long it takes for the largest image or text block in the viewport to render, a Core Web Vital that captures how quickly a page feels loaded.
llms.txt: llms.txt is a proposed standard file at your domain root that gives AI systems a curated Markdown map of your most important content, helping them find and understand your best pages.
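The llms.txt proposal lays the file out as an H1 with the site name, a blockquote summary, then H2 sections listing Markdown links with optional notes. This sketch generates that layout; the function name, site, and URLs are hypothetical.

```python
def build_llms_txt(site_name, summary, sections):
    """Render an llms.txt file: H1 title, blockquote summary, then H2
    sections of Markdown links.

    `sections` maps a heading to a list of (title, url, note) tuples;
    an empty note omits the ": note" suffix.
    """
    lines = [f"# {site_name}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines += [f"## {heading}", ""]
        for title, url, note in links:
            entry = f"- [{title}]({url})"
            if note:
                entry += f": {note}"
            lines.append(entry)
        lines.append("")
    return "\n".join(lines)


text = build_llms_txt(
    "Example Co",
    "Example Co publishes guides on answer-engine optimization.",
    {
        "Guides": [
            ("AEO Glossary", "https://example.com/glossary", "Definitions of 107 terms"),
            ("What is AEO", "https://example.com/what-is-aeo", ""),
        ],
    },
)
print(text)
```

The file is then served as plain text at https://example.com/llms.txt (again, a hypothetical domain).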

M

Microsoft Copilot: Microsoft Copilot is Microsoft's AI assistant that answers questions using the Bing search index and cites web sources, making Bing crawlability a prerequisite for Copilot visibility.
Multimodal: Multimodal AI can understand and generate more than one type of content — text, images, audio, and video — letting engines answer questions that span formats.

N

NAP Consistency: NAP consistency means keeping your business Name, Address, and Phone number identical everywhere they appear online, a trust and identity signal that matters for local and AI visibility.

O

OAI-SearchBot: OAI-SearchBot is OpenAI's crawler that indexes pages for ChatGPT's search feature, making it the user-agent to allow if you want to be cited in ChatGPT's answers.
Organization Schema: Organization schema is structured data describing a company — its name, logo, and official profiles — that helps AI systems recognize your business as a consistent, identifiable entity.
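A minimal Organization block with the name, logo, and official profiles listed through the sameAs property (see its entry under S). Every company detail and profile URL below is hypothetical.

```python
import json

# Hypothetical company details; replace with your own.
organization = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Example Co",
    "url": "https://example.com",
    "logo": "https://example.com/logo.png",
    # sameAs lists official profiles that refer to this same organization.
    "sameAs": [
        "https://www.linkedin.com/company/example-co",
        "https://github.com/example-co",
    ],
}

print('<script type="application/ld+json">')
print(json.dumps(organization, indent=2))
print("</script>")
```

Using the same name, logo, and profile list on every page helps engines treat those mentions as one consistent entity.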

P

Passage Retrieval: Passage retrieval is the practice of finding and returning specific relevant passages from within documents, rather than whole pages, which is why a single well-written paragraph can earn an AI citation on its own.
People Also Ask: People Also Ask is a Google feature showing related follow-up questions that users search for, and it is a free, direct window into the real questions your content should answer.
Perplexity: Perplexity is an AI answer engine that responds to questions with synthesized answers and prominent inline citations, making it one of the most source-transparent engines to optimize for.
PerplexityBot: PerplexityBot is Perplexity's web crawler that indexes pages so they can be retrieved and cited in Perplexity's answers, identified by the PerplexityBot user-agent.
Person Schema: Person schema is structured data describing an individual — their name, role, and authoritative profiles — that helps AI systems recognize an author or expert as a known entity.
Position Bias: Position bias is the tendency of retrieval and language models to weight content near the start of a page or passage more heavily, making where you place an answer matter as much as the answer itself.
Precision: Precision is a retrieval metric measuring what fraction of the passages a system returned were actually relevant, capturing how much of the result set is signal versus noise.
Primary Source: A primary source is original, first-hand material — your own data, research, or direct experience — that exists nowhere else, making it uniquely citable because engines can't assemble it from elsewhere.
Prompt Engineering: Prompt engineering is the practice of crafting inputs to an AI model to get better, more reliable outputs, and in AEO it underlies how you build the prompt sets used to measure visibility.
Prompt Injection: Prompt injection is an attack where hidden or malicious instructions in content trick an AI model into ignoring its real task, and attempting it as an AEO tactic risks penalties and tends to backfire.
Prompt Set: A prompt set is the fixed list of real questions you run across AI engines to measure your visibility, the stable foundation that makes citation tracking comparable over time.

Q

Query Fan-Out: Query fan-out is when an AI engine takes one user question and silently expands it into several related searches, then synthesizes one answer from everything it retrieves across them.
Question-Shaped Heading: A question-shaped heading is a heading written as the actual question a user would ask, which aligns your content with real queries and marks exactly where the answer begins.

R

RAG (Retrieval-Augmented Generation): RAG is the technique behind most AI answer engines, where the system first retrieves relevant documents from the live web or an index and the model then generates an answer grounded in what it found.
Recall: Recall is a retrieval metric measuring what fraction of all the relevant passages a system actually found, capturing whether the right content was retrieved at all.
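Precision (defined under P) and recall pull against each other: returning more passages tends to raise recall and lower precision. A minimal sketch with hypothetical passage IDs and hypothetical human relevance judgments:

```python
def precision_recall(retrieved, relevant):
    """Precision and recall of one retrieval result.

    precision = relevant items retrieved / all items retrieved
    recall    = relevant items retrieved / all relevant items
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


retrieved = ["p1", "p2", "p3", "p4"]  # passages the system returned
relevant = ["p2", "p4", "p7"]         # passages judged relevant
print(precision_recall(retrieved, relevant))  # (0.5, 0.6666666666666666)
```

Half of what came back was useful (precision 0.5), and one relevant passage, p7, was never retrieved (recall 2/3).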
Reranking: Reranking is a second pass in retrieval where an initial set of candidate passages is reordered by a more precise relevance model, deciding which few actually make it into the AI's answer.
Retrieval: Retrieval is the step where an AI system searches an index to find the most relevant passages for a query before generating an answer, and it decides which content is even eligible to be cited.
robots.txt: robots.txt is a plain text file at the root of your domain that tells crawlers which user-agents may access which parts of your site, and is how you allow or block AI crawlers.
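Python's standard-library urllib.robotparser can check how a robots.txt file treats a given crawler. The policy below is an illustrative assumption, not a recommendation: it allows OpenAI's search crawler, blocks its training crawler, and keeps a hypothetical /admin/ path off-limits. This parser applies the first matching rule in a group, while RFC 9309 uses the most specific (longest) match, so the more specific Disallow is listed first to keep both readings in agreement.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: OAI-SearchBot
Disallow: /admin/
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /admin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for agent in ["OAI-SearchBot", "GPTBot", "PerplexityBot"]:
    for url in ["https://example.com/guide", "https://example.com/admin/"]:
        print(agent, url, parser.can_fetch(agent, url))
```

A crawler that matches its own group ignores the * group entirely, which is why the OAI-SearchBot group repeats the /admin/ rule; PerplexityBot has no group of its own, so it falls back to *.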

S

sameAs: sameAs is a schema.org property that links your entity to its authoritative profiles elsewhere, telling AI systems that all those pages refer to the same person or organization.
Schema Markup: Schema markup is structured data added to a page using schema.org vocabulary that tells machines explicitly what the content is, helping AI systems understand and trust your information.
Semantic Chunking: Semantic chunking splits content into passages along meaning boundaries rather than fixed lengths, so each chunk is a coherent, self-contained idea that retrieves and cites cleanly.
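One common way to find meaning boundaries is to embed each sentence and start a new chunk wherever similarity between neighboring sentences drops. In this sketch a bag-of-words count vector is a hypothetical stand-in for a real embedding model, the sentence splitter is a naive regex, and the 0.2 threshold is arbitrary.

```python
import math
import re
from collections import Counter


def embed(sentence):
    # Hypothetical stand-in for an embedding model: a bag-of-words count vector.
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def semantic_chunks(text, threshold=0.2):
    """Group consecutive sentences, starting a new chunk wherever the
    similarity between neighboring sentences drops below threshold."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    if not sentences:
        return []
    chunks, current = [], [sentences[0]]
    for prev, sent in zip(sentences, sentences[1:]):
        if similarity(embed(prev), embed(sent)) < threshold:
            chunks.append(" ".join(current))  # meaning shifted: close the chunk
            current = []
        current.append(sent)
    chunks.append(" ".join(current))
    return chunks


text = (
    "Schema markup labels page content. "
    "Schema markup uses the schema.org vocabulary. "
    "Core Web Vitals measure page speed. "
    "Core Web Vitals include LCP and CLS."
)
for chunk in semantic_chunks(text):
    print(chunk)
```

The split lands between the schema sentences and the Core Web Vitals sentences, so each chunk holds one topic.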
Semantic HTML: Semantic HTML is markup that uses elements according to their meaning — headings, lists, articles, tables — so machines can understand a page's structure and extract its content accurately.
SERP: SERP stands for search engine results page, the page of results returned for a query, now increasingly topped by AI Overviews and answer boxes rather than plain links.
Server-Side Rendering (SSR): Server-side rendering is when a web server generates a page's full HTML for each request and sends it ready-to-read, so content is present immediately for both browsers and AI crawlers.
Share of Model: Share of model is how often an AI model names or recommends your brand from its own internal knowledge, without live retrieval, reflecting how well-established you are in what the model learned.
Share of Voice (AI): Share of voice in AI search is the proportion of relevant AI answers in which your brand appears, measured across a fixed set of questions, as a gauge of how present you are in the conversation.
Sparse Retrieval: Sparse retrieval finds relevant content by matching actual words and their importance, using classic methods like BM25, and still complements meaning-based retrieval in modern systems.
Static Site Generation (SSG): Static site generation is when pages are pre-rendered to finished HTML files at build time, so every visitor and crawler gets fully formed, fast-loading content with no per-request work.
System Prompt: A system prompt is the hidden instruction that sets an AI assistant's behavior and rules before it sees the user's question, shaping how it answers and what it's allowed to do.

T

Temperature: Temperature is a setting that controls how random or deterministic an AI model's output is, with higher values producing more varied responses and lower values more predictable ones.
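Temperature is commonly applied by dividing the model's raw scores (logits) by T before the softmax that turns them into probabilities: p_i = exp(z_i / T) / sum_j exp(z_j / T). A sketch with made-up logits for three candidate tokens:

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw model scores (logits) into next-token probabilities.

    Dividing by the temperature sharpens the distribution when
    temperature < 1 and flattens it when temperature > 1.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive (greedy decoding uses argmax)")
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens
for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At T = 0.2 nearly all probability sits on the top token, so sampling is close to deterministic; at T = 2.0 the alternatives get a real share, so repeated runs vary more.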
TF-IDF: TF-IDF is a classic method for scoring how important a word is to a document, balancing how often it appears against how common it is across all documents.
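A sketch of one common TF-IDF variant: term frequency normalized by document length, times log(N / df), where N is the number of documents and df the number containing the term. Libraries such as scikit-learn add smoothing and normalization, so their numbers differ; the tiny corpus is hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    tf  = term count / document length
    idf = log(N / number of documents containing the term)
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


docs = [
    "ai answers cite sources",
    "ai answers summarize sources",
    "robots txt blocks crawlers",
]
weights = tf_idf(docs)
# "cite" appears in only one document, so it outweighs the shared terms.
print(sorted(weights[0].items(), key=lambda kv: kv[1], reverse=True))
```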
Time to First Byte (TTFB): Time to First Byte measures how long after a request the server sends the first byte of the response, an early speed signal that affects both crawl efficiency and how fast a page can load.
Token: A token is the basic unit of text an AI model processes — typically a word or word-piece — and is how model limits, costs, and context windows are measured.
Tokenization: Tokenization is the process of splitting text into tokens before an AI model can process it, converting human-readable language into the units the model actually operates on.
Topic Cluster: A topic cluster is a set of interlinked pages that together cover a subject comprehensively, a content structure that builds topical authority and gives engines many ways to cite you.
Topical Authority: Topical authority is the depth and breadth of trusted coverage you have on a subject, which makes search and AI systems more likely to treat you as a go-to source for it.
Training Data: Training data is the body of text and other content an AI model learns from during training, shaping what it knows by default before any live retrieval is involved.
Transformer: The transformer is the neural-network architecture behind modern language models, which uses an attention mechanism to weigh how words relate, enabling fluent understanding and generation of text.

V

Vector Database: A vector database stores content as embeddings and is optimized to quickly find the items whose vectors are most similar to a query, powering semantic retrieval at scale.
Vector Search: Vector search is a retrieval method that finds passages by meaning rather than keywords, comparing the numeric embedding of a query against the embeddings of indexed content to surface the closest matches.
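Exact vector search is cosine similarity applied to every indexed passage, keeping the closest few; a vector database replaces this linear scan with an approximate nearest neighbor index once the collection is large. The passage IDs and 3-dimensional vectors below are hypothetical stand-ins for real embeddings.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def vector_search(query_vec, index, k=2):
    """Exact (brute-force) search: score every indexed passage against the
    query and return the IDs of the k most similar ones."""
    scored = ((cosine(query_vec, vec), pid) for pid, vec in index.items())
    return [pid for _, pid in heapq.nlargest(k, scored)]


index = {
    "pricing": [0.1, 0.9, 0.0],
    "faq": [0.8, 0.3, 0.1],
    "careers": [0.0, 0.1, 0.9],
}
print(vector_search([0.9, 0.2, 0.0], index))  # ['faq', 'pricing']
```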

Z

Zero-Click Search: Zero-click search is when a user gets their answer directly on the results page or in an AI response without clicking through to any website, the dominant pattern AEO is a response to.

New to the framework? Start with the AEO Canon or the "What is AEO" overview.