AEO Canon · the reference for answer-engine optimization

AI & LLM Fundamentals

How AI language models actually work — training, tokens, embeddings, context windows, RAG, and limits like hallucination and knowledge cutoffs.

23 articles · in 2 courses

Articles in AI & LLM Fundamentals

AI-detection tools are unreliable and not what answer engines use to decide citations — engines judge content on quality, originality, and accuracy, not on whether a machine wrote it. Stop chasing detection and make content genuinely original, because generic content fails no matter who wrote it.

2 min read

AI-generated content can get cited, but only when it's made genuinely original, accurate, and useful — raw model output tends to be generic, unsourced, and interchangeable, which is exactly what engines skip. The deciding factor is the substance and originality you add, not whether a model helped write it.

2 min read

It depends on the engine: web-grounded engines like Perplexity and Google AI can surface new content within days once it's crawled, while a model's built-in knowledge stops at its training cutoff and refreshes only when a new model is trained and released, typically months later. So fresh content reaches retrieval-based answers quickly but base-model knowledge slowly.

2 min read

A model's knowledge cutoff means its built-in training data stops at a fixed date, so it won't natively know anything published after it — which is why recent content reaches you only through engines that retrieve the live web. Freshness in AI search runs through retrieval, not the model's frozen memory.

2 min read

Why Do AI Models Hallucinate?

AI models hallucinate — state false things confidently — because they generate the most plausible text, not verified truth. When training patterns run thin, they fill the gap with fluent fabrication. Grounding in real sources is the main fix.

2 min read

Training data is the text an AI model learns from — typically trillions of tokens drawn from the public web, books, code, and licensed sources. Its breadth, quality, and recency shape everything the model knows.

2 min read

What Is Tokenization in AI?

Tokenization is how an AI model breaks text into tokens (words or word-pieces) that it can process numerically. Tokens are the units LLMs read and predict and the units providers bill by, so they shape cost, limits, and clarity.

2 min read
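
As a rough illustration of splitting text into word-pieces and mapping them to numbers, here is a minimal sketch of a greedy longest-match tokenizer. The vocabulary `VOCAB` and the helper names `tokenize` and `to_ids` are hypothetical; production tokenizers learn much larger vocabularies from data and use algorithms such as byte-pair encoding.

```python
# Toy greedy longest-match tokenizer. VOCAB is a hypothetical, hand-picked
# vocabulary; real tokenizers learn tens of thousands of pieces from data.
VOCAB = ["token", "ization", "un", "believ", "able", " ", "is"]


def tokenize(text: str, vocab: list[str] = VOCAB) -> list[str]:
    tokens = []
    i = 0
    while i < len(text):
        # Pick the longest vocabulary piece that matches at position i,
        # falling back to a single character when nothing matches.
        match = max(
            (piece for piece in vocab if text.startswith(piece, i)),
            key=len,
            default=text[i],
        )
        tokens.append(match)
        i += len(match)
    return tokens


def to_ids(tokens: list[str], vocab: list[str] = VOCAB) -> list[int]:
    # Map each token to an integer ID; unknown pieces get -1 in this sketch.
    index = {piece: n for n, piece in enumerate(vocab)}
    return [index.get(tok, -1) for tok in tokens]


tokens = tokenize("tokenization is unbelievable")
print(tokens)          # ['token', 'ization', ' ', 'is', ' ', 'un', 'believ', 'able']
print(to_ids(tokens))  # [0, 1, 5, 6, 5, 2, 3, 4]
```

Three words become eight tokens here, which is why token counts, not word counts, drive cost and limits.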

Retrieval-augmented generation (RAG) works by retrieving relevant passages from an external source, then having a language model generate an answer grounded in them. It is the architecture behind web-grounded AI answer engines.

7 min read
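
The retrieve-then-generate flow can be sketched in a few lines. This is a simplified illustration, not any engine's actual pipeline: the `DOCS` corpus and its example.com URLs are made up, retrieval uses plain word overlap instead of embeddings, and the final call to a language model is left out, so the sketch stops at the grounded prompt.

```python
# Hypothetical mini-corpus; real engines retrieve from a web-scale index.
DOCS = [
    {"url": "https://example.com/tokens",
     "text": "Tokens are the units a language model reads and predicts."},
    {"url": "https://example.com/cutoff",
     "text": "A knowledge cutoff is the date a model's training data ends."},
    {"url": "https://example.com/rag",
     "text": "Retrieval lets a model answer from live sources it can cite."},
]


def retrieve(query, docs=DOCS, k=2):
    # Step 1: score each document by how many query words it shares,
    # then keep the top k. (A stand-in for semantic search.)
    q_words = set(query.lower().split())

    def overlap(doc):
        return len(q_words & set(doc["text"].lower().split()))

    return sorted(docs, key=overlap, reverse=True)[:k]


def build_prompt(query, passages):
    # Step 2: place the retrieved passages in the prompt so the model
    # answers from them and can cite them by number.
    sources = "\n".join(
        f"[{i}] {p['text']} ({p['url']})" for i, p in enumerate(passages, 1)
    )
    return (
        "Answer using only these sources and cite them by number.\n"
        f"{sources}\n\nQuestion: {query}"
    )


question = "what is a knowledge cutoff"
print(build_prompt(question, retrieve(question)))
```

Only passages that survive retrieval appear in the prompt, so a page that is never retrieved cannot be cited.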

What Is Grounding in AI?

Grounding is connecting an AI model's answer to real, retrieved source material so its claims are supported by evidence it can cite — rather than generated from memory alone. It's how AI answers earn trust.

2 min read

A large language model (LLM) is an AI system trained on vast amounts of text to predict the next token, which lets it generate fluent language, answer questions, and power tools like ChatGPT, Claude, and Gemini.

3 min read

What Is a Reranker?

A reranker is a model that re-scores retrieved passages against a specific query so an AI engine can pick the few it actually uses. Relevance to the query is the core signal, and engines may also weigh authority and freshness. It is where citations are won or lost.

2 min read
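
One way to picture re-scoring is a weighted sum over the signals named above. This is only a sketch: the `WEIGHTS` values and the per-passage scores are invented for illustration, and real rerankers are typically learned models rather than fixed formulas.

```python
# Hypothetical signal weights; no engine publishes values like these.
WEIGHTS = {"relevance": 0.6, "authority": 0.25, "freshness": 0.15}


def rerank(passages, top_n=2, weights=WEIGHTS):
    def score(passage):
        # Combine the signals into one number per passage.
        return sum(weights[name] * passage[name] for name in weights)

    # Highest combined score first; keep only the top few.
    return sorted(passages, key=score, reverse=True)[:top_n]


candidates = [
    {"id": "A", "relevance": 0.9, "authority": 0.5, "freshness": 0.5},
    {"id": "B", "relevance": 0.4, "authority": 0.9, "freshness": 0.9},
    {"id": "C", "relevance": 0.8, "authority": 0.8, "freshness": 0.2},
]
print([p["id"] for p in rerank(candidates)])  # ['A', 'C']
```

Passage B is the most authoritative and freshest but still drops out, because it matches the query poorly.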

What Is a Knowledge Cutoff?

A knowledge cutoff is the date after which an AI model's built-in training knowledge stops. The model knows nothing that happened later unless it retrieves live sources — which is why search augmentation and freshness matter.

3 min read

What Is a Context Window?

A context window is the maximum amount of text — measured in tokens — that an AI model can consider at once, including your prompt, any retrieved sources, and its own answer. It bounds what the model can "see."

2 min read
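
The budget arithmetic behind a context window can be sketched as follows. The numbers are deliberately tiny and hypothetical (real windows run to many thousands of tokens), and `count_tokens` is a crude word-count stand-in for a real tokenizer.

```python
def count_tokens(text):
    # Crude stand-in: one token per whitespace-separated word.
    return len(text.split())


def fit_sources(prompt, sources, window=50, reserved_for_answer=20):
    # The prompt, the retrieved sources, and the answer all share one window.
    budget = window - reserved_for_answer - count_tokens(prompt)
    kept = []
    for source in sources:
        cost = count_tokens(source)
        if cost > budget:
            break  # this source and everything ranked below it is dropped
        kept.append(source)
        budget -= cost
    return kept


passages = [" ".join(["word"] * 10)] * 3   # three 10-token passages
kept = fit_sources("What is a context window?", passages)
print(len(kept))  # 2
```

With a 50-token window, 20 tokens reserved for the answer, and a 5-token prompt, only 25 tokens remain for sources, so the third passage never reaches the model.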

What Are Embeddings in AI?

Embeddings are numeric vectors that represent the meaning of text, so an AI can compare ideas by mathematical similarity rather than exact words. They are how semantic search and retrieval find the right passage.

2 min read
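
Comparing meaning by mathematical similarity usually means cosine similarity between vectors. The sketch below uses made-up 3-dimensional vectors in a hypothetical `EMBEDDINGS` table; real embedding models produce hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    # Cosine of the angle between two vectors: close to 1.0 means they point
    # the same way (similar meaning), close to 0.0 means unrelated.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical hand-written vectors, chosen only to illustrate the idea.
EMBEDDINGS = {
    "how to fix a flat tire": [0.9, 0.1, 0.0],
    "repairing a punctured wheel": [0.8, 0.2, 0.1],
    "best chocolate cake recipe": [0.0, 0.1, 0.9],
}


def most_similar(query_vec, embeddings=EMBEDDINGS):
    # Return the stored text whose vector is closest to the query vector.
    return max(
        embeddings,
        key=lambda text: cosine_similarity(query_vec, embeddings[text]),
    )


tire = EMBEDDINGS["how to fix a flat tire"]
wheel = EMBEDDINGS["repairing a punctured wheel"]
cake = EMBEDDINGS["best chocolate cake recipe"]
print(cosine_similarity(tire, wheel) > cosine_similarity(tire, cake))  # True
```

The tire and wheel passages share no keywords, yet their vectors sit close together, which is how semantic retrieval can match them.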

Citation is text-first today — engines quote transcripts, captions, and alt text — but multimodal models that read video, audio, and images directly are emerging. The durable strategy is to win the text layer now (it loses nothing later) while making your content genuinely strong across formats.

3 min read

A maintained guide to the major large language models of 2026 — the labs behind ChatGPT, Claude, Gemini, Llama, and Grok, their flagship models, and what sets each apart. Reviewed quarterly.

3 min read

LLMs work by breaking text into tokens, converting them to embeddings, using a transformer's attention mechanism to weigh context, and predicting the next token one at a time — repeated to generate full answers.

3 min read
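
The "predict the next token, then repeat" loop can be shown with a toy lookup table. This is a heavy simplification: `NEXT_TOKEN_PROBS` is a hypothetical hand-written table that conditions only on the previous token, whereas a real transformer uses attention over the whole context to compute these probabilities.

```python
# Hypothetical next-token probabilities, keyed by the previous token.
NEXT_TOKEN_PROBS = {
    "<start>": {"the": 0.6, "a": 0.4},
    "the": {"model": 0.7, "cat": 0.3},
    "model": {"predicts": 0.8, "<end>": 0.2},
    "predicts": {"tokens": 0.9, "<end>": 0.1},
    "tokens": {"<end>": 1.0},
}


def generate(max_tokens=10):
    output = []
    current = "<start>"
    for _ in range(max_tokens):
        probs = NEXT_TOKEN_PROBS[current]
        # Greedy decoding: always take the single most probable next token.
        current = max(probs, key=probs.get)
        if current == "<end>":
            break
        output.append(current)
    return output


print(generate())  # ['the', 'model', 'predicts', 'tokens']
```

Each pass through the loop produces exactly one token, and the token just produced feeds the next prediction.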

To recommend a product, AI interprets the buyer's need, retrieves candidates from the review-and-comparison sources it trusts, and picks the ones best matched and best reviewed. It reasons over reputation and fit, not your marketing — so reviews, comparisons, and clear product information decide who gets recommended.

3 min read

How Does AI Recognize Entities?

AI recognizes entities by linking the names it reads to unique items in a knowledge graph, using surrounding context and embeddings to disambiguate, then drawing on each entity's attributes and corroboration to judge trust. Recognition, disambiguation, and trust are three distinct steps you can influence.

3 min read

How AI Reads Video Transcripts

AI answer engines mostly don't watch video — they read its transcript and metadata as text, then retrieve and cite passages the same way they cite an article. So an accurate, well-structured transcript is what makes a video extractable. Captions, titles, and descriptions complete the text layer engines actually read.

3 min read

How Are AI Models Trained?

AI models are trained in stages — large-scale pretraining on text to learn language, then fine-tuning and reinforcement learning from human feedback (RLHF) to make them helpful, honest, and safe to use.

3 min read

A base model answers only from its frozen training; a search-augmented model retrieves live sources at query time and can cite them. The difference decides whether AI answers are current, verifiable — and whether they can cite you.

2 min read

AI can accelerate AEO content — research, outlines, first drafts, reformatting — but unedited generic output is the opposite of what earns citations. Use AI as a drafting accelerant inside a system that forces human originality and QC, never as a replacement for them.

3 min read
