Knowledge Distillation
Knowledge distillation is a technique for training a smaller, faster AI model to mimic a larger one, transferring much of its capability into a cheaper-to-run model.
Rather than compressing a big model directly, distillation trains a separate, compact "student" model to reproduce the outputs of a larger "teacher." The student captures much of the teacher's capability at a fraction of the size and cost, which is how many of the efficient models that power consumer assistants are built.
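To make the teacher/student idea concrete, here is a minimal sketch of one common way to score how closely a student matches a teacher: soften both models' raw scores with a temperature, then measure the gap between the two probability distributions. The function names, the example scores, and the temperature value are illustrative assumptions, not taken from any particular engine or library.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; a higher temperature flattens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence between the teacher's and student's softened distributions.

    Zero means the student reproduces the teacher exactly; larger values mean
    the student's predictions diverge more from the teacher's.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical scores over three candidate answers.
teacher = [4.0, 1.0, 0.5]
close_student = [3.8, 1.1, 0.4]  # ranks the answers much like the teacher
far_student = [0.5, 1.0, 4.0]    # prefers a different answer

close_loss = distillation_loss(teacher, close_student)
far_loss = distillation_loss(teacher, far_student)
print(close_loss < far_loss)
```

During training, the student's weights would be adjusted to push this number down across many examples. Real pipelines often combine it with a standard loss on labeled data, which this sketch leaves out.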
For AEO, this is useful context for why so many engines behave similarly: many smaller models inherit patterns from a few large ones. It also reinforces a strategic point. Because models increasingly learn from each other and from the open web, genuinely original content that exists nowhere else gives engines something they can't get by distilling existing knowledge. Distillation copies what's already known; it can't manufacture your first-hand data.
Example. A lightweight assistant on a phone may be a distilled version of a much larger model. It reflects the teacher's general knowledge — but for anything novel and specific, it still depends on retrieving an original source like yours.
Related terms
- Fine-Tuning: Fine-tuning is the process of further training a pre-trained AI model on a narrower dataset to specialize its behavior, distinct from the retrieval that grounds live answers.
- Training Data: Training data is the body of text and other content an AI model learns from during training, shaping what it knows by default before any live retrieval is involved.
- Large Language Model (LLM): A large language model is an AI system trained on vast amounts of text to predict and generate language, and is the engine that writes the answers in AI search.