Context Window
The context window is the maximum amount of text an AI model can consider at once, including the question, any retrieved sources, and its own answer, measured in tokens.
The context window is how much an AI model can read at one time. Measured in tokens, it holds everything the model is working with for a single response: the user's question, the retrieved passages, and the answer being generated. Text that does not fit in the window is not available to the model for that response.
This is why answer engines retrieve passages, not whole sites: only so much content fits, so the system selects the few most relevant chunks to place in the window. Being concise and extractable helps you earn one of those limited slots. A tight, complete answer uses the token budget efficiently, while a rambling page is less likely to be included or, if it is, may be truncated mid-thought.
Example. When a RAG pipeline feeds the model the top five passages for a query, those five plus the question must fit in the context window, with room left over for the answer. A crisp 150-word answer is a better candidate for that space than a meandering 600-word section.
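The budgeting described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular answer engine works: the function names `estimate_tokens` and `pack_passages`, the window size, and the answer reserve are all hypothetical, and the token count uses a rough rule of thumb (about four tokens for every three English words) rather than a real tokenizer.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate. ASSUMPTION: about 4 tokens per 3 English words.
    Real tokenizers give different counts."""
    words = len(text.split())
    return (words * 4 + 2) // 3  # integer ceiling of words * 4 / 3


def pack_passages(question: str, passages: list[str],
                  window_tokens: int, answer_reserve: int) -> list[str]:
    """Greedily select passages that fit in the context window.

    `passages` is assumed to be sorted most relevant first.
    `answer_reserve` is the number of tokens held back for the generated answer.
    """
    # Whatever is left after the question and the answer reserve
    # is the budget available for retrieved passages.
    budget = window_tokens - answer_reserve - estimate_tokens(question)
    selected = []
    for passage in passages:
        cost = estimate_tokens(passage)
        if cost <= budget:  # skip any passage that would overflow the window
            selected.append(passage)
            budget -= cost
    return selected


# Hypothetical numbers: a 1,000-token window with 300 tokens reserved for the answer.
question = "what is a context window"
rambling = " ".join(["word"] * 600)  # about 800 estimated tokens
crisp = " ".join(["word"] * 150)     # about 200 estimated tokens

selected = pack_passages(question, [rambling, crisp],
                         window_tokens=1000, answer_reserve=300)
print([len(p.split()) for p in selected])  # word counts of the passages that fit
```

With these made-up numbers, the 600-word section is skipped even though it is ranked first, because it would not fit alongside the question and the answer reserve, while the 150-word passage is included. A production system would use the model's actual tokenizer and might truncate or re-chunk long passages instead of dropping them.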
Related terms
- Token: A token is the basic unit of text an AI model processes, typically a word or word-piece, and is how model limits, costs, and context windows are measured.
- Passage Retrieval: Passage retrieval is the practice of finding and returning specific relevant passages from within documents, rather than whole pages, which is why AI engines cite paragraphs instead of articles.
- RAG (Retrieval-Augmented Generation): RAG is the technique behind most AI answer engines, where the model first retrieves relevant documents from the live web or an index and then generates an answer grounded in what it found.