What Is a Context Window?

A context window is the maximum amount of text — measured in tokens — that an AI model can consider at once, including your prompt, any retrieved sources, the conversation so far, and the answer it's writing. It is the model's short-term working memory, and anything that doesn't fit is, for that request, invisible.

Figure: The context window, a fixed token budget (bands: system, retrieved passages, answer, headroom).

Everything the model considers at once (the question, the retrieved passages, and room for the answer) shares one fixed window. Add passages or make them longer, and some stop fitting. In the figure's default setting, six retrieved passages of 180 tokens each all fit with room to spare, using about 2,560 of an 8,000-token window. Tight, self-contained passages win the limited space, which is the budget reason to write answer-first.

What does the context window contain?

The context window contains everything the model needs to read for a single response — and it all shares one token budget. That includes the hidden system instructions, the conversation history, your current prompt, any documents retrieved to ground the answer, and the answer being generated token by token. If the total exceeds the window, something has to give.
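As a rough sketch, the shared budget can be modeled as a sum of components checked against the window. The class name, field names, and token counts below are hypothetical illustrations, not any vendor's API; real systems count tokens with the model's own tokenizer.

```python
from dataclasses import dataclass


@dataclass
class ContextUsage:
    """Token counts for everything one request places in the window (hypothetical model)."""

    system: int     # hidden system instructions
    history: int    # earlier conversation turns
    prompt: int     # the current user prompt
    retrieved: int  # passages retrieved to ground the answer
    answer: int     # tokens reserved for the answer being generated

    def total(self) -> int:
        return self.system + self.history + self.prompt + self.retrieved + self.answer

    def overflow(self, window: int) -> int:
        """Tokens that do not fit in the window; 0 means everything fits."""
        return max(0, self.total() - window)


usage = ContextUsage(system=500, history=3_000, prompt=200, retrieved=4_000, answer=1_000)
print(usage.total())          # 8700
print(usage.overflow(8_000))  # 700 tokens must be truncated, dropped, or summarized
```

Because every component draws on the same total, growing any one of them (a longer history, more retrieved passages) shrinks what is left for the others.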

Because the window is measured in tokens, tokenization directly affects how much fits: verbose text, and unusual text such as rare words or misspellings that split into several tokens, consumes the budget faster. And because the model attends across everything in the window at once (see how LLMs work), what you put in it, and what you leave out, shapes the answer.
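Tokenizers split text into subword pieces drawn from a fixed vocabulary, so text the vocabulary covers well costs fewer tokens. The greedy longest-match tokenizer below is a toy assumed for illustration only (real models use learned schemes such as byte-pair encoding with far larger vocabularies), but it shows the effect: misspelled or unfamiliar words fall apart into many small pieces.

```python
def toy_tokenize(text: str, vocab: set[str]) -> list[str]:
    """Greedy longest-match subword tokenizer (a toy, not any production tokenizer)."""
    tokens: list[str] = []
    for word in text.lower().split():
        i = 0
        while i < len(word):
            # Try the longest vocabulary entry that starts at position i.
            for j in range(len(word), i, -1):
                if word[i:j] in vocab:
                    tokens.append(word[i:j])
                    i = j
                    break
            else:
                # No entry matched: fall back to a single character.
                tokens.append(word[i])
                i += 1
    return tokens


# Hypothetical tiny vocabulary for the demo.
VOCAB = {"the", "model", "reads", "context", "window", "every", "token"}

common = toy_tokenize("the model reads the context window", VOCAB)
unusual = toy_tokenize("teh modle reeds teh kontext windoe", VOCAB)
print(len(common), len(unusual))  # 6 29
```

Both sentences have six words, yet the unfamiliar spellings cost almost five times as many tokens of the window.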

How big are context windows in 2026?

Context windows in 2026 span a wide range, from a couple of hundred thousand tokens to around 10 million. Many frontier models offer roughly 200,000 to 1,000,000 tokens, and some long-context models advertise far more: Meta's Llama 4 Scout, for example, advertises a context window of about 10 million tokens. (Capacities change often; always check the lab's current documentation for a given model.)

Bigger isn't automatically better

A large advertised window doesn't mean the model uses every position equally well. Models often attend most reliably to the beginning and end of a long context and can overlook details buried in the middle. More room helps — but placing the right passage, not just more text, is what drives a good answer.
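One mitigation some retrieval pipelines use is to reorder passages so the most relevant land at the beginning and end of the context, where attention tends to be most reliable. The sketch below assumes passages arrive already ranked by relevance; the function name `edges_first` is hypothetical, and the heuristic improves the odds rather than guaranteeing the model uses a given passage.

```python
def edges_first(ranked: list[str]) -> list[str]:
    """Reorder passages (most relevant first) so the strongest sit at the start and end.

    Passages at ranks 1, 3, 5, ... fill the front in order; ranks 2, 4, 6, ...
    fill the back in reverse, leaving the weakest passages in the middle.
    """
    front = ranked[0::2]  # ranks 1, 3, 5, ...
    back = ranked[1::2]   # ranks 2, 4, 6, ...
    return front + back[::-1]


print(edges_first(["p1", "p2", "p3", "p4", "p5"]))
# ['p1', 'p3', 'p5', 'p4', 'p2']
```

The top passage opens the context, the runner-up closes it, and the least relevant one ends up in the middle, where being overlooked costs the least.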

What happens when text exceeds the window?

When text exceeds the window, the model simply cannot see the overflow, so systems must decide what to keep: they truncate, drop older turns, or summarize history to make room. The consequence is concrete: a fact outside the window has zero chance of influencing the answer, no matter how relevant it is. This is why long documents are split into passages and only the most relevant chunks are pulled in. Work such as "Passage Segmentation of Documents for Extractive Question Answering" (arXiv 2501.09940) studies this problem directly, examining how to break documents into passages that answer questions well.
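The simplest of these strategies, dropping the oldest turns, can be sketched as follows. `Turn`, `drop_oldest_turns`, and the token counts are hypothetical; the sketch assumes each turn's token count is already known and that system instructions are budgeted separately.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    role: str
    text: str
    tokens: int  # assumed precomputed with the model's tokenizer


def drop_oldest_turns(history: list[Turn], budget: int) -> list[Turn]:
    """Keep the most recent turns whose combined token count fits in `budget`."""
    kept: list[Turn] = []
    used = 0
    for turn in reversed(history):  # walk from newest to oldest
        if used + turn.tokens > budget:
            break  # this turn and everything older is dropped
        kept.append(turn)
        used += turn.tokens
    kept.reverse()  # restore chronological order
    return kept


history = [
    Turn("user", "First question", 500),
    Turn("assistant", "First answer", 300),
    Turn("user", "Follow-up", 200),
    Turn("assistant", "Latest answer", 400),
]
trimmed = drop_oldest_turns(history, budget=700)
print([t.text for t in trimmed])  # ['Follow-up', 'Latest answer']
```

Whatever was said in the first two turns is now invisible to the model, which is exactly the "zero chance" consequence described above.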

Does a big window remove the need for retrieval?

A big context window does not remove the need for retrieval. Even when you could paste an entire corpus into a huge window, it's usually a bad idea: it's slow, expensive (you pay per token), and it dilutes the model's attention across mostly irrelevant text. Selecting the few most relevant passages with retrieval-augmented generation typically produces more accurate answers at a fraction of the cost. Window size and retrieval are complementary, not substitutes.
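Retrieval can be illustrated with a deliberately simple scorer. Production RAG systems usually rank passages with embeddings or lexical methods such as BM25; the word-overlap score and the function names below are hypothetical stand-ins chosen so the example runs without dependencies. However large the corpus, only the top k passages go into the window.

```python
import re


def words(text: str) -> set[str]:
    """Lowercased word set; a crude stand-in for real text processing."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def top_k_passages(query: str, passages: list[str], k: int) -> list[str]:
    """Return up to k passages ranked by how many query words they share."""
    q = words(query)
    scored = [(len(q & words(p)), i, p) for i, p in enumerate(passages)]
    # Highest overlap first; ties keep the original passage order.
    scored.sort(key=lambda s: (-s[0], s[1]))
    # Passages sharing no words with the query are never sent to the model.
    return [p for score, _, p in scored[:k] if score > 0]


passages = [
    "A context window is the token budget a model reads at once.",
    "Our office is closed on public holidays.",
    "Retrieval selects the most relevant passages for the window.",
]
print(top_k_passages("how does retrieval fill the context window", passages, k=2))
```

The irrelevant passage about office hours never reaches the window, so it costs nothing and cannot distract the model.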

Why does the context window matter for AEO?

The context window matters for AEO because your content competes for a scarce, finite space. When an engine retrieves sources to answer a question, only a handful of passages make it into the window, and a self-contained, answer-first passage earns its place more easily than a long, rambling one that would crowd out everything else. Writing tight, liftable passages isn't only about readability; it's about fitting into the budget where the answer is actually formed. That is the context-window view of extractability, and a core idea of AEO (see what is AEO).
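The crowding effect can be sketched with a hypothetical packer that fills the window in rank order. The window size, reserved tokens, and passage lengths are illustrative assumptions; real engines differ in how they rank and pack sources, and some skip an oversized passage rather than stopping at it.

```python
def passages_that_fit(window: int, reserved: int, passage_tokens: list[int]) -> int:
    """Count how many ranked passages fit after `reserved` tokens are set aside.

    Passages are taken in rank order and packing stops at the first one that
    does not fit, so a long passage near the top can crowd out everything below it.
    """
    remaining = window - reserved
    fitted = 0
    for tokens in passage_tokens:
        if tokens > remaining:
            break
        remaining -= tokens
        fitted += 1
    return fitted


WINDOW = 8_000
RESERVED = 1_400  # hypothetical: system instructions, question, and answer headroom

tight = [180] * 6               # six short, self-contained passages
rambling = [6_500] + [180] * 5  # one long passage ranked first, then five short ones
print(passages_that_fit(WINDOW, RESERVED, tight))     # 6
print(passages_that_fit(WINDOW, RESERVED, rambling))  # 1
```

With the same budget, six tight passages all make it in, while a single rambling passage leaves room for nothing else.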

Next: read what is RAG to see how passages are chosen to fill the window, or what is grounding to see what those passages are used for.

Frequently asked questions

What is a context window in an LLM?

The context window is the maximum number of tokens a model can process in a single request — its short-term working memory. It must hold everything at once — the system instructions, your prompt, any retrieved documents, the conversation history, and the answer being generated. Anything beyond the window is invisible to the model.

How big are context windows in 2026?

They range widely. Many frontier models offer windows around 200,000 tokens to 1 million tokens, and some long-context models (such as Llama 4 Scout) advertise up to about 10 million tokens. Bigger windows let a model consider more text at once, but don't guarantee it uses every part equally well.

What happens when you exceed the context window?

The model can't see the overflow. Systems handle it by truncating, dropping older messages, or summarizing — which means details outside the window are simply ignored. This is why long documents are split into passages and only the most relevant chunks are retrieved into the window.

Does a bigger context window replace retrieval?

No. Even with a large window, feeding an entire corpus every time is slow, costly, and dilutes attention. Retrieval (RAG) selects the few most relevant passages to place in the window, which is usually more accurate and far cheaper than relying on size alone.

Related reading

You Use AI Every Day. Is AI Recommending Your Business?

Using AI to run your business and being recommended by AI to customers are two different games. You've likely won the first (ChatGPT drafts your emails and quotes) while quietly losing the second, where customers ask AI who to hire and it names a competitor.

AI for Print & Sign Shops: The Tools You Use vs the Customers You're Missing

Your shop already runs on AI: it mocks up designs, writes quotes, and proofs artwork. But when a business owner asks AI where to get banners, signs, or business cards nearby, it names one or two shops. Being the one it names is a different discipline called AEO.

The AI Tools Small Businesses Actually Use in 2026 — and the Gap They All Share

A practical roundup of the AI tool categories small businesses really use (writing, customer service, scheduling, bookkeeping, design, and reviews), plus the one thing none of them do, which is make AI recommend you to a new customer.