← Back to Glossary

Context Window

The context window determines whether an LLM can reason across a long document or loses track after a few pages. It's the working memory of AI — finite, expensive, and more nuanced than the headline number suggests.

What is a Context Window?

A context window is the total amount of text — measured in tokens — that a language model can consider at one time when generating a response. Everything inside the context window is “visible” to the model: the system prompt, conversation history, retrieved documents, tool outputs, and the current user message. Everything outside it might as well not exist.

Tokens are the fundamental unit of language model processing. They’re roughly equivalent to word fragments — “context” might be one token, “window” might be one token, “couldn’t” might be two. Most English text runs about 750 words per 1,000 tokens, so a 200,000-token context window can hold roughly 150,000 words — about the length of a long novel.

Context window sizes have expanded dramatically in recent years. GPT-3 launched with 4,096 tokens. Modern frontier models offer 128,000 to 1,000,000+ tokens. This expansion has enabled new use cases — whole codebases in a single prompt, long research papers fully in context, extended multi-turn conversations — while also raising new questions about what models actually do with all that available information.

Why Context Window Size Matters

The context window is the practical limit on what an AI system can reason about in a single inference call. Its size matters differently depending on the use case:

  • Document analysis: Summarizing, reviewing, or extracting information from long documents requires the entire document to be in context. Short context windows force chunking strategies that can miss cross-document connections.
  • Long conversations: Every message in a conversation consumes context tokens. In long sessions, early messages get dropped from context or the conversation must be summarized, causing the model to “forget” earlier discussion.
  • Code generation: Working with large codebases benefits from having multiple related files in context simultaneously so the model can understand relationships and avoid inconsistencies.
  • Agentic workflows: Multi-step agents accumulate tool outputs, observations, and intermediate reasoning in the context. Longer workflows require larger context windows or explicit memory management strategies.

The Trade-off: Bigger Isn’t Always Better

The counterintuitive finding from AI research is that larger context windows don’t automatically produce better results. Several phenomena limit the practical benefit of very long contexts:

  • Lost in the middle: Research has consistently shown that LLMs perform best when relevant information is at the beginning or end of the context, and worse when it’s buried in the middle. A 1M-token context window doesn’t guarantee the model finds what matters in the middle of a massive document.
  • Cost scaling: Processing cost scales roughly with context length. A 200,000-token call costs significantly more than a 2,000-token call. For high-volume applications, the cost difference is material.
  • Latency: Longer contexts produce higher latency — the model takes longer to generate a response when processing more input tokens. For real-time applications, this matters.
  • Attention dilution: With very long contexts, the model’s attention is spread across more tokens, which can reduce the precision of reasoning about any specific passage.

Practical Limits in Real Applications

When building AI-powered products, the practical question isn’t “how large is the context window?” — it’s “what’s the minimum context I need for this task, and how do I structure it most effectively?”

For document-heavy applications, retrieval-augmented generation (RAG) often outperforms stuffing entire documents into a long context, because RAG surfaces the most relevant passages rather than forcing the model to find them itself. For conversational applications, thoughtful conversation summarization can preserve the most important context from earlier exchanges without consuming the full window.

For agentic applications, explicit context management — pruning spent tool outputs, compressing intermediate reasoning, and prioritizing recent and relevant information — is essential for keeping long-running agents functional within context limits.

The engineering discipline of context management is increasingly important as AI applications grow in complexity. The teams building the best AI products aren’t just picking the largest context window — they’re thinking carefully about what goes in it and in what order.

Related Terms and Concepts

Automation, SaaS, Product Development, Workflow Automation