
The Complete Guide to Claude's Context Window

How Claude's 200K-token context window actually works — what fits, where recall starts to slip, how to count tokens accurately, and the prompt patterns that make long documents work in Claude 3.5 Sonnet, Opus, and Haiku.

Claude's 200K-token context window is one of the largest in production use. It's enough to hold a full novel, a quarterly earnings report with appendices, or an entire codebase for a small library. But "holds" and "reasons over accurately" are two different things — and the difference is where most prompts go wrong.

This guide walks through what 200K tokens actually means, how Claude's recall behaves at different fill levels, and the prompt patterns that consistently produce sharp answers from long documents.

What 200K tokens looks like

Claude's tokenizer averages roughly 3.5 characters per token for English — slightly more tokens per character than GPT's ~4. At that rate, 200K tokens gives:

  • ~150,000 English words
  • ~500 pages of clean book prose
  • ~15,000–20,000 lines of typical code (code tokenizes far less densely than prose)
  • ~17 hours of speech transcript at 150 WPM
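The chars-per-token heuristic above is easy to turn into a quick sanity check before pasting. This is a rough sketch only — exact counts require a real tokenizer or a token-counting API, and the 3.5 figure is an English-prose average, not a guarantee:

```python
CHARS_PER_TOKEN = 3.5  # heuristic for English text, not an exact tokenizer

def estimate_tokens(text: str) -> int:
    """Approximate Claude token count from character length."""
    return round(len(text) / CHARS_PER_TOKEN)

def fits_in_window(text: str, window: int = 200_000, headroom: float = 0.7) -> bool:
    """True if the estimate stays under `headroom` of the context window
    (this guide suggests staying below ~70% of 200K)."""
    return estimate_tokens(text) <= window * headroom
```

The 70% headroom default leaves room for the model's answer and follow-up turns, per the checklist later in this guide.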

Want exact numbers for your text? Paste it into the Token Counter — it shows live counts for Claude, ChatGPT, and Gemini side by side.

The lost-in-the-middle problem

Anthropic's own evaluations and independent research both show the same pattern: information at the start and end of a long context is recalled near-perfectly. Information at the 30–70% mark is recalled less reliably — the famous "lost in the middle" effect.

In practice this means: if you put a critical fact at token 100,000 of a 200K prompt and ask about it, Claude may miss it. The same fact placed in the first or last 10K is essentially never missed.

The prompt pattern that actually works

  1. Lead with the question or task — what do you want Claude to do?
  2. Add the document(s) in the middle, under a clear delimiter (e.g. <document>...</document>).
  3. Repeat the question after the document.
  4. If you have multiple documents, label each one with a short ID Claude can cite.

Bracketing the context with the question — asking before and after the document — consistently lifts recall of mid-context facts in long-document Q&A. It's the single biggest improvement most people are missing.
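The four steps above can be sketched as a small prompt builder. The `<document id="...">` labels and "Reminder" phrasing are illustrative choices, not a required format:

```python
def build_long_doc_prompt(question: str, documents: dict[str, str]) -> str:
    """Lead with the task, wrap each document in a labelled delimiter,
    then repeat the question after the documents (steps 1-4)."""
    parts = [f"Task: {question}", ""]
    for doc_id, text in documents.items():
        parts.append(f'<document id="{doc_id}">')
        parts.append(text)
        parts.append("</document>")
        parts.append("")
    parts.append(f"Reminder of the task: {question}")
    parts.append("Cite the document id for every claim you make.")
    return "\n".join(parts)
```

Asking for cited document ids (step 4) also makes hallucinated answers easier to spot: a claim with no id is a claim worth double-checking.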

Cleaning input: where most token budget gets wasted

Raw PDF extractions carry headers, footers, page numbers, and broken hyphenated words. On a 200-page document that's typically 8–12% of your tokens — pure noise that costs money and dilutes Claude's attention.

Run PDFs through PDF to Clean Text first. For a structured Markdown version (better for Projects), use PDF to Markdown. The savings compound: less noise → better recall → fewer follow-up clarification questions.
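To see what that cleanup involves, here is a sketch of the kind of heuristics such a tool applies — dropping lines repeated across pages, stripping page numbers, and re-joining hyphenated words. It assumes pages arrive separated by form-feed characters (`\f`); real PDF extractions vary widely, so treat this as an illustration, not the tool's actual pipeline:

```python
import re
from collections import Counter

def clean_pdf_text(raw: str) -> str:
    """Heuristic cleanup of raw PDF text, assuming \f-separated pages."""
    pages = raw.split("\f")
    # Lines repeated on most pages are almost certainly headers/footers.
    line_counts = Counter(
        line.strip() for page in pages for line in page.splitlines() if line.strip()
    )
    repeated = {l for l, n in line_counts.items() if n >= max(2, len(pages) // 2)}
    kept = []
    for page in pages:
        for line in page.splitlines():
            s = line.strip()
            if not s or s in repeated or re.fullmatch(r"(Page\s+)?\d+", s):
                continue  # drop blanks, repeated headers/footers, page numbers
            kept.append(s)
    text = "\n".join(kept)
    # Re-join words broken across lines: "implemen-\ntation" -> "implementation"
    return re.sub(r"(\w)-\n(\w)", r"\1\2", text)
```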

When you're still over budget

If even cleaned input overflows, compress before chatting. Context Compressor uses extractive and abstractive strategies to shrink prompts 30–60% while keeping the meaning. Combined with cleaning, it's common to fit a 600-page handbook into Claude's 200K window with room to spare for follow-up turns.
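As a flavour of what the extractive half of that strategy does, a toy frequency-based sentence scorer looks something like this — a minimal sketch, not Context Compressor's actual algorithm:

```python
import re
from collections import Counter

def compress_extractive(text: str, keep_ratio: float = 0.5) -> str:
    """Keep the highest-scoring sentences by word frequency,
    preserving their original order."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(re.findall(r"\w+", text.lower()))
    scored = sorted(
        range(len(sentences)),
        key=lambda i: -sum(freq[w] for w in re.findall(r"\w+", sentences[i].lower())),
    )
    keep = sorted(scored[: max(1, int(len(sentences) * keep_ratio))])
    return " ".join(sentences[i] for i in keep)
```

Sentences full of the document's most common content words score highest, so off-topic asides are the first to go. Abstractive compression (rewriting rather than selecting) shrinks text further but needs a model call of its own.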

Multi-turn conversations eat the window too

Each turn carries the full prior context. A long conversation about a 100K-token document quickly hits the cap. Two strategies help: summarize older turns into a single "recap" message, or restart the conversation with a fresh paste of the (smaller) summary plus your new question.
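The recap strategy can be sketched as a helper that collapses older turns once the history exceeds a budget. `summarize` is a placeholder you supply — in practice you'd ask a cheap, fast model to write the recap:

```python
def recap_history(messages: list[dict], summarize, budget_chars: int = 50_000,
                  keep_last: int = 4) -> list[dict]:
    """If the history exceeds the budget, replace all but the last
    `keep_last` turns with a single summary message."""
    total = sum(len(m["content"]) for m in messages)
    if total <= budget_chars or len(messages) <= keep_last:
        return messages
    old, recent = messages[:-keep_last], messages[-keep_last:]
    recap = summarize("\n".join(f'{m["role"]}: {m["content"]}' for m in old))
    return [{"role": "user", "content": f"Recap of earlier discussion:\n{recap}"}] + recent
```

The character budget and turn count here are illustrative defaults; tune them to your document size and how much headroom you want for the answer.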

Picking the right Claude model

  • Claude 3.5 Sonnet — best general balance, 200K window, fast and cheap.
  • Claude 3 Opus — strongest reasoning at long context; slower, pricier.
  • Claude 3 Haiku — cheapest and fastest; great for high-volume cleanup tasks.

For long-document Q&A, Sonnet is almost always the right call. Opus shines when reasoning depth matters more than throughput.

Quick checklist before you hit send

  • Did you clean the input (no PDF noise, no HTML cruft)?
  • Is your token count under ~140K (70% of the window)?
  • Did you put the question both before AND after the document?
  • Are documents labelled with citable IDs?
  • Did you remove prior turns that are no longer relevant?

Tick those five and Claude's 200K window goes from "big number" to a tool you can actually rely on.
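The mechanical parts of that checklist can even run as a pre-send gate. The thresholds mirror this guide's suggestions (the ~140K ceiling via the ~3.5 chars/token heuristic); the delimiter checks assume you use the `<document>` tags shown earlier:

```python
def ready_to_send(prompt: str, max_tokens: int = 140_000) -> list[str]:
    """Return a list of warnings; empty means the automatable checks pass
    (token budget, balanced delimiters, labelled documents)."""
    warnings = []
    if len(prompt) / 3.5 > max_tokens:
        warnings.append("over ~140K-token budget — clean or compress first")
    if "<document" in prompt and prompt.count("<document") != prompt.count("</document>"):
        warnings.append("unbalanced <document> delimiters")
    if "<document" in prompt and 'id="' not in prompt:
        warnings.append("documents lack citable ids")
    return warnings
```

The remaining checks — clean input, question bracketing, pruned turns — still need a human glance, but catching a blown token budget before sending saves a wasted (and billed) request.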

Frequently asked

How many pages is 200K tokens really?

About 500 pages of typical book prose, or 150,000 English words. Dense reports with tables and code run shorter — closer to 350–400 pages.

Does Claude charge for unused context?

No. You only pay for the tokens you actually send and the tokens generated. A 200K window costs nothing if your prompt is 5K.

Is the 200K window the same on the API and Claude.ai?

On the API, yes — every request gets the full window. On Claude.ai, the free tier limits effective context per conversation; Pro and Team get the full 200K.

What about Claude's 1M-token window?

Anthropic has shown 1M-token capabilities for select customers, but as of writing the public ceiling for all tiers is 200K.
