The Best Way to Feed Long Documents to Claude (and Other Long-Context Models)
Claude's 200K-token context is generous, but you'll still want to clean, compress, and structure long documents before sending them. Here's a step-by-step playbook.
Claude 3.5 Sonnet's 200,000-token context is the kind of number that makes you want to dump entire bookshelves into a chat. Resist that urge. Even with all that room, how you prepare the input still determines how good the answer is — and how much you pay per call.
Position matters more than people think
Long-context models exhibit a "lost in the middle" effect: information placed in the very middle of a long prompt is recalled less reliably than information near the start or end. The fix is mechanical:
- Put the question first
- Order documents from most-relevant to least-relevant
- Repeat the question at the very end
This ordering alone has been shown to improve recall on multi-document QA benchmarks by double-digit percentages.
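The three rules above reduce to a small prompt-assembly helper. A minimal sketch (function and argument names are illustrative; documents are assumed to be pre-sorted most- to least-relevant):

```python
def build_prompt(question: str, documents: list[str]) -> str:
    """Question first, documents in relevance order, question repeated last."""
    parts = [f"Question: {question}"]
    for i, doc in enumerate(documents, start=1):
        parts.append(f"[Document {i}]\n{doc}")
    parts.append(f"Reminder of the question: {question}")
    return "\n\n".join(parts)

prompt = build_prompt(
    "Which competitor grew fastest in Q3?",
    ["most relevant doc text", "second doc text"],
)
```

Both ends of the prompt now carry the question, so the model's strongest attention regions hold the task itself.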
Step 1: Clean every document
Use PDF to Clean Text for PDFs and Batch Document Extractor for mixed folders. Strip headers, footers, page numbers, and broken hyphenation. A 200K-token context filled with clean prose comfortably outperforms a 200K-token context that's 30% noise.
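If you'd rather script the cleanup yourself, the core of it is a few regex passes. A minimal sketch (the patterns are assumptions to tune per corpus, not what the tools above actually run):

```python
import re

def clean_page(text: str) -> str:
    """Drop bare page numbers, rejoin hyphenated line breaks, collapse blanks."""
    text = re.sub(r"(?m)^\s*\d+\s*$", "", text)   # lines that are only a page number
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)  # broken hyphen-\nation across lines
    text = re.sub(r"\n{3,}", "\n\n", text)        # runs of blank lines
    return text.strip()
```

Repeated running headers and footers need a corpus-specific pattern (e.g. the report title on every page), so handle those with a dedicated rule per source.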
Step 2: Compress aggressively
For very long inputs, Context Compressor can shrink prompts 30–50% without meaningful information loss by collapsing whitespace, removing low-signal phrases, and condensing repetition. Two practical wins:
- Lower per-call cost (Claude charges by tokens in)
- More context room left over for the model's response
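The same idea in miniature, assuming a hand-rolled pass rather than Context Compressor's actual rules (the filler-phrase list is purely illustrative):

```python
import re

def compress(text: str) -> str:
    """Strip low-signal phrases and collapse whitespace runs."""
    text = re.sub(r"\b(?:basically|essentially|needless to say)\b,?\s*",
                  "", text, flags=re.IGNORECASE)  # illustrative filler list
    text = re.sub(r"[ \t]+", " ", text)           # collapse spaces and tabs
    text = re.sub(r"\n{3,}", "\n\n", text)        # collapse blank-line runs
    return text.strip()
```

Whitespace collapsing alone is free savings: tokenizers spend real tokens on runs of spaces and blank lines that carry no meaning.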
Step 3: Use clear delimiters
Claude responds well to structured prompts. Wrap each document in XML-like tags so the model can reference them precisely:
<documents>
<document index="1">
<source>Q3-report.pdf</source>
<content>...cleaned text...</content>
</document>
<document index="2">
<source>competitor-analysis.pdf</source>
<content>...cleaned text...</content>
</document>
</documents>
Question: Which competitor grew fastest in Q3, and why?
Step 4: Ask once, ask well
With a long context, you effectively get one shot per call: each follow-up re-sends (and re-bills) the entire context. Front-load every sub-question, requested format, and constraint in the first message.
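Steps 3 and 4 combine into one assembly function. A sketch, assuming the tag layout shown above (the function name is illustrative, and the SDK call in the comment assumes the official anthropic Python package):

```python
from xml.sax.saxutils import escape

def xml_prompt(question: str, docs: dict[str, str]) -> str:
    """docs maps source filename -> cleaned text, in relevance order."""
    blocks = []
    for i, (source, content) in enumerate(docs.items(), start=1):
        blocks.append(
            f'<document index="{i}">\n'
            f"<source>{escape(source)}</source>\n"
            f"<content>{escape(content)}</content>\n"
            f"</document>"
        )
    body = "<documents>\n" + "\n".join(blocks) + "\n</documents>"
    # Send it once, with every constraint front-loaded, e.g.:
    #   anthropic.Anthropic().messages.create(model=..., max_tokens=...,
    #       messages=[{"role": "user", "content": xml_prompt(q, docs)}])
    return body + "\n\nQuestion: " + question
```

Escaping the content matters: a stray `<` inside a scraped document would otherwise break the tag structure the model is relying on.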
What about Gemini's 2M context?
Everything above applies, just more so. Gemini 1.5 Pro and 2.0 happily accept hour-long videos and 1,500-page books, but the lost-in-the-middle effect is more pronounced at extreme lengths. Cleaning and ordering matter more, not less.
When to chunk instead
If your real corpus is bigger than even a 200K window, switch to retrieval. Build a small RAG system: embed cleaned chunks and feed only the top matches into the prompt. Batch Document Extractor's ZIP output (one file per source) drops directly into LangChain or LlamaIndex.
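The retrieval step can be sketched without any framework. This toy version ranks chunks by bag-of-words cosine similarity; a real system would swap in an embedding model, but the feed-only-the-top-k shape is the same:

```python
from collections import Counter
from math import sqrt

def top_chunks(query: str, chunks: list[str], k: int = 3) -> list[str]:
    """Return the k chunks most similar to the query (toy word-count cosine)."""
    def vec(text: str) -> Counter:
        return Counter(text.lower().split())

    def cosine(a: Counter, b: Counter) -> float:
        dot = sum(a[w] * b[w] for w in a)
        na = sqrt(sum(v * v for v in a.values()))
        nb = sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    q = vec(query)
    return sorted(chunks, key=lambda c: cosine(q, vec(c)), reverse=True)[:k]
```

Only the surviving chunks go into the prompt, cleaned and ordered exactly as in the steps above.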
Frequently asked
Does Claude really use all 200K tokens equally?
No. In practice, information near the start and end of a long prompt is recalled more reliably than information buried in the middle. Putting the question at both ends and the most important document first measurably improves accuracy.
Should I split a long document into chunks?
If the document is genuinely longer than the context window, yes. Otherwise, sending it as one continuous block lets the model build a coherent mental model of the whole.
Is compression always worth it?
For documents over ~30K tokens, almost always. Below that, the token savings rarely justify the extra step.