
ChatGPT Token Limit Explained (2026)

Every ChatGPT model's token limit in plain English — GPT-4o, GPT-4 Turbo, GPT-3.5 — and what to do when you hit it. Includes a free token counter and concrete examples.

"Token limit" is the number one source of mysterious ChatGPT behavior — pastes that get cut off, conversations that lose track of what you said, summaries that stop mid-sentence. Once you understand the numbers and how to measure them, the mystery evaporates.

The 2026 limits

  • GPT-4o: 128,000 token context, 16,384 token max output
  • GPT-4 Turbo: 128,000 token context, 4,096 token max output
  • GPT-4o mini: 128,000 token context, 16,384 token max output
  • GPT-3.5 Turbo: 16,385 token context, 4,096 token max output

These are API limits. The free ChatGPT web app applies tighter effective limits (especially during peak load) and rate limits on the most capable models.
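In code, the table above is easy to keep as a lookup. This is a minimal sketch assuming the figures listed here; the model names and the `fits` helper are illustrative, not part of any official SDK:

```python
# Limits from the list above (API values, in tokens).
MODEL_LIMITS = {
    "gpt-4o":        {"context": 128_000, "max_output": 16_384},
    "gpt-4-turbo":   {"context": 128_000, "max_output": 4_096},
    "gpt-4o-mini":   {"context": 128_000, "max_output": 16_384},
    "gpt-3.5-turbo": {"context": 16_385,  "max_output": 4_096},
}

def fits(model: str, prompt_tokens: int, desired_output: int) -> bool:
    """Check whether prompt + desired output fit the model's combined budget."""
    limits = MODEL_LIMITS[model]
    return (prompt_tokens + desired_output <= limits["context"]
            and desired_output <= limits["max_output"])
```

Note the two separate checks: a request can fit the context window and still exceed the output cap (GPT-4 Turbo with a 16K requested output, for example).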

What does 128K tokens actually fit?

  • ~250–300 pages of clean book prose
  • ~96,000 English words
  • ~50,000 lines of typical code
  • ~10–11 hours of speech transcript (at a typical ~150 words per minute)

Want a precise count for your prompt? Token Counter uses the official tiktoken encoder and shows live counts for GPT-4o, Claude, and Gemini.
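If you only need a ballpark before reaching for a real tokenizer, the usual rule of thumb (~4 characters per token for English prose) fits in a few lines. This is an approximation, not the tiktoken encoder:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate for English prose: ~4 characters per token.
    For exact counts, use OpenAI's tiktoken library instead."""
    return max(1, len(text) // 4)
```

The estimate skews low for code and non-English text, which tokenize less efficiently, so treat it as a floor, not a ceiling.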

Why your paste got truncated

ChatGPT doesn't warn you when you exceed the limit — it silently drops the overflow from the start of the input. Your question (usually at the bottom) survives. The opening sections of your document get amputated.

That's why long-document summaries sometimes "miss" the introduction, and why the model claims a section doesn't exist when it clearly does.
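A toy sketch makes the behavior concrete. The real server-side logic is opaque; this just illustrates tail-keeping truncation, where the overflow is dropped from the front:

```python
def truncate_to_fit(tokens: list[str], limit: int) -> list[str]:
    """Mimic silent truncation: when input exceeds the window, drop the
    overflow from the START so the end (your question) survives."""
    if len(tokens) <= limit:
        return tokens
    return tokens[-limit:]
```

Whatever was at the top of your paste (title, abstract, introduction) is exactly what disappears first.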

The fix: clean first, count second, send third

  1. Clean the input. PDFs are usually 10–15% noise (headers, footers, page numbers). Use PDF to Clean Text to strip it.
  2. Count the tokens. Paste the cleaned text into Token Counter. Confirm you're under the model's window.
  3. If you're still over, compress. Context Compressor shrinks prompts 30–60% with minimal information loss.
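For the simplest kinds of PDF noise, step 1 can even be approximated in plain Python. A hedged sketch, not a replacement for a dedicated cleanup tool:

```python
import re

def clean_text(raw: str) -> str:
    """Strip two common classes of PDF noise: bare page-number lines
    and runs of blank lines. Repeated headers/footers need smarter handling."""
    no_page_numbers = re.sub(r"(?m)^[ \t]*\d+[ \t]*\n", "", raw)
    return re.sub(r"\n{3,}", "\n\n", no_page_numbers).strip()
```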

Output truncation is a separate problem

Even with a 128K input window, GPT-4 Turbo will only generate up to 4K output tokens per response. If you ask for a long writeup, it stops mid-sentence. Either chunk the request ("give me sections 1–3 first, then 4–6") or switch to GPT-4o, which raises the output cap to 16K.
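Chunking the request can be as simple as batching the sections you want into separate prompts. The helper below is illustrative:

```python
def chunk_sections(sections: list[str], per_request: int) -> list[list[str]]:
    """Split a long writeup request into batches so each response stays
    under the model's output cap (e.g. 4K tokens on GPT-4 Turbo)."""
    return [sections[i:i + per_request]
            for i in range(0, len(sections), per_request)]
```

Ask for one batch per message ("give me sections 1–3", then "now 4–6") and stitch the responses together yourself.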

Multi-turn conversations

Every turn carries the full prior context. A long conversation about a 50K-token document hits the cap surprisingly fast — by turn 3 or 4 you've doubled the original token count. Strategies that work: summarize older turns into a single recap, or start a fresh chat with the (smaller) summary as the new baseline.
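The growth is easy to see with a quick sketch: because every turn re-sends the full prior conversation, the context size per turn is cumulative.

```python
def context_per_turn(turn_sizes: list[int]) -> list[int]:
    """Tokens in the context window at each turn: every turn carries
    the full prior conversation plus the new message."""
    totals, running = [], 0
    for size in turn_sizes:
        running += size
        totals.append(running)
    return totals
```

For a 50K-token document followed by a few ~15K-token exchanges, `context_per_turn([50_000, 15_000, 15_000, 15_000])` shows the window roughly doubling within three or four turns.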

Quick reference

For deeper coverage of the other major models' limits, see the companion guides on Claude's 200K context window and Gemini's 2M-token window. The same principles apply — clean, count, compress — but the absolute numbers and pricing shift the trade-offs.

Frequently asked

Is ChatGPT's limit on input or input + output?

It's a combined budget: GPT-4o's 128K window covers your prompt and the model's response together. The output is also capped separately at 16,384 tokens regardless.

Why did ChatGPT cut off my long paste?

You exceeded the model's context window. Free ChatGPT often runs GPT-4o-mini with smaller effective limits, and the UI silently truncates to fit.

How do I count tokens before sending?

Use a tokenizer that supports the model's encoding (cl100k_base for GPT-4 and GPT-3.5; o200k_base for GPT-4o). The Token Counter tool linked below uses the official OpenAI tokenizer locally.
