Count tokens for any LLM.

Paste a prompt and see exactly how many tokens it costs in GPT-4, Claude, or Gemini. Live counts, context-window meter, character and word totals — all in your browser.

Every LLM call is priced by tokens — sub-word units a model reads. Token Counter tells you exactly how big your prompt is for GPT-4, GPT-4o, Claude, and Gemini, plus how much of each model's context window you're using. Counts run live in your browser as you type or paste.

OpenAI counts use the official tiktoken BPE encoders (cl100k_base for GPT-3.5/4/4-Turbo, o200k_base for GPT-4o). Claude and Gemini counts are calibrated estimates because those tokenizers haven't been published — accurate to within a few percent for English prose.

How to use it

  1. Paste your prompt

     Drop in a system prompt, a chat message, a document, or any block of text you'll send to a model.

  2. Pick the target model

     GPT-4o, GPT-4 Turbo, GPT-3.5, Claude Sonnet, Claude Opus, Gemini 1.5/2.0 — each uses its own tokenizer or estimator.

  3. Read the meter

     See live token count, context-window percentage, characters, words, and chars-per-token ratio. Use the ratio to spot prompts that are unusually expensive.
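The meter's readouts are simple arithmetic once you have a token count. A sketch of the derived stats (the function name and output shape are illustrative, not the tool's internals):

```python
def meter_stats(text: str, token_count: int, context_window: int) -> dict:
    """Derive the meter readouts from a text and its token count."""
    return {
        "tokens": token_count,
        "context_pct": round(100 * token_count / context_window, 1),
        "chars": len(text),
        "words": len(text.split()),
        "chars_per_token": round(len(text) / token_count, 2),
    }

# e.g. a 7-char, 4-word snippet that tokenized to 2 tokens, in a 128K window
stats = meter_stats("a b c d", 2, 128_000)
```

A high chars-per-token ratio (4+) usually means plain English; a low one (under 3) often signals code, JSON, or non-Latin text that is costing more than it looks.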

Why models tokenize the same text differently

Each family is trained with its own tokenizer vocabulary. GPT-4o's o200k_base has a richer vocabulary than GPT-3.5's cl100k_base, so the same English paragraph costs fewer tokens on GPT-4o. Code, JSON, emoji, and non-English scripts widen the gap further — a Japanese paragraph can cost 2-3× more tokens on cl100k than on o200k.

Counting tokens before you ship saves real money

API providers bill input and output tokens separately, with output usually 2-5× more expensive. Knowing your input is 3,200 tokens (not 4,800) lets you accurately budget per-call cost, and gives you headroom for the model's reply within the context window. For agents that loop dozens of times per task, the savings compound fast.
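The budgeting math above is straightforward to script. A sketch with illustrative per-million-token prices — check your provider's current pricing page, these numbers are assumptions:

```python
def call_cost(input_tokens: int, output_tokens: int,
              input_price_per_m: float, output_price_per_m: float) -> float:
    """Dollar cost of one call, given per-million-token prices."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000

# Illustrative prices: $2.50/M input, $10.00/M output
one_call = call_cost(3_200, 800, 2.50, 10.00)
agent_task = 50 * one_call  # an agent looping 50 times per task
```

With these example prices, the 1,600-token difference between a 3,200- and 4,800-token prompt is $0.004 per call — $0.20 per 50-step agent task, before output costs.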

Best for

  • Estimating API cost before sending a prompt
  • Checking whether a document fits in Claude's 200K window
  • Comparing prompt length across GPT-4 and Claude
  • Trimming long system prompts to fit a budget
  • Sanity-checking RAG chunks before embedding
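For the window-fit checks above, a rough estimate is often enough before reaching for a real tokenizer. A sketch using the ~4 chars/token English heuristic — the default window, reply budget, and heuristic here are assumptions, not any model's actual tokenizer:

```python
def fits_in_context(text: str, window_tokens: int = 200_000,
                    chars_per_token: float = 4.0,
                    reply_budget: int = 4_096) -> bool:
    """Rough check: estimated prompt tokens plus reply headroom vs. the window."""
    estimated_tokens = len(text) / chars_per_token
    return estimated_tokens + reply_budget <= window_tokens

fits_in_context("short prompt")    # plenty of room
fits_in_context("x" * 1_000_000)   # ~250K estimated tokens: too big
```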

Why token counts matter

Every LLM API charges by tokens — the sub-word units a model reads. A "token" is roughly 3-4 characters of English, but punctuation, code, and non-Latin scripts shift that ratio dramatically. Counting tokens before you send a prompt tells you three things at once: how much the call will cost, whether it will fit the context window, and how much room you have left for the model's reply.

For OpenAI models we use the official tiktoken encoders (cl100k_base for GPT-3.5/4/4-Turbo, o200k_base for GPT-4o). Anthropic doesn't publish their tokenizer, so Claude counts use cl100k plus a small empirical overhead that lands within ~3% of the real count for most English prose. Gemini uses SentencePiece; we approximate with ~4 chars/token, which is accurate to within ~10% in practice.

If a prompt is too long, head to the Context Compressor to shrink it without losing meaning, or to PDF to Markdown if your input came from a document.
