Every LLM call is priced by tokens — sub-word units a model reads. Token Counter tells you exactly how big your prompt is for GPT-4, GPT-4o, Claude, and Gemini, plus how much of each model's context window you're using. Counts run live in your browser as you type or paste.
OpenAI counts use the official tiktoken BPE encoders (cl100k_base for GPT-3.5/4/4-Turbo, o200k_base for GPT-4o). Claude and Gemini counts are calibrated estimates, since those tokenizers aren't publicly available; for English prose they're typically accurate to within a few percent.
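For OpenAI models you can reproduce the count locally with the tiktoken library. A minimal sketch (the encoding names are real tiktoken identifiers; the cl100k_base fallback for unrecognized model names is an assumption, not tool behavior):

```python
import tiktoken

def count_tokens(text: str, model: str = "gpt-4o") -> int:
    """Count tokens the same way the OpenAI API will bill them."""
    try:
        enc = tiktoken.encoding_for_model(model)  # resolves to o200k_base for gpt-4o
    except KeyError:
        enc = tiktoken.get_encoding("cl100k_base")  # assumed fallback for unknown models
    return len(enc.encode(text))

print(count_tokens("Every LLM call is priced by tokens."))
```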
How to use it
1. Paste your prompt. Drop in a system prompt, a chat message, a document, or any block of text you'll send to a model.
2. Pick the target model. GPT-4o, GPT-4 Turbo, GPT-3.5, Claude Sonnet, Claude Opus, or Gemini 1.5/2.0; each uses its own tokenizer or estimator.
3. Read the meter. See the live token count, context-window percentage, character and word counts, and the chars-per-token ratio. Use the ratio to spot prompts that are unusually expensive to tokenize (see the sketch after this list).
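The meter itself is a few simple ratios. A hypothetical recreation of the readout, assuming GPT-4o's 128K-token context window:

```python
import tiktoken

CONTEXT_WINDOW = 128_000  # GPT-4o's documented context size
enc = tiktoken.get_encoding("o200k_base")

def meter(text: str) -> dict:
    """Recompute the stats the meter displays for a given prompt."""
    tokens = len(enc.encode(text))
    return {
        "tokens": tokens,
        "context_pct": round(100 * tokens / CONTEXT_WINDOW, 2),
        "chars": len(text),
        "words": len(text.split()),
        "chars_per_token": round(len(text) / tokens, 2) if tokens else 0.0,
    }

print(meter("You are a helpful assistant. Answer concisely."))
```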
Why models tokenize the same text differently
Each family is trained with its own tokenizer vocabulary. GPT-4o's o200k_base has a richer vocabulary than GPT-3.5's cl100k_base, so the same English paragraph costs fewer tokens on GPT-4o. Code, JSON, emoji, and non-English scripts widen the gap further: a Japanese paragraph can cost 2-3× as many tokens on cl100k_base as on o200k_base.
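You can observe the gap directly by encoding the same strings with both vocabularies; a quick comparison sketch:

```python
import tiktoken

cl100k = tiktoken.get_encoding("cl100k_base")  # GPT-3.5 / GPT-4
o200k = tiktoken.get_encoding("o200k_base")    # GPT-4o

samples = {
    "english": "The quick brown fox jumps over the lazy dog.",
    "json": '{"user_id": 12345, "roles": ["admin", "editor"]}',
    "japanese": "吾輩は猫である。名前はまだ無い。",
}

for name, text in samples.items():
    a, b = len(cl100k.encode(text)), len(o200k.encode(text))
    print(f"{name:9s} cl100k={a:3d}  o200k={b:3d}  ratio={a / b:.2f}")
```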
Counting tokens before you ship saves real money
API providers bill input and output tokens separately, and output tokens are usually 2-5× more expensive. Knowing your input is 3,200 tokens (not 4,800) lets you budget per-call cost accurately and leaves headroom for the model's reply within the context window. For agents that loop dozens of times per task, the savings compound fast.
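A back-of-the-envelope budget check might look like the sketch below. The per-million-token prices are placeholders, not current rates; substitute your provider's pricing:

```python
# Placeholder prices per million tokens; NOT current rates.
PRICE_PER_M = {"input": 2.50, "output": 10.00}

def estimate_cost(input_tokens: int, expected_output_tokens: int) -> float:
    """Rough per-call cost in dollars under the placeholder prices above."""
    return (input_tokens * PRICE_PER_M["input"]
            + expected_output_tokens * PRICE_PER_M["output"]) / 1_000_000

# Mis-estimating 3,200 input tokens as 4,800 skews the budget on every call:
print(f"${estimate_cost(3_200, 800):.4f} vs ${estimate_cost(4_800, 800):.4f}")
```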
Best for
- Estimating API cost before sending a prompt
- Checking whether a document fits in Claude's 200K window
- Comparing prompt length across GPT-4 and Claude
- Trimming long system prompts to fit a budget
- Sanity-checking RAG chunks before embedding
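For the last item, a hedged sketch of a pre-embedding chunk check; the 512-token budget and cl100k_base choice are illustrative assumptions, so pick values that match your embedding model:

```python
import tiktoken

MAX_CHUNK_TOKENS = 512  # illustrative budget, not a rule
enc = tiktoken.get_encoding("cl100k_base")

def oversized_chunks(chunks: list[str]) -> list[tuple[int, int]]:
    """Return (index, token_count) for chunks that exceed the budget."""
    flagged = []
    for i, chunk in enumerate(chunks):
        n = len(enc.encode(chunk))
        if n > MAX_CHUNK_TOKENS:
            flagged.append((i, n))
    return flagged

chunks = ["short chunk", "a much longer chunk " * 200]
for i, n in oversized_chunks(chunks):
    print(f"chunk {i}: {n} tokens; split before embedding")
```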