Token Optimization: Cut Your AI API Costs in Half
Practical, browser-side techniques to reduce input tokens without losing meaning. Compress prompts, clean documents, and prune system messages — the savings compound across every call.
Token costs scale linearly with usage, so a 40% cut in prompt tokens is a 40% cut in input costs, on every call, every day. The same compression also leaves more context room for the model to actually answer, and reduces latency. Here's the playbook.
Audit the system prompt first
The system prompt is sent on every single call. A 1,200-token system prompt that runs a million times a month is 1.2 billion tokens. Things to cut:
- Repeated rules said three different ways
- Examples that never trigger
- Polite framing ("Please be careful to…" → "Be careful to…")
- Verbose role descriptions when one sentence will do
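The arithmetic above is worth scripting once so you can plug in your own numbers. Here is a rough sketch in TypeScript; the prompt size, call volume, and per-million-token price are illustrative placeholders, not real pricing.

```ts
// Rough monthly cost of a system prompt that is resent on every call.
// All numbers in the example calls are placeholders, not real pricing.
function monthlySystemPromptCost(
  promptTokens: number,
  callsPerMonth: number,
  pricePerMillionInputTokens: number,
): number {
  const tokensPerMonth = promptTokens * callsPerMonth;
  return (tokensPerMonth / 1_000_000) * pricePerMillionInputTokens;
}

// 1,200-token prompt, 1M calls/month, $3 per million input tokens (example rate):
// 1.2B tokens/month, about $3,600/month. Trimming to 700 tokens saves ~$1,500/month.
console.log(monthlySystemPromptCost(1_200, 1_000_000, 3)); // 3600
console.log(monthlySystemPromptCost(700, 1_000_000, 3));   // 2100
```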
Compress user-supplied content
Use Context Compressor on long pasted text. The "Light" setting is safe for any input — it removes whitespace and obvious redundancy without changing meaning. "Aggressive" goes further; spot-check the output before relying on it.
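For a feel of what a light pass does, here is a minimal sketch in TypeScript, assuming "light" means collapsing runs of whitespace and dropping exact duplicate lines; it illustrates the idea, not the tool's actual implementation.

```ts
// Minimal "light" compression: collapse internal whitespace and drop exact
// duplicate lines, leaving paragraph breaks intact. No meaning-changing rewrites.
function lightCompress(text: string): string {
  const seen = new Set<string>();
  return text
    .split("\n")
    .map((line) => line.replace(/[ \t]+/g, " ").trim()) // collapse runs of spaces/tabs
    .filter((line) => {
      if (line === "") return true;     // keep blank lines as paragraph breaks
      if (seen.has(line)) return false; // drop exact repeats
      seen.add(line);
      return true;
    })
    .join("\n")
    .replace(/\n{3,}/g, "\n\n");        // at most one blank line in a row
}
```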
Pre-process documents
For PDFs, the single biggest token saver is removing repeating headers and footers. PDF to Clean Text does this automatically and typically cuts 10–15% on long documents — more on academic PDFs with running headers.
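The heuristic behind that is easy to sketch: a line that appears at the top or bottom of most pages is a running header, footer, or page number. The sketch below assumes you already have one text string per page; it shows the idea, not how the tool itself is implemented.

```ts
// Drop lines that repeat at the page edges (headers/footers) and bare page numbers.
// `pages` holds the extracted text of each page, one string per page.
function stripRepeatingEdges(pages: string[], threshold = 0.6): string[] {
  const edgeCounts = new Map<string, number>();
  for (const page of pages) {
    const lines = page.split("\n").map((l) => l.trim()).filter(Boolean);
    // The first and last two lines of each page are header/footer candidates.
    for (const line of new Set([...lines.slice(0, 2), ...lines.slice(-2)])) {
      edgeCounts.set(line, (edgeCounts.get(line) ?? 0) + 1);
    }
  }
  // Anything seen on most pages is treated as a running header or footer.
  const repeated = new Set(
    [...edgeCounts].filter(([, n]) => n >= pages.length * threshold).map(([l]) => l),
  );
  return pages.map((page) =>
    page
      .split("\n")
      .filter((l) => !repeated.has(l.trim()) && !/^\s*\d+\s*$/.test(l)) // drop bare page numbers too
      .join("\n"),
  );
}
```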
Strip transcripts
Auto-generated transcripts are 30–50% noise. Transcript Cleaner strips timestamps, filler words, and duplicate speaker labels with toggles for each. A 60-minute meeting commonly drops from 14,000 to 8,500 tokens.
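If you'd rather script it, here is a rough TypeScript sketch of the same cleanup: remove timestamps, strip common filler words, and collapse consecutive identical speaker labels. The regexes and filler list are illustrative and worth tuning to your transcript source.

```ts
// Illustrative transcript cleanup: timestamps, filler words, repeated speaker labels.
const FILLERS = /\b(um+|uh+|you know|sort of|kind of)\s*/gi;

function cleanTranscript(raw: string): string {
  const out: string[] = [];
  let lastSpeaker = "";
  for (let line of raw.split("\n")) {
    line = line.replace(/\[?\d{1,2}:\d{2}(:\d{2})?\]?/g, "").trim(); // e.g. [00:14:03]
    if (line === "") continue;
    const m = line.match(/^([A-Z][\w .]+):\s*(.*)$/);
    if (m) {
      const speaker = m[1];
      const rest = m[2];
      line = speaker === lastSpeaker ? rest : `${speaker}: ${rest}`; // drop repeated labels
      lastSpeaker = speaker;
    }
    out.push(line.replace(FILLERS, ""));
  }
  return out.join("\n");
}
```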
Prefer Markdown to HTML
For copied web content, Markdown is roughly half the tokens of equivalent HTML and structurally closer to the model's training data. Convert with the markdown utilities before pasting.
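If you want to script the conversion yourself, Turndown is a widely used open-source library that runs in the browser; the options below are a reasonable starting point, not necessarily the settings the markdown utilities mentioned above use.

```ts
// Convert an HTML string to Markdown in the browser with Turndown.
import TurndownService from "turndown";

const turndown = new TurndownService({
  headingStyle: "atx",      // "# Heading" instead of underlined headings
  codeBlockStyle: "fenced", // fenced code blocks instead of indented ones
});

function htmlToMarkdown(html: string): string {
  return turndown.turndown(html);
}

// Rough size comparison of what you'd actually paste into the prompt:
const html = document.body.innerHTML;
console.log("HTML chars:", html.length, "Markdown chars:", htmlToMarkdown(html).length);
```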
Write tighter prompts
- Use "Output JSON with keys: title, body" instead of three sentences describing it
- Number your requirements (1, 2, 3) — this often replaces "Also remember to…" elsewhere in the prompt
- Drop "Thank you" and "Please". The model isn't offended.
Cache what you can
Both Anthropic and OpenAI now offer prompt caching. If you send the same long prefix (a knowledge base, a system prompt) on many calls, caching can cut the price of that prefix by up to 90%. Combine caching with cleaned, compressed input for the biggest wins.
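Here is a sketch of explicit caching with Anthropic's SDK, which marks the static prefix with a cache_control block. The model name and knowledge-base contents are placeholders, and caching rules and discounts vary by provider and model, so check the current docs. OpenAI's caching applies automatically to repeated prompt prefixes, so there the main job is keeping the static part of your prompt first.

```ts
// Mark a long, static prefix as cacheable so repeat calls read it from cache
// instead of paying full input price. Model name and KNOWLEDGE_BASE are placeholders.
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();
const KNOWLEDGE_BASE = "...your long, cleaned, compressed reference text...";

async function ask(question: string) {
  return client.messages.create({
    model: "claude-sonnet-4-5", // placeholder; use whatever model you run
    max_tokens: 1024,
    system: [
      {
        type: "text",
        text: KNOWLEDGE_BASE,
        cache_control: { type: "ephemeral" }, // cache this prefix across calls
      },
    ],
    messages: [{ role: "user", content: question }],
  });
}
```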
Measure, don't guess
Pick a representative call and tokenize the prompt before and after. If you're not measuring, you can't tell whether a "shorter" rewrite actually reduced tokens or just felt shorter.
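A minimal way to do that with the 4-characters-per-token approximation from the FAQ below; swap in a real tokenizer library when you need exact counts.

```ts
// Rough before/after comparison using ~4 characters per token (±10% for English prose).
const approxTokens = (text: string): number => Math.ceil(text.length / 4);

function reportSavings(before: string, after: string): void {
  const b = approxTokens(before);
  const a = approxTokens(after);
  console.log(`before: ~${b} tokens, after: ~${a} tokens, saved ~${Math.round((1 - a / b) * 100)}%`);
}
```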
Frequently asked
Does compression hurt response quality?
Light compression (whitespace, redundancy) doesn't. Aggressive compression can, so spot-check the output. For most prompts the sweet spot is a 30–40% reduction.
Should I optimize the system prompt?
Yes — it's sent on every call, so savings compound the most. Audit it line by line.
Is there a tokenizer for browsers?
Yes. Tokenizer libraries can run entirely client-side, though most cost calculators use a roughly 4-characters-per-token approximation, which is accurate to within about ±10% for English prose.
Keep reading
How to Prepare PDFs for ChatGPT, Claude, and Gemini
A practical guide to extracting clean, AI-ready text from PDFs — born-digital and scanned — so ChatGPT, Claude, and Gemini answer accurately and don't waste tokens on headers, footers, and page numbers.
The Best Way to Feed Long Documents to Claude (and Other Long-Context Models)
Claude's 200K-token context is generous, but you'll still want to clean, compress, and structure long documents before sending them. Here's a step-by-step playbook.
Cleaning Whisper Transcripts for AI Summaries
OpenAI Whisper, Otter, and YouTube transcripts are full of timestamps, filler words, and speaker noise. Here's how to strip them before sending to ChatGPT or Claude — and why it matters.