How Many Tokens?


How much does prompt caching save?

The short answer

Prompt caching cuts the per-token cost of repeated input prompts by 75% to 90% for most current models, and by as little as 50% for a few, depending on the provider. If your application sends the same long system prompt on every call (as most agents and most RAG pipelines do), caching is the single biggest cost lever you have.

Cached-input rates as of April 2026

| Provider | Standard input | Cached input | Discount |
|---|---|---|---|
| OpenAI GPT-5 family | $0.05–$5/M | $0.005–$0.50/M | 90% off |
| OpenAI GPT-4.1 | $2/M | $0.50/M | 75% off |
| OpenAI o3 / o4-mini | $1.10–$2/M | $0.275–$0.55/M | 50–75% off |
| Anthropic Claude Opus 4.8 | $5/M | $0.50/M | 90% off |
| Anthropic Claude Sonnet 4.6 | $3/M | $0.30/M | 90% off |
| Anthropic Claude Haiku 4.5 | $1/M | $0.10/M | 90% off |
| Google Gemini 3.1 Pro Preview | $2/M (≤200k) | $0.20/M | 90% off |
| Google Gemini 2.5 Pro | $1.25/M | $0.125/M | 90% off |
| DeepSeek V3 | $0.27/M | ~$0.027/M | ~90% off |

OpenAI Pro tiers (5.5 Pro, 5.4 Pro, 5.2 Pro, o3-pro) don't qualify for caching as of April 2026. Plan accordingly if you're considering Pro for high-volume workloads.

When caching actually pays back

Caching applies to identical input prefixes: the first N tokens of the prompt must be byte-identical across calls. Practical scenarios:

Scenarios where caching doesn't help:
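
As a rough illustration of the byte-identical prefix requirement stated at the top of this section, the sketch below compares two ways of assembling a prompt. Everything in it is hypothetical (the system prompt text, the helper names, the timestamps), and it measures shared bytes rather than tokens, so it is only a proxy for what a provider's cache would actually match.

```python
def shared_prefix_bytes(a: str, b: str) -> int:
    """Count the leading UTF-8 bytes that two prompts have in common."""
    count = 0
    for x, y in zip(a.encode("utf-8"), b.encode("utf-8")):
        if x != y:
            break
        count += 1
    return count


# Hypothetical stable system prompt, repeated to make it long.
SYSTEM_PROMPT = "You are a support agent. Follow the policy below.\n" * 50


def build_prompt(user_message: str) -> str:
    # Stable content first, variable content last: the prefix stays identical.
    return SYSTEM_PROMPT + "\nUser: " + user_message


def build_prompt_timestamp_first(user_message: str, timestamp: str) -> str:
    # Variable content first: the prompts diverge inside the timestamp.
    return f"[{timestamp}] " + SYSTEM_PROMPT + "\nUser: " + user_message


good = shared_prefix_bytes(
    build_prompt("Where is my order?"),
    build_prompt("Cancel my plan."),
)
bad = shared_prefix_bytes(
    build_prompt_timestamp_first("Where is my order?", "2026-04-01T10:00:00"),
    build_prompt_timestamp_first("Cancel my plan.", "2026-04-01T10:00:07"),
)

print(f"stable-first: {good} shared bytes (system prompt is {len(SYSTEM_PROMPT)} bytes)")
print(f"timestamp-first: {bad} shared bytes")
```

In the stable-first layout the whole system prompt falls inside the shared prefix; in the timestamp-first layout the two prompts stop matching before the system prompt even begins, so nothing useful could be reused.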

Cost example

A typical agent: 4,000-token stable system prompt + 200-token user message + 100-token reply, called 1 million times per month on Claude Sonnet 4.6.

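Without caching: 4,200 input tokens × 1M calls = 4.2B tokens per month; at $3/M that is $12,600/month in input cost.

With caching (4,000-token system prompt cached, 200-token user message uncached): 4B cached tokens at $0.30/M ($1,200) plus 0.2B uncached tokens at $3/M ($600) = $1,800/month in input cost.

Savings: $10,800/month, a 77% reduction in the total bill once the unchanged cost of output tokens is counted.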
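
The arithmetic behind this example can be reproduced with a short sketch. The input and cached-input rates come from the Claude Sonnet 4.6 row of the table; the $15/M output rate is an assumption that is not stated in this article, chosen because it is the value that makes the 77% figure come out. The function name and parameters are illustrative, and the sketch assumes every call hits the cache and ignores any cache-write charges.

```python
TOKENS_PER_MILLION = 1_000_000


def monthly_cost(calls, cached_tokens, uncached_tokens, output_tokens,
                 input_rate, cached_rate, output_rate):
    """Monthly cost in dollars. All rates are dollars per million tokens."""
    per_call = (
        cached_tokens * cached_rate       # input tokens billed at the cached rate
        + uncached_tokens * input_rate    # input tokens billed at the standard rate
        + output_tokens * output_rate     # output tokens are never cached
    ) / TOKENS_PER_MILLION
    return calls * per_call


CALLS = 1_000_000
INPUT_RATE = 3.00    # $/M, from the table
CACHED_RATE = 0.30   # $/M, from the table
OUTPUT_RATE = 15.00  # $/M, ASSUMED (not given in the article)

# No caching: all 4,200 input tokens are billed at the standard rate.
without_caching = monthly_cost(CALLS, 0, 4_200, 100,
                               INPUT_RATE, CACHED_RATE, OUTPUT_RATE)

# Caching: the 4,000-token system prompt is cached, the 200-token message is not.
with_caching = monthly_cost(CALLS, 4_000, 200, 100,
                            INPUT_RATE, CACHED_RATE, OUTPUT_RATE)

savings = without_caching - with_caching
print(f"Without caching: ${without_caching:,.0f}/month")
print(f"With caching:    ${with_caching:,.0f}/month")
print(f"Savings:         ${savings:,.0f}/month, {savings / without_caching:.0%} reduction")
```

Under these assumptions the totals are $14,100 and $3,300 per month, which matches the $10,800 savings and 77% reduction quoted above.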

Caveats

Get a real estimate for your workload

Paste your prompt into the counter to see the token count, then multiply input tokens by the cached-rate column above to estimate your post-caching cost. The actual savings depend on your cache hit rate, which you measure in production.
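
To see how the cache hit rate changes the estimate, here is a minimal sketch that blends the standard and cached rates for the stable prefix. The function names and the hit-rate values are hypothetical, and the model is deliberately simple: it covers input cost only and ignores any cache-write charges or minimum cacheable lengths a provider may apply.

```python
def blended_prefix_rate(standard_rate: float, cached_rate: float, hit_rate: float) -> float:
    """Effective $/M for the cacheable prefix at a given hit rate (0.0 to 1.0)."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0.0 and 1.0")
    return hit_rate * cached_rate + (1.0 - hit_rate) * standard_rate


def estimate_input_cost(prefix_tokens: int, variable_tokens: int, calls: int,
                        standard_rate: float, cached_rate: float,
                        hit_rate: float) -> float:
    """Estimated input cost in dollars for `calls` requests."""
    prefix_rate = blended_prefix_rate(standard_rate, cached_rate, hit_rate)
    per_call = (prefix_tokens * prefix_rate + variable_tokens * standard_rate) / 1_000_000
    return calls * per_call


# Example: the 4,000 + 200 token agent on Claude Sonnet 4.6 rates from the table,
# evaluated at a few hypothetical hit rates.
for hit_rate in (0.0, 0.5, 0.9, 1.0):
    cost = estimate_input_cost(4_000, 200, 1_000_000, 3.00, 0.30, hit_rate)
    print(f"hit rate {hit_rate:.0%}: ${cost:,.0f}/month input cost")
```

A hit rate of 0% reproduces the uncached input cost and 100% reproduces the fully cached one; real workloads land somewhere in between.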
