Gemini 2.5 Flash: token counter & pricing
Google · exact (uses official tokenizer) · pricing as of 2026-04-26.
- Provider: Google
- API model ID: gemini-2.5-flash
- Context window: 1,000,000 tokens
- Input price: $0.075 per 1M tokens
- Output price: $0.30 per 1M tokens
- Tokenizer accuracy: exact (uses official tokenizer)
- Pricing as of: 2026-04-26
Open the counter to count tokens for Gemini 2.5 Flash in real time.
What is Gemini 2.5 Flash?
Gemini 2.5 Flash is Google's small, cheap general-purpose model — the cheapest exact-tokenizer model in this counter at $0.075 per million input tokens. It has a 1,000,000-token context window and accepts multimodal input (text, image, video, audio).
How tokens are counted here
This counter calls Google's official models.countTokens endpoint for Gemini Flash via our serverless proxy. Counts are exact.
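A minimal sketch of the request shape the proxy sends to Google's countTokens endpoint. The endpoint path and body format follow Google's public REST API; `API_KEY` handling and the proxy's own code are not shown here and are assumptions.

```python
import json

# Public REST path for counting tokens on gemini-2.5-flash.
# An API key must be supplied (e.g. via the `key` query parameter).
ENDPOINT = (
    "https://generativelanguage.googleapis.com/v1beta/"
    "models/gemini-2.5-flash:countTokens"
)

def build_count_tokens_request(text: str) -> dict:
    """Build the JSON body for a countTokens call on plain text."""
    return {"contents": [{"parts": [{"text": text}]}]}

body = build_count_tokens_request("Hello, Gemini!")
print(json.dumps(body))
# A successful response contains a "totalTokens" field.
```

The same body format works for generation calls, so a prompt can be counted exactly before it is sent.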
Why Gemini Flash is winning high-volume workloads
The combination is unique:
- Cheapest in class. Half the price of GPT-4o mini on both input and output.
- 1M-token context. Most competitors top out at 128k-200k.
- Strong multimodal. Image understanding is competitive with frontier models at fraction of the cost.
- Fast. Comparable latency to GPT-4o mini and Haiku.
If your workload is *anything* you'd consider running on GPT-4o mini or Claude Haiku, Gemini Flash should be in the eval. The price gap is large enough that switching pays for the migration effort within weeks at any non-trivial volume.
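A back-of-the-envelope break-even, to make the "pays for itself within weeks" claim concrete. The per-token rates are the ones quoted on this page; the monthly volume and one-off migration cost are illustrative assumptions, not measurements.

```python
# Rates quoted on this page, $ per 1M tokens.
FLASH_IN, FLASH_OUT = 0.075, 0.30
MINI_IN, MINI_OUT = 0.15, 0.60

def monthly_cost(in_millions, out_millions, in_rate, out_rate):
    """Monthly spend in dollars for a given token volume (in millions)."""
    return in_millions * in_rate + out_millions * out_rate

# Assumed volume: 50B input + 10B output tokens per month.
in_m, out_m = 50_000, 10_000
saving = (monthly_cost(in_m, out_m, MINI_IN, MINI_OUT)
          - monthly_cost(in_m, out_m, FLASH_IN, FLASH_OUT))

migration_cost = 5_000  # assumed one-off engineering cost, $
weeks_to_break_even = migration_cost / (saving / 4.33)  # ~4.33 weeks/month
print(f"saving ${saving:,.0f}/month; "
      f"break-even in {weeks_to_break_even:.1f} weeks")
```

At these assumed volumes the saving is $6,750/month, so even a $5,000 migration pays for itself in about three weeks; at lower volumes the break-even stretches proportionally.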
When NOT to use Flash
- Frontier reasoning tasks. Flash is small. Multi-step agentic workflows often want Pro or Claude Sonnet.
- Workloads requiring strict tool-use determinism. Function-calling reliability is improving but trails OpenAI's structured outputs.
- Anywhere you've validated Claude or GPT and don't want to re-eval.
Pricing notes
The $0.075 input rate applies for prompts up to 200k tokens. Above that, Google charges a higher per-token rate — verify on Google's pricing page if you regularly send very long contexts. This calculator assumes the standard tier.
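The standard-tier arithmetic can be sketched as a one-line estimator. The rates are the ones quoted on this page; the higher long-context rate for prompts over 200k tokens is deliberately not modeled.

```python
def flash_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Standard-tier cost estimate: $0.075 in / $0.30 out per 1M tokens.

    Prompts above 200k tokens are billed at a higher rate not modeled
    here -- check Google's pricing page for long-context rates.
    """
    return input_tokens * 0.075 / 1e6 + output_tokens * 0.30 / 1e6

# A 150k-token prompt with a 4k-token response:
print(f"${flash_cost_usd(150_000, 4_000):.5f}")
```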
Common questions
How does Flash compare to Gemini 2.5 Pro?
Pro is $1.25/$10 input/output (vs Flash's $0.075/$0.30). Pro costs ~17× more per input token. Use Pro when reasoning quality measurably matters; Flash for everything else.
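The ~17× figure follows directly from the quoted rates; a quick sanity check:

```python
# Ratio of Pro to Flash input pricing, using the rates quoted above.
pro_in, flash_in = 1.25, 0.075  # $ per 1M input tokens
ratio = pro_in / flash_in
print(f"Pro input is {ratio:.1f}x Flash")  # -> Pro input is 16.7x Flash
```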
Is the countTokens endpoint free?
Yes. Google's countTokens is separate from generation billing. Our proxy adds a 30-day cache so we don't burn quota on repeated identical inputs.
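A sketch of the proxy-side cache described above: counts keyed by a hash of the input, expiring after 30 days. The real proxy's storage backend and naming are not shown on this page, so everything here is an illustrative assumption.

```python
import hashlib
import time

TTL_SECONDS = 30 * 24 * 3600  # 30-day expiry
_cache = {}  # sha256(text) -> (stored_at, token_count)

def cached_count(text: str, count_fn) -> int:
    """Return a cached token count, calling count_fn only on a miss."""
    key = hashlib.sha256(text.encode("utf-8")).hexdigest()
    hit = _cache.get(key)
    if hit is not None and time.time() - hit[0] < TTL_SECONDS:
        return hit[1]
    count = count_fn(text)  # the actual countTokens API call goes here
    _cache[key] = (time.time(), count)
    return count
```

Hashing the input means repeated identical prompts never hit Google's quota twice within the TTL.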
How does Flash's tokenizer compare to GPT or Claude?
Gemini tends to produce slightly fewer tokens than GPT-4o or Claude for the same English text — usually a few percent. The difference can be larger for code or non-English content. The calculator above shows the actual count for your input.
Compare Gemini 2.5 Flash to other models
- Gemini 2.5 Pro (Google, $1.25/$10.00)
- GPT-4o mini (OpenAI, $0.15/$0.60)
- Llama 3.1 8B (Meta, $0.18/$0.18)
- DeepSeek V3 (DeepSeek, $0.27/$1.10)