#tHow Many Tokens?

News and pricing updates

Auto-generated from each provider's pricing page. Pricing changes, model launches, and tokenizer-accuracy upgrades, in reverse chronological order. The leaderboard at the top updates with every build.

Live ranking

AI cost leaderboard, ranked cheapest first

Cost to run a 1,000-token-input / 200-token-output prompt 1,000 times, across the eight cheapest non-deprecated models we track. Updates automatically with every pricing snapshot.

#ModelPer 1M calls
#1 GPT-5 Nano $130.00
#2 GPT-4.1 Nano $180.00
#3 Gemini 2.5 Flash-Lite $180.00
#4 Llama 3.1 8B $216.00
#5 GPT-4o mini $270.00
#6 GPT-5.4 Nano $450.00
#7 DeepSeek V3 $490.00
#8 Gemini 3.1 Flash-Lite $550.00

full pricing changelog · try your own prompt

Update

claude-opus: updated

Anthropic released Claude Opus 4.8 on 2026-05-28. Standard mode pricing unchanged at $5 input / $25 output per 1M tokens. The claude-opus entry now resolves to 4.8 (apiId: claude-opus-4-8). Inherits the Opus 4.7 tokenizer behavior (up to 35% more tokens than legacy Claude models for the same text). Anthropic describes 4.8 as 'a modest but tangible improvement' with gains in agentic coding, reasoning, knowledge work, and honesty.

source

Models added

Added: claude-opus-fast

Added Claude Opus 4.8 Fast Mode as a separate entry: $10 input / $50 output per 1M tokens, producing tokens at ~2.5x normal speed. Anthropic dropped the fast-tier price 3x vs Opus 4.7 (was $30/$150). Useful for latency-sensitive workloads where Opus quality is needed but the standard tier's throughput is the bottleneck.

source

Accuracy upgrade

llama-family tokenizer: approx-3pct → exact

Shipped real BPE tokenization for the Llama family via llama-tokenizer-js (lazy-loaded ~2MB chunk on first Llama count). All llama-* models now labeled 'exact' instead of '≈±3%'. Mistral / Qwen / DeepSeek / GLM still use heuristic (now character-class-aware — buckets text into ASCII / digit / CJK / whitespace and applies per-class ratios — more accurate than the prior constant-ratio version, still labeled ≈±3%).

source

Models added

Added: oss

Added 5 OSS models: Llama 3.3 70B ($0.88/$0.88, current Together flagship Meta), DeepSeek V3.1 ($0.60/$1.70 Together listing), DeepSeek R1 ($3/$7 reasoning), Qwen3 Coder 480B ($2/$2 current Alibaba coding flagship), GLM-5.1 ($1.40/$4.40 new Zhipu provider). Llama 3.1 entries kept but flagged 'no longer on Together's main page — verify provider'. Llama 4 NOT added — not on Together's current pricing page.

source

Models added

Added: openai

Added 21 OpenAI models: full GPT-5 family (5, 5.1, 5.2, 5.2 Pro, 5.3, 5.4, 5.4 Mini, 5.4 Nano, 5.4 Pro, 5.5, 5.5 Pro, mini, nano, Pro tiers), GPT-4.1 family (4.1, mini, nano), and o-series reasoning (o3, o3-mini, o3-pro, o4-mini).

source

Models added

Added: google

Added Gemini 3 family: 3.1 Pro Preview ($2/$12), 3 Flash Preview ($0.50/$3), 3.1 Flash-Lite Preview ($0.25/$1.50). Added Gemini 2.5 Flash-Lite GA ($0.10/$0.40).

source

Price change

gemini-2-5-flash input price 0.075 → 0.3

Correction — Gemini 2.5 Flash priced higher than initial entry. Live Google pricing page lists $0.30 input / $2.50 output.

source

Price change

gemini-2-5-flash output price 0.3 → 2.5

Correction — see input change above.

source

Price change

claude-opus input price 15 → 5

Correction — initial Opus 4.7 entry was based on prior-generation Opus pricing. Anthropic prices Opus 4.7 at $5 input / $25 output.

source

Price change

claude-opus output price 75 → 25

Correction — see input change above.

source

Price change

claude-haiku input price 0.8 → 1

Correction — Haiku 4.5 priced higher than initial entry suggested.

source

Price change

claude-haiku output price 4 → 5

Correction — see input change above.

source

Update

all: initial

Initial pricing snapshot — 15 models across 7 providers.

source