Pricing changelog

Every change to AI model pricing tracked by this site, with the source link. Newest first.

Date	Model	Field	From	To	Source
2026-07-26	qwen-3-5-397b	added	—	—	https://www.together.ai/models/qwen3-5-397b-a17b
Added Qwen3.5 397B at $0.60 input / $3.60 output per 1M tokens (cached input $0.35) with a 262,144-token native context window. This clears one of the models deferred earlier today: the price was confirmed on Together's pricing page, and the context window and exact API id (Qwen/Qwen3.5-397B-A17B) have now been sourced from Together's own model page, so no figure is being guessed. A 397B-parameter MoE with 17B active, it leads the Qwen3.5 generation that has displaced Qwen3 Coder 480B and Qwen 2.5 72B on Together's main pricing table. Note that it breaks the older Qwen convention of charging the same rate in both directions — output is 6x input here.
2026-07-26	gemini-new-models	deferred	—	—	https://ai.google.dev/gemini-api/docs/models
STILL DEFERRED — no change applied. Re-checked the Gemini 3.6 Flash ($1.50/$7.50) and Gemini 3.5 Flash ($1.50/$9.00) entries held back earlier today. Prices re-confirmed on the pricing page, but the context windows are still unpublished: the Gemini models overview lists both as Stable without token limits, and the Vertex AI model reference does not state them either. Held back for a second run rather than shipping a guessed context window that would feed the /longest-context-window/ ranking. Separately, Gemini 3 Flash — which did not appear in this run's pricing fetch — is confirmed still available on the models overview under Preview status, not in the deprecated section, so its entry stays active and unchanged at $0.50/$3.00.
2026-07-26	claude-opus	updated	claude-opus-4-8	claude-opus-5	https://platform.claude.com/docs/en/about-claude/pricing
Anthropic shipped Claude Opus 5. The claude-opus entry now resolves to Opus 5 (apiId: claude-opus-5), following the same convention used for the 4.7 to 4.8 repoint. Standard-mode pricing is unchanged at $5 input / $25 output per 1M tokens — Anthropic has now held the Opus rate flat across 4.5, 4.6, 4.7, 4.8 and 5. Opus 4.8 and 4.7 remain available at the same rate.
2026-07-26	claude-opus-fast	updated	claude-opus-4-8-fast	claude-opus-5-fast	https://platform.claude.com/docs/en/about-claude/pricing
Fast mode (still a research preview) now covers Claude Opus 5 as well as Opus 4.8, at an unchanged $10 input / $50 output per 1M tokens. Fast-mode pricing applies across the full context window, including requests over 200k input tokens. It is available through the first-party Claude API only: not on Bedrock, Google Cloud, or the Batch API, and not on Opus 4.7 or 4.6.
2026-07-26	claude-sonnet	input	3	2	https://platform.claude.com/docs/en/about-claude/pricing
The claude-sonnet entry now resolves to Claude Sonnet 5 (apiId: claude-sonnet-5). Sonnet 5 is on INTRODUCTORY pricing of $2 input / $10 output through 2026-08-31; standard pricing of $3 / $15 takes effect 2026-09-01. This is a scheduled, dated increase published by Anthropic, not a promotion of unknown duration — anyone budgeting past August should plan on $3 / $15. Sonnet 4.6 remains available at $3 / $15.
2026-07-26	claude-sonnet	output	15	10	https://platform.claude.com/docs/en/about-claude/pricing
See the input change above — Sonnet 5 introductory pricing, reverting to $15 output on 2026-09-01.
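The dated switch described in the two Sonnet entries above can be sketched as a lookup keyed on the billing date. This is a minimal illustration, not the site's implementation: the function name `sonnet_5_rates` and the tuple layout are assumptions, while the dates and prices are the ones stated in the entry.

```python
from datetime import date

# Dates and prices come from the changelog entry; the function name and the
# (input, output) tuple layout are hypothetical.
INTRO_END = date(2026, 8, 31)   # last day of introductory pricing
INTRO_RATES = (2.0, 10.0)       # USD per 1M tokens: (input, output)
STANDARD_RATES = (3.0, 15.0)    # takes effect 2026-09-01


def sonnet_5_rates(on: date) -> tuple[float, float]:
    """Return the (input, output) rate per 1M tokens in effect on a date.

    Dates before the model's launch are not modelled; they simply fall
    into the introductory bucket.
    """
    return INTRO_RATES if on <= INTRO_END else STANDARD_RATES


print(sonnet_5_rates(date(2026, 8, 31)))
print(sonnet_5_rates(date(2026, 9, 1)))
```

A budget that straddles the boundary would call this once per billing day or invoice date rather than once per month.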
2026-07-26	claude-family	contextWindow	200000	1000000	https://platform.claude.com/docs/en/about-claude/pricing
Corrected the context window for claude-opus, claude-opus-fast and claude-sonnet from 200K to 1M. Anthropic's pricing docs now state that Claude 4.6 and later models include the full 1M-token context window at standard pricing, with no long-context surcharge — a 900k-token request bills at the same per-token rate as a 9k one. Prompt caching and batch discounts apply at standard rates across the whole window. Claude Haiku 4.5 is below the 4.6 cutoff and stays at 200K.
2026-07-26	claude-fable-5	added	—	—	https://platform.claude.com/docs/en/about-claude/pricing
Added Claude Fable 5 at $10 input / $50 output per 1M tokens with a 1M-token context window — a new tier priced above Opus. Note that Fable 5 uses the newer Claude 4.7+ tokenizer, which produces roughly 30% more tokens for the same text than Sonnet 4.6 and earlier, so the real cost gap versus older Claude models is wider than the headline rate. Claude Mythos 5 is priced identically but was NOT added: it is limited-availability (via anthropic.com/glasswing) rather than generally purchasable.
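To make the tokenizer effect concrete: the effective price ratio for the same text is the headline ratio multiplied by the token inflation. The sketch below takes the ~30% figure at face value and compares Fable 5 input ($10) with Sonnet 4.6 input ($3, from the claude-sonnet entry). The helper name is hypothetical and the result is only as good as the 30% estimate.

```python
def effective_ratio(new_rate: float, old_rate: float, token_inflation: float) -> float:
    """Cost ratio for the same text when the new model counts more tokens.

    token_inflation is the fractional increase in token count, e.g. 0.30
    for "roughly 30% more tokens".
    """
    return (new_rate / old_rate) * (1.0 + token_inflation)


headline = 10.0 / 3.0                          # about 3.33x on paper
effective = effective_ratio(10.0, 3.0, 0.30)   # about 4.33x for the same text
print(f"headline {headline:.2f}x, effective {effective:.2f}x")
```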
2026-07-26	openai-5-6-batch	added	—	—	https://developers.openai.com/api/docs/pricing
Added the GPT-5.6 line: Sol ($5/$30, cached $0.50), Terra ($2.50/$15, cached $0.25) and Luna ($1/$6, cached $0.10). All three carry a 1.05M-token context window, up from 400K across the GPT-5.0 to 5.5 generations. Sol and Terra match the headline rates of GPT-5.5 and GPT-5.4 respectively while offering ~2.6x the context, so there is little reason to stay on the older models at the same price.
2026-07-26	llama-3-3-70b	input	0.88	1.04	https://www.together.ai/pricing
Together.ai raised Llama 3.3 70B from $0.88 to $1.04 per 1M tokens, an 18% increase. The change is within the plausible-swing threshold and was confirmed on the live pricing page, so it was applied.
2026-07-26	llama-3-3-70b	output	0.88	1.04	https://www.together.ai/pricing
See the input change above — Together prices Llama 3.3 70B identically in both directions.
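The 18% figure and the threshold check for the Llama 3.3 70B change can be reproduced in a few lines. This is a sketch only: the changelog does not state the actual plausible-swing threshold, so the 50% value below is a placeholder assumption, as are the function names.

```python
def percent_change(old: float, new: float) -> float:
    """Relative change from old to new, as a percentage."""
    return (new - old) / old * 100.0


# ASSUMPTION: the real plausible-swing threshold is not given in the
# changelog; 50% is a placeholder for illustration only.
PLAUSIBLE_SWING_PCT = 50.0


def within_plausible_swing(old: float, new: float,
                           limit_pct: float = PLAUSIBLE_SWING_PCT) -> bool:
    """True when the absolute percentage change does not exceed the limit."""
    return abs(percent_change(old, new)) <= limit_pct


change = percent_change(0.88, 1.04)
print(f"{change:.1f}%")  # 18.2%, rounded to 18% in the entry
print(within_plausible_swing(0.88, 1.04))
```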
2026-07-26	deepseek-v4-pro	added	—	—	https://www.together.ai/models/deepseek-v4-pro
Added DeepSeek V4 Pro at $1.74 input / $3.48 output per 1M tokens (cached input $0.20) with a 512K-token context window — the largest context of any open-weight model we track, and roughly 4x the 128K that the V3 generation offered. It has replaced V3.1 as the DeepSeek entry on Together's main pricing page.
2026-07-26	gemini-new-models	deferred	—	—	https://ai.google.dev/gemini-api/docs/pricing
NOT ADDED PENDING VERIFICATION. Three new Gemini models appeared with confirmed prices — Gemini 3.6 Flash ($1.50/$7.50), Gemini 3.5 Flash ($1.50/$9.00) and Gemini 3.5 Flash-Lite ($0.30/$2.50) — but neither the pricing page nor the models overview page publishes their context windows or exact API model ids, and the per-model doc pages 404. Rather than guess a context window (which would feed the /longest-context-window/ ranking directly), these are held back until the figures can be sourced. All six existing Gemini entries were re-verified against the live page and are unchanged.
2026-07-26	together-oss-new	deferred	—	—	https://www.together.ai/pricing
NOT ADDED PENDING VERIFICATION. Together listed several new models whose context windows could not be sourced (their model pages 404): GLM-5.2 ($1.40/$4.40, identical to GLM-5.1), Qwen3.7-Plus ($0.32/$1.28), Qwen3.6-Plus ($0.50/$3.00), Qwen3.5-397B-A17B ($0.60/$3.60) and Qwen3.5 9B ($0.17/$0.25). Prices recorded here for the next run; entries held back rather than shipped with a guessed context window.
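The hold-back rule applied in the two deferred entries above amounts to a completeness check before an entry ships. A minimal sketch, assuming a hypothetical `Candidate` record; the field names are illustrative and are not the site's schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    """Hypothetical record for a model awaiting addition."""
    name: str
    input_price: Optional[float] = None    # USD per 1M tokens
    output_price: Optional[float] = None   # USD per 1M tokens
    context_window: Optional[int] = None   # tokens; None means unsourced
    api_id: Optional[str] = None           # None means unsourced


def missing_fields(c: Candidate) -> list[str]:
    """List the required fields that still lack a sourced value."""
    required = ("input_price", "output_price", "context_window", "api_id")
    return [f for f in required if getattr(c, f) is None]


def decide(c: Candidate) -> str:
    """Defer unless every required field has been sourced."""
    return "deferred" if missing_fields(c) else "added"


# Prices confirmed, context window and API id not yet published.
flash = Candidate("Gemini 3.6 Flash", input_price=1.50, output_price=7.50)
print(decide(flash), missing_fields(flash))
```

Keeping the list of missing fields, rather than a bare yes/no, gives the next run a concrete checklist of what still needs a source.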
2026-07-26	openai-legacy	possible-deprecation	—	—	https://developers.openai.com/api/docs/pricing
POSSIBLE DEPRECATION, VERIFY — no change applied. The GPT-5.0 through 5.2 tiers, the GPT-4.1 family, the o3/o4 reasoning series and the GPT-4o family did not appear in this run's fetch of either the pricing page or the models page, which now foreground the GPT-5.6 line. That is suggestive but not proof of removal — the fetch may simply have captured the flagship section. All entries left active per the never-remove-on-one-fetch rule. Worth a manual check of the OpenAI deprecations page before flipping any flags.
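One way to implement a never-remove-on-one-fetch rule is to count consecutive fetches in which a model is absent and flag it only after more than one miss. The changelog does not describe the actual mechanism, so everything below is an assumption made for illustration: the function name, the two-miss threshold, the counter dictionary, and the model ids.

```python
# ASSUMPTION: the real rule's threshold and storage are not documented;
# this sketch flags a model only after two consecutive missed fetches.
MISSES_BEFORE_FLAG = 2


def update_misses(misses: dict[str, int], tracked: set[str],
                  seen: set[str]) -> set[str]:
    """Update per-model miss counters and return models to flag for review.

    misses  -- consecutive-miss counter per model id, mutated in place
    tracked -- model ids currently listed as active
    seen    -- model ids present in the latest fetch
    """
    flagged = set()
    for model in tracked:
        if model in seen:
            misses[model] = 0                        # reappeared: reset
        else:
            misses[model] = misses.get(model, 0) + 1
            if misses[model] >= MISSES_BEFORE_FLAG:
                flagged.add(model)
    return flagged


counters: dict[str, int] = {}
tracked = {"legacy-model", "current-model"}          # hypothetical ids
first = update_misses(counters, tracked, seen={"current-model"})
second = update_misses(counters, tracked, seen={"current-model"})
print(first, second)
```

Even a flagged model would still warrant the manual check of the provider's deprecations page mentioned above before any entry is deactivated.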
2026-07-26	together-oss-missing	possible-deprecation	—	—	https://www.together.ai/pricing
POSSIBLE DEPRECATION, VERIFY — no change applied. DeepSeek V3.1, DeepSeek R1, Qwen3 Coder 480B and Qwen 2.5 72B are no longer visible on Together's serverless pricing table, which now leads with DeepSeek V4 Pro and the Qwen3.5+ generation. These entries were already flagged as indicative pricing and are left active. Anyone billing against them should confirm their provider's current rate.
2026-05-31	claude-opus	updated	claude-opus-4-7	claude-opus-4-8	https://www.anthropic.com/news/claude-opus-4-8
Anthropic released Claude Opus 4.8 on 2026-05-28. Standard-mode pricing is unchanged at $5 input / $25 output per 1M tokens. The claude-opus entry now resolves to 4.8 (apiId: claude-opus-4-8). Opus 4.8 inherits the Opus 4.7 tokenizer behavior (up to 35% more tokens than legacy Claude models for the same text). Anthropic describes 4.8 as 'a modest but tangible improvement' with gains in agentic coding, reasoning, knowledge work, and honesty.
2026-05-31	claude-opus-fast	added	—	—	https://www.anthropic.com/news/claude-opus-4-8
Added Claude Opus 4.8 Fast Mode as a separate entry: $10 input / $50 output per 1M tokens, producing tokens at ~2.5x normal speed. Anthropic cut the fast-tier price to one third of the Opus 4.7 rate (was $30/$150). Useful for latency-sensitive workloads where Opus quality is needed but the standard tier's throughput is the bottleneck.
2026-05-27	llama-family	tokenizer-accuracy	approx-3pct	exact	https://www.npmjs.com/package/llama-tokenizer-js
Shipped real BPE tokenization for the Llama family via llama-tokenizer-js (lazy-loaded as a ~2MB chunk on the first Llama count). All llama-* models are now labeled 'exact' instead of '≈±3%'. Mistral, Qwen, DeepSeek and GLM still use a heuristic, which is now character-class-aware: it buckets text into ASCII, digit, CJK and whitespace classes and applies per-class ratios. That is more accurate than the prior constant-ratio version but is still labeled ≈±3%.
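A character-class-aware estimate of the kind described for the non-Llama models can be sketched as follows. The per-class ratios are made-up placeholders, not the values this site uses, and the CJK test covers only the CJK Unified Ideographs, Hiragana, Katakana and Hangul Syllables ranges. Treat it as an illustration of the bucketing idea rather than a calibrated tokenizer.

```python
import math

# ASSUMPTION: illustrative tokens-per-character ratios, not calibrated values.
TOKENS_PER_CHAR = {
    "ascii": 0.25,       # roughly four ASCII characters per token
    "digit": 0.5,
    "cjk": 1.0,
    "whitespace": 0.1,
    "other": 0.5,        # fallback for anything not covered above
}


def char_class(ch: str) -> str:
    """Assign a single character to one of the buckets."""
    if ch.isspace():
        return "whitespace"
    if ch.isascii():
        return "digit" if ch.isdigit() else "ascii"
    cp = ord(ch)
    if (0x4E00 <= cp <= 0x9FFF           # CJK Unified Ideographs
            or 0x3040 <= cp <= 0x30FF    # Hiragana and Katakana
            or 0xAC00 <= cp <= 0xD7AF):  # Hangul Syllables
        return "cjk"
    return "other"


def estimate_tokens(text: str) -> int:
    """Sum per-class character counts weighted by their ratios."""
    counts: dict[str, int] = {}
    for ch in text:
        cls = char_class(ch)
        counts[cls] = counts.get(cls, 0) + 1
    total = sum(TOKENS_PER_CHAR[cls] * n for cls, n in counts.items())
    return math.ceil(total)


print(estimate_tokens("Pricing changed on 2026-07-26"))
```

A constant-ratio version would apply one multiplier to `len(text)`; splitting by class lets digit-heavy or CJK-heavy text get a different weight from plain English prose.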
2026-04-27	oss-batch	added	—	—	https://www.together.ai/pricing
Added 5 OSS models: Llama 3.3 70B ($0.88/$0.88, the current Meta flagship on Together), DeepSeek V3.1 ($0.60/$1.70, Together listing), DeepSeek R1 ($3/$7, reasoning), Qwen3 Coder 480B ($2/$2, the current Alibaba coding flagship) and GLM-5.1 ($1.40/$4.40, which adds Zhipu as a new provider). Llama 3.1 entries were kept but flagged as no longer on Together's main page, with a note to verify the provider. Llama 4 was NOT added because it is not on Together's current pricing page.
2026-04-27	openai-batch	added	—	—	https://developers.openai.com/api/docs/pricing
Added 21 OpenAI models: full GPT-5 family (5, 5.1, 5.2, 5.2 Pro, 5.3, 5.4, 5.4 Mini, 5.4 Nano, 5.4 Pro, 5.5, 5.5 Pro, mini, nano, Pro tiers), GPT-4.1 family (4.1, mini, nano), and o-series reasoning (o3, o3-mini, o3-pro, o4-mini).
2026-04-27	google-batch	added	—	—	https://ai.google.dev/gemini-api/docs/pricing
Added Gemini 3 family: 3.1 Pro Preview ($2/$12), 3 Flash Preview ($0.50/$3), 3.1 Flash-Lite Preview ($0.25/$1.50). Added Gemini 2.5 Flash-Lite GA ($0.10/$0.40).
2026-04-27	gemini-2-5-flash	input	0.075	0.3	https://ai.google.dev/gemini-api/docs/pricing
Correction — Gemini 2.5 Flash priced higher than initial entry. Live Google pricing page lists $0.30 input / $2.50 output.
2026-04-27	gemini-2-5-flash	output	0.3	2.5	https://ai.google.dev/gemini-api/docs/pricing
Correction — see input change above.
2026-04-27	claude-opus	input	15	5	https://platform.claude.com/docs/en/about-claude/pricing
Correction — initial Opus 4.7 entry was based on prior-generation Opus pricing. Anthropic prices Opus 4.7 at $5 input / $25 output.
2026-04-27	claude-opus	output	75	25	https://platform.claude.com/docs/en/about-claude/pricing
Correction — see input change above.
2026-04-27	claude-haiku	input	0.8	1	https://platform.claude.com/docs/en/about-claude/pricing
Correction — Haiku 4.5 priced higher than initial entry suggested.
2026-04-27	claude-haiku	output	4	5	https://platform.claude.com/docs/en/about-claude/pricing
Correction — see input change above.
2026-04-26	all	initial	—	—	site launch
Initial pricing snapshot — 15 models across 7 providers.