What is the cheapest AI model?

As of 2026-04, Gemini 2.5 Flash at $0.075 per million input tokens is the cheapest exact-tokenizer model from a major provider. Here's the full price-sorted list.

The short answer

As of April 2026, Gemini 2.5 Flash at $0.075 per million input tokens / $0.30 per million output tokens is the cheapest model from a major provider with an officially supported tokenizer.

Open-weights models are also available through inference providers, and self-hosting them can push the per-token price even lower, but you pay in setup time and infrastructure.

Cheapest models, ranked

Sorted by input price (since input dominates most workloads):

Rank	Model	Input ($/M)	Output ($/M)	Notes
1	Gemini 2.5 Flash	$0.075	$0.30	1M context, multimodal, exact tokenizer
2	GPT-4o mini	$0.15	$0.60	Strong instruction-following at this tier
3	Llama 3.1 8B	$0.18	$0.18	Open weights; self-hosting removes per-token fees but adds infrastructure cost
4	DeepSeek V3	$0.27	$1.10	Frontier capability at this price
5	Llama 3.1 70B	$0.59	$0.79	Open mid-tier; widely hosted
6	Claude Haiku 4.5	$0.80	$4.00	Best Claude instruction-following at low cost
7	Qwen 2.5 Coder 32B	$0.80	$0.80	Code-specialized open weights
8	Qwen 2.5 72B	$0.90	$0.90	Open multilingual general-purpose
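
The table ranks by input price, but the order can change once output tokens are counted. The sketch below is a minimal illustration, not the counter's actual implementation: it copies the prices from the table and ranks the models by per-call cost for two workloads. The function names (`call_cost`, `rank_by_cost`) and the token counts are hypothetical, chosen only for the example.

```python
# Prices in USD per million tokens, copied from the table above: (input, output).
PRICES = {
    "Gemini 2.5 Flash": (0.075, 0.30),
    "GPT-4o mini": (0.15, 0.60),
    "Llama 3.1 8B": (0.18, 0.18),
    "DeepSeek V3": (0.27, 1.10),
    "Llama 3.1 70B": (0.59, 0.79),
    "Claude Haiku 4.5": (0.80, 4.00),
    "Qwen 2.5 Coder 32B": (0.80, 0.80),
    "Qwen 2.5 72B": (0.90, 0.90),
}


def call_cost(model, input_tokens, output_tokens):
    """USD cost of one call: tokens times price per million, for input plus output."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


def rank_by_cost(input_tokens, output_tokens):
    """Model names sorted from cheapest to most expensive for this workload."""
    return sorted(PRICES, key=lambda m: call_cost(m, input_tokens, output_tokens))


# Hypothetical workloads: an input-heavy RAG call and an output-heavy generation call.
rag = rank_by_cost(input_tokens=8_000, output_tokens=300)
gen = rank_by_cost(input_tokens=300, output_tokens=4_000)

print(rag[:3])  # ['Gemini 2.5 Flash', 'GPT-4o mini', 'Llama 3.1 8B']
print(gen[:3])  # ['Llama 3.1 8B', 'Gemini 2.5 Flash', 'GPT-4o mini']
```

With these assumed token counts, Gemini 2.5 Flash is cheapest for the input-heavy call, while Llama 3.1 8B takes first place for the output-heavy one.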

"Cheap" depends on what you need

The cheapest model isn't always the right answer. Consider:

Tokenizer accuracy. Anthropic and Gemini models (via API-side count_tokens) and OpenAI models have exact token counts. Counts for open-weights models in this counter are estimates, accurate to within ±3%.
Capability gap. Gemini Flash and GPT-4o mini handle most workloads. Hard reasoning tasks may need Claude Sonnet ($3/$15 per million input/output tokens) or GPT-4o ($2.50/$10).
Output-heavy vs input-heavy. Llama 3.1 8B charges the same $0.18 per million tokens for input and output. That makes it cheaper than Gemini 2.5 Flash for output-heavy generation ($0.18 vs $0.30 per million output tokens) but more expensive for input-heavy RAG ($0.18 vs $0.075 per million input tokens).
Context length needs. Gemini Flash's 1M-token window is about 8× the size of most competitors' windows at this price tier.
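
The output-heavy vs input-heavy point can be made concrete with a break-even ratio. Setting the two per-call costs equal and solving for output tokens per input token gives (b_in - a_in) / (a_out - b_out). The sketch below applies that to the table prices; `break_even_ratio` is a hypothetical helper written for this illustration.

```python
def break_even_ratio(a, b):
    """Output/input token ratio at which models a and b cost the same.

    a and b are (input_price, output_price) tuples in USD per million tokens.
    Returns None when there is no positive crossing point, meaning one model
    is cheaper (or equal) at every mix of input and output tokens.
    """
    a_in, a_out = a
    b_in, b_out = b
    if a_out == b_out:
        return None  # cost lines are parallel in output price, so they never cross
    ratio = (b_in - a_in) / (a_out - b_out)
    return ratio if ratio > 0 else None


gemini_flash = (0.075, 0.30)  # prices from the table
llama_8b = (0.18, 0.18)

ratio = break_even_ratio(gemini_flash, llama_8b)
print(f"{ratio:.3f}")  # 0.875
```

Under these prices, Llama 3.1 8B becomes the cheaper of the two once a call produces more than roughly 0.875 output tokens per input token; below that ratio, Gemini 2.5 Flash costs less.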

Get a real cost comparison

Paste your prompt into the counter. It shows the actual token count and per-call cost across every model, so you can choose by total cost on your workload rather than by the headline per-million price.

Try this on every model

Claude Opus 4.7 $15.00/$75.00
Claude Sonnet 4.6 $3.00/$15.00
Claude Haiku 4.5 $0.80/$4.00
GPT-4o $2.50/$10.00
GPT-4o mini $0.15/$0.60
GPT-4 Turbo $10.00/$30.00
Gemini 2.5 Pro $1.25/$10.00
Gemini 2.5 Flash $0.075/$0.30
Llama 3.1 405B $3.50/$3.50
Llama 3.1 70B $0.59/$0.79
Llama 3.1 8B $0.18/$0.18
Mistral Large $2.00/$6.00
DeepSeek V3 $0.27/$1.10
Qwen 2.5 72B $0.90/$0.90
Qwen 2.5 Coder 32B $0.80/$0.80
