What is the cheapest AI model?
The short answer
As of April 2026, Gemini 2.5 Flash at $0.075 per million input tokens / $0.30 per million output tokens is the cheapest model from a major provider with an officially supported tokenizer.
Open-weights models are also available through inference providers, and self-hosting them can push the price lower still, but you pay in setup time and infrastructure.
Cheapest models, ranked
Sorted by input price (since input dominates most workloads):
| Rank | Model | Input ($/M) | Output ($/M) | Notes |
|---|---|---|---|---|
| 1 | Gemini 2.5 Flash | $0.075 | $0.30 | 1M context, multimodal, exact tokenizer |
| 2 | GPT-4o mini | $0.15 | $0.60 | Strong instruction-following at this tier |
| 3 | Llama 3.1 8B | $0.18 | $0.18 | Open weights; self-host for ~free |
| 4 | DeepSeek V3 | $0.27 | $1.10 | Frontier capability at this price |
| 5 | Llama 3.1 70B | $0.59 | $0.79 | Open mid-tier; widely hosted |
| 6 | Claude Haiku 4.5 | $0.80 | $4.00 | Best Claude instruction-following at low cost |
| 7 | Qwen 2.5 Coder 32B | $0.80 | $0.80 | Code-specialized open weights |
| 8 | Qwen 2.5 72B | $0.90 | $0.90 | Open multilingual general-purpose |
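Ranking by input price is only a proxy for what a call costs. As a minimal sketch, the per-million prices in the table can be turned into a per-call cost and re-sorted for a given workload. The helper names `call_cost` and `rank_by_cost` and the example token counts are illustrative assumptions, not part of the counter.

```python
# Prices in USD per million tokens (input, output), copied from the table above.
PRICES = {
    "Gemini 2.5 Flash": (0.075, 0.30),
    "GPT-4o mini": (0.15, 0.60),
    "Llama 3.1 8B": (0.18, 0.18),
    "DeepSeek V3": (0.27, 1.10),
    "Llama 3.1 70B": (0.59, 0.79),
    "Claude Haiku 4.5": (0.80, 4.00),
    "Qwen 2.5 Coder 32B": (0.80, 0.80),
    "Qwen 2.5 72B": (0.90, 0.90),
}


def call_cost(model, input_tokens, output_tokens):
    """Cost in USD of one call (hypothetical helper, not a provider API)."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


def rank_by_cost(input_tokens, output_tokens):
    """Model names sorted from cheapest to most expensive for this workload."""
    return sorted(PRICES, key=lambda m: call_cost(m, input_tokens, output_tokens))


# Example workload (assumed): 10,000 input tokens and 500 output tokens per call.
for name in rank_by_cost(10_000, 500):
    print(f"{name}: ${call_cost(name, 10_000, 500):.6f}")
```

On this assumed workload Gemini 2.5 Flash stays first, while Claude Haiku 4.5 drops below both Qwen models because its output rate is five times its input rate.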
"Cheap" depends on what you need
The cheapest model isn't always the right answer. Consider:
- Tokenizer accuracy. Models with API-side count_tokens (Anthropic, Gemini) and OpenAI models have exact counts. Open-weights models in this counter are estimated to ±3%.
- Capability gap. Gemini Flash and GPT-4o mini handle most workloads. Hard reasoning tasks may need Claude Sonnet ($3/$15) or GPT-4o ($2.50/$10).
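- Output-heavy vs input-heavy. Llama 3.1 8B charges the same per-token rate for input and output. Its $0.18 output rate is the lowest in the table, which makes it great for output-heavy generation, but its input rate is higher than Gemini 2.5 Flash and GPT-4o mini, which makes it worse for input-heavy RAG.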
- Context length needs. Gemini Flash's 1M-token window is 8× larger than most competitors at this price tier.
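The output-heavy versus input-heavy trade-off can be made concrete with a break-even calculation. The sketch below uses the listed prices and finds the share of output tokens at which flat-priced Llama 3.1 8B costs the same as a model with split input/output pricing. The function names are hypothetical, and the calculation ignores everything except the headline per-token rates.

```python
def blended_price(in_price, out_price, output_share):
    """USD per million tokens when `output_share` (0..1) of all tokens are output."""
    return in_price * (1 - output_share) + out_price * output_share


def break_even_output_share(flat_price, in_price, out_price):
    """Output share at which a flat-priced model matches a split-priced one.

    Solves flat_price = in_price * (1 - s) + out_price * s for s.
    """
    return (flat_price - in_price) / (out_price - in_price)


# Llama 3.1 8B ($0.18 flat) vs Gemini 2.5 Flash ($0.075 in / $0.30 out)
print(round(break_even_output_share(0.18, 0.075, 0.30), 3))  # 0.467

# Llama 3.1 8B ($0.18 flat) vs GPT-4o mini ($0.15 in / $0.60 out)
print(round(break_even_output_share(0.18, 0.15, 0.60), 3))  # 0.067
```

Under these prices, Llama 3.1 8B undercuts Gemini 2.5 Flash only when roughly 47% or more of the tokens are output, but it undercuts GPT-4o mini once output exceeds about 7% of the total.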
Get a real cost comparison
Paste your prompt into the counter. It shows the actual token count and per-call cost across every model, so you can choose by total cost on your workload, not by the per-million headline price.
Try this on every model
- Claude Opus 4.7 $15.00/$75.00
- Claude Sonnet 4.6 $3.00/$15.00
- Claude Haiku 4.5 $0.80/$4.00
- GPT-4o $2.50/$10.00
- GPT-4o mini $0.15/$0.60
- GPT-4 Turbo $10.00/$30.00
- Gemini 2.5 Pro $1.25/$10.00
- Gemini 2.5 Flash $0.075/$0.30
- Llama 3.1 405B $3.50/$3.50
- Llama 3.1 70B $0.59/$0.79
- Llama 3.1 8B $0.18/$0.18
- Mistral Large $2.00/$6.00
- DeepSeek V3 $0.27/$1.10
- Qwen 2.5 72B $0.90/$0.90
- Qwen 2.5 Coder 32B $0.80/$0.80