What is the cheapest AI model?
The short answer
As of April 27, 2026, GPT-5 Nano, at $0.05 per million input tokens and $0.40 per million output tokens, is the cheapest exact-tokenizer model from a major provider.
For multimodal workloads or longer context, Gemini 2.5 Flash-Lite ($0.10 / $0.40) is the closest GA alternative: it has the same output price and double the input rate, but a 1M-token context window vs GPT-5 Nano's 400K.
Cheapest models, ranked
Sorted by the cost of 1M calls on a typical workload of 1,000 input tokens and 200 output tokens per call:
| Rank | Model | Input ($/M) | Output ($/M) | 1M calls cost | Notes |
|---|---|---|---|---|---|
| 1 | GPT-5 Nano | $0.05 | $0.40 | $130 | Cheapest input rate of any exact-tokenizer model. 400K context. |
| 2 | Gemini 2.5 Flash-Lite | $0.10 | $0.40 | $180 | 1M context, multimodal, GA |
| 3 | GPT-4.1 Nano | $0.10 | $0.40 | $180 | 1M context, exact o200k_base tokenizer |
| 4 | Llama 3.1 8B (Together) | $0.18 | $0.18 | $216 | Open weights; tokenizer estimate accurate to about ±3% |
| 5 | GPT-5.4 Nano | $0.20 | $1.25 | $450 | Reasoning-tier nano; cached input $0.02/M |
| 6 | DeepSeek V3 | $0.27 | $1.10 | $490 | Frontier capability at this price |
| 7 | Gemini 3.1 Flash-Lite Preview | $0.25 | $1.50 | $550 | Preview tier; cheapest Gemini 3 |
| 8 | GPT-5 Mini | $0.25 | $2.00 | $650 | Step-up from Nano; better reasoning |
| 9 | Gemini 2.5 Flash | $0.30 | $2.50 | $800 | Was the cheap-flash default; Flash-Lite is now cheaper |
| 10 | Gemini 3 Flash Preview | $0.50 | $3.00 | $1,100 | Pro-tier capability at Flash prices |
| 11 | Claude Haiku 4.5 | $1.00 | $5.00 | $2,000 | Best Claude instruction-following at low cost |
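
The "1M calls cost" column can be reproduced with simple arithmetic. The following is a minimal sketch, assuming the prices listed in the table above; the names `PRICES`, `cost_for_calls`, and `rank_by_cost` are hypothetical helpers introduced here for illustration, not part of any provider's SDK.

```python
# Prices in USD per million tokens (input, output), copied from a few
# rows of the table above.
PRICES = {
    "GPT-5 Nano": (0.05, 0.40),
    "Gemini 2.5 Flash-Lite": (0.10, 0.40),
    "Llama 3.1 8B (Together)": (0.18, 0.18),
    "GPT-5.4 Nano": (0.20, 1.25),
    "DeepSeek V3": (0.27, 1.10),
    "Claude Haiku 4.5": (1.00, 5.00),
}


def cost_for_calls(input_price, output_price,
                   input_tokens=1_000, output_tokens=200, calls=1_000_000):
    """Return the USD cost of `calls` requests at per-million-token prices."""
    # Cost of a single call: tokens times price, divided by one million.
    per_call = (input_tokens * input_price
                + output_tokens * output_price) / 1_000_000
    return per_call * calls


def rank_by_cost(prices, **workload):
    """Return (model, cost) pairs sorted from cheapest to most expensive."""
    rows = [
        (name, round(cost_for_calls(inp, out, **workload), 2))
        for name, (inp, out) in prices.items()
    ]
    return sorted(rows, key=lambda row: row[1])


if __name__ == "__main__":
    for name, cost in rank_by_cost(PRICES):
        print(f"{name:<26} ${cost:,.2f}")
```

Passing different `input_tokens` and `output_tokens` values to `rank_by_cost` shows how the ordering can change for workloads that differ from the 1,000-in / 200-out default.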
"Cheap" depends on what you need
The cheapest model isn't always the right answer. Consider:
- Tokenizer accuracy. OpenAI, Anthropic, and Google models in this list have exact counts. Token counts for open-weights models (Llama, Mistral, DeepSeek, Qwen) are estimates, accurate to within about ±3%.
- Capability gap. GPT-5 Nano and Gemini 2.5 Flash-Lite are designed for high-volume routing, classification, and extraction. They will fall behind on hard reasoning. For mid-tier tasks, GPT-5 Mini or Claude Haiku is the right call.
- Output-heavy vs input-heavy. Llama 3.1 8B charges the same per-token rate for input and output, which makes it attractive for output-heavy generation but comparatively expensive for input-heavy RAG. GPT-5 Nano's 8× output-to-input price ratio rewards short replies.
- Context length needs. GPT-5 Nano caps at 400K tokens. Gemini 2.5 Flash-Lite handles 1M. Llama 4 Scout (when added) handles 10M. Choose based on your typical input length.
- Preview vs GA. The Gemini 3 family and OpenAI's "Pro" tiers are still in Preview or early access, so pricing or behavior could shift. GA models give pricing stability.
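
To make the output-heavy vs input-heavy trade-off concrete, here is a rough sketch that finds the output-to-input token ratio at which two models cost the same, using the GPT-5 Nano and Llama 3.1 8B prices from the table. The function names are hypothetical, and the workload shapes (4,000-in / 200-out for RAG, 200-in / 1,000-out for generation) are illustrative assumptions, not measurements.

```python
def cost_per_million_calls(input_price, output_price,
                           input_tokens, output_tokens):
    """USD cost of 1M calls; prices are USD per million tokens.

    1M calls times tokens-per-call times (price / 1M) simplifies to
    tokens-per-call times price.
    """
    return input_tokens * input_price + output_tokens * output_price


def breakeven_output_ratio(a, b):
    """Output/input token ratio at which models a and b cost the same.

    a and b are (input_price, output_price) tuples. Returns None when
    both output prices are equal, because no single crossover exists.
    A negative result means one model is cheaper for every workload.
    """
    (a_in, a_out), (b_in, b_out) = a, b
    if a_out == b_out:
        return None
    # Solve a_in*i + a_out*o == b_in*i + b_out*o for o/i.
    return (b_in - a_in) / (a_out - b_out)


GPT5_NANO = (0.05, 0.40)
LLAMA_8B = (0.18, 0.18)

if __name__ == "__main__":
    ratio = breakeven_output_ratio(GPT5_NANO, LLAMA_8B)
    print(f"Break-even output/input ratio: {ratio:.2f}")
    # Illustrative workload shapes: (input tokens, output tokens) per call.
    for label, (i, o) in {"RAG": (4_000, 200),
                          "generation": (200, 1_000)}.items():
        nano = cost_per_million_calls(*GPT5_NANO, i, o)
        llama = cost_per_million_calls(*LLAMA_8B, i, o)
        print(f"{label:<10} GPT-5 Nano ${nano:,.0f}  Llama 3.1 8B ${llama:,.0f}")
```

At these prices, Llama 3.1 8B becomes the cheaper of the two once output tokens exceed roughly 59% of input tokens; below that ratio GPT-5 Nano costs less.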
Cheaper than this list
If you really want sub-$0.05 per million tokens:
- Self-host Llama 3.1 8B or Qwen 2.5 7B on your own GPU. Hardware cost only, no per-token bill.
- Run prompt caching on any of the above: cached input drops to ~10% of the standard rate (so GPT-5 Nano cached input is $0.005/M).
- Use the Batch API on OpenAI or Anthropic for a flat 50% discount on asynchronous workloads.
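
The effect of these discounts can be estimated with a short calculation. This is a sketch under stated assumptions: the ~10% cached-input rate and 50% batch discount are the figures quoted above, `cached_fraction` is a hypothetical parameter for the share of input tokens served from cache, and whether the two discounts can be combined on a given provider is not confirmed here.

```python
def discounted_cost_per_million_calls(input_price, output_price,
                                      input_tokens=1_000, output_tokens=200,
                                      cached_fraction=0.0,
                                      cache_multiplier=0.10,
                                      batch_multiplier=1.0):
    """Estimate the USD cost of 1M calls with caching and batch discounts.

    Prices are USD per million tokens. cache_multiplier=0.10 models the
    "~10% of standard rate" figure; batch_multiplier=0.5 models the flat
    50% batch discount. Applying both at once is an assumption that may
    not match a given provider's billing rules.
    """
    if not 0.0 <= cached_fraction <= 1.0:
        raise ValueError("cached_fraction must be between 0 and 1")
    # Blend the standard and cached input rates by the cached share.
    blended_input = input_price * (
        (1.0 - cached_fraction) + cached_fraction * cache_multiplier
    )
    total = input_tokens * blended_input + output_tokens * output_price
    return total * batch_multiplier


if __name__ == "__main__":
    base = discounted_cost_per_million_calls(0.05, 0.40)
    cached = discounted_cost_per_million_calls(0.05, 0.40, cached_fraction=0.8)
    batch = discounted_cost_per_million_calls(0.05, 0.40, batch_multiplier=0.5)
    print(f"GPT-5 Nano baseline:        ${base:,.2f}")
    print(f"80% of input cached:        ${cached:,.2f}")
    print(f"Batch API, no caching:      ${batch:,.2f}")
```

Because output tokens are not discounted by caching, the savings from caching shrink as the workload becomes more output-heavy, while the batch discount applies to the whole bill.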
Get a real cost comparison
Paste your prompt into the counter. It shows the actual token count and per-call cost across every model, so you can choose by total cost on your workload instead of by the per-million headline price.
Try this on every model
- Claude Opus 4.8 $5.00/$25.00
- Claude Opus 4.8 (Fast Mode) $10.00/$50.00
- Claude Sonnet 4.6 $3.00/$15.00
- Claude Haiku 4.5 $1.00/$5.00
- GPT-5.5 $5.00/$30.00
- GPT-5.5 Pro $30.00/$180.00
- GPT-5.4 $2.50/$15.00
- GPT-5.4 Mini $0.75/$4.50
- GPT-5.4 Nano $0.20/$1.25
- GPT-5.4 Pro $30.00/$180.00
- GPT-5.3 $1.75/$14.00
- GPT-5.2 $1.75/$14.00
- GPT-5.2 Pro $21.00/$168.00
- GPT-5.1 $1.25/$10.00
- GPT-5 $1.25/$10.00
- GPT-5 Mini $0.25/$2.00
- GPT-5 Nano $0.05/$0.40
- GPT-5 Pro $15.00/$120.00
- GPT-4.1 $2.00/$8.00
- GPT-4.1 Mini $0.40/$1.60
- GPT-4.1 Nano $0.10/$0.40
- o3 $2.00/$8.00
- o3-mini $1.10/$4.40
- o3-pro $20.00/$80.00
- o4-mini $1.10/$4.40
- GPT-4o $2.50/$10.00
- GPT-4o mini $0.15/$0.60
- GPT-4 Turbo $10.00/$30.00
- Gemini 3.1 Pro $2.00/$12.00
- Gemini 3 Flash $0.50/$3.00
- Gemini 3.1 Flash-Lite $0.25/$1.50
- Gemini 2.5 Pro $1.25/$10.00
- Gemini 2.5 Flash $0.30/$2.50
- Gemini 2.5 Flash-Lite $0.10/$0.40
- Llama 3.3 70B $0.88/$0.88
- Llama 3.1 405B $3.50/$3.50
- Llama 3.1 70B $0.59/$0.79
- Llama 3.1 8B $0.18/$0.18
- Mistral Large $2.00/$6.00
- DeepSeek V3 $0.27/$1.10
- DeepSeek V3.1 $0.60/$1.70
- DeepSeek R1 $3.00/$7.00
- Qwen 2.5 72B $0.90/$0.90
- Qwen 2.5 Coder 32B $0.80/$0.80
- Qwen3 Coder 480B $2.00/$2.00
- GLM-5.1 $1.40/$4.40