How Many Tokens?

What is the cheapest AI model?

The short answer

As of April 2026, Gemini 2.5 Flash at $0.075 per million input tokens / $0.30 per million output tokens is the cheapest model from a major provider with an officially supported tokenizer.

Among open-weights models, third-party inference providers go lower still, and self-hosting can push the marginal cost near zero, but you pay in setup time and infrastructure.

Cheapest models, ranked

Sorted by input price (since input dominates most workloads):

| Rank | Model | Input ($/M) | Output ($/M) | Notes |
|------|-------|-------------|--------------|-------|
| 1 | Gemini 2.5 Flash | $0.075 | $0.30 | 1M context, multimodal, exact tokenizer |
| 2 | GPT-4o mini | $0.15 | $0.60 | Strong instruction-following at this tier |
| 3 | Llama 3.1 8B | $0.18 | $0.18 | Open weights; self-host for ~free |
| 4 | DeepSeek V3 | $0.27 | $1.10 | Frontier capability at this price |
| 5 | Llama 3.1 70B | $0.59 | $0.79 | Open mid-tier; widely hosted |
| 6 | Claude Haiku 4.5 | $0.80 | $4.00 | Best Claude instruction-following at low cost |
| 7 | Qwen 2.5 Coder 32B | $0.80 | $0.80 | Code-specialized open weights |
| 8 | Qwen 2.5 72B | $0.90 | $0.90 | Open multilingual general-purpose |
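
Ranking by input price alone can mislead when a workload is output-heavy. A minimal sketch (prices hard-coded from the table above; the 90/10 mix is an illustrative assumption, not a benchmark) that re-ranks models by a blended price per million tokens:

```python
# Blended cost per million tokens for a given input/output mix, using the
# prices from the table above (USD per million tokens).
PRICES = {
    "Gemini 2.5 Flash": (0.075, 0.30),
    "GPT-4o mini": (0.15, 0.60),
    "Llama 3.1 8B": (0.18, 0.18),
    "DeepSeek V3": (0.27, 1.10),
    "Llama 3.1 70B": (0.59, 0.79),
    "Claude Haiku 4.5": (0.80, 4.00),
    "Qwen 2.5 Coder 32B": (0.80, 0.80),
    "Qwen 2.5 72B": (0.90, 0.90),
}

def blended_price(input_share: float) -> dict[str, float]:
    """Weighted $/M when `input_share` of all tokens are input tokens."""
    out_share = 1.0 - input_share
    return {name: inp * input_share + out * out_share
            for name, (inp, out) in PRICES.items()}

# Ranking for a retrieval-heavy workload: ~90% of tokens are input.
for name, price in sorted(blended_price(0.9).items(), key=lambda kv: kv[1]):
    print(f"{name:<20} ${price:.3f}/M")
```

Note how the mix moves the ranking: at 90% input, Llama 3.1 8B edges ahead of GPT-4o mini, and at generation-heavy mixes Claude Haiku 4.5's 5× output rate makes it the most expensive model on the list.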

"Cheap" depends on what you need

The cheapest model isn't always the right answer. Output pricing (Claude Haiku 4.5 charges over 5× its input rate for output), context window, multimodal support, and whether you need open weights can all matter more than the headline input price.

Get a real cost comparison

Paste your prompt into the counter — it shows the actual token count and per-call cost across every model, so you can choose by total cost on your workload rather than by the headline per-million price.
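
Before reaching for the counter, you can ballpark a per-call cost by hand. A rough sketch (the chars/4 token heuristic is a crude approximation for English text, and the helper names are hypothetical; the live counter uses each model's real tokenizer):

```python
# Ballpark per-call cost from a prompt string and expected output length.
# estimate_tokens uses the common ~4-characters-per-token rule of thumb,
# which is only a rough guide for English prose.
def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)

def per_call_cost(prompt: str, expected_output_tokens: int,
                  input_price_per_m: float, output_price_per_m: float) -> float:
    """Cost in USD for one call at the given $/M prices."""
    in_tok = estimate_tokens(prompt)
    return (in_tok * input_price_per_m
            + expected_output_tokens * output_price_per_m) / 1_000_000

prompt = "Summarize the following support ticket in two sentences: ..." * 20
# Gemini 2.5 Flash prices from the table above.
cost = per_call_cost(prompt, expected_output_tokens=200,
                     input_price_per_m=0.075, output_price_per_m=0.30)
print(f"~${cost:.6f} per call")
```

Multiply by daily call volume to see whether the price gap between two models actually matters for your budget.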

Try the live counter →