Which AI model has the longest context window?
The short answer
Gemini 2.5 Pro has the longest production context window at 2,000,000 tokens, by a wide margin. It's the only model in this calculator with a 2M window.
The next tier, 1,000,000 tokens, is more crowded than it used to be. As of April 2026, every other current Gemini model (3.1 Pro, 3 Flash, 3.1 Flash-Lite, 2.5 Flash, 2.5 Flash-Lite) and the entire GPT-4.1 family (GPT-4.1, mini, nano) handle 1M.
OpenAI's GPT-5 family caps at 400K. Claude Opus / Sonnet / Haiku sit at 200K. Open-source models (Llama 3.x, Mistral, DeepSeek, Qwen) are mostly 128K.
Ranked by context window size
| Model | Context (tokens) | Practical use case |
|---|---|---|
| Gemini 2.5 Pro | 2,000,000 | Entire codebases, long-doc Q&A without retrieval |
| Gemini 3.1 Pro Preview | 1,000,000 | Frontier reasoning at long context (Preview tier) |
| Gemini 3 Flash Preview | 1,000,000 | Pro-quality reasoning at Flash price |
| Gemini 3.1 Flash-Lite Preview | 1,000,000 | Cheapest Gemini 3 with 1M window |
| Gemini 2.5 Flash | 1,000,000 | GA Flash with 1M window |
| Gemini 2.5 Flash-Lite | 1,000,000 | Cheapest GA Gemini |
| GPT-4.1 | 1,000,000 | OpenAI's long-context flagship, still on API after Feb 2026 ChatGPT retirement |
| GPT-4.1 Mini / Nano | 1,000,000 | Long-context at lower cost |
| GPT-5 family (5, 5.1, 5.2, 5.3, 5.4, 5.5 incl. mini/nano/pro) | 400,000 | Wide context but not 1M-class |
| Claude Opus 4.8 / Sonnet 4.6 / Haiku 4.5 | 200,000 | Strong long-context reasoning at frontier quality |
| o3 / o3-mini / o3-pro / o4-mini | 200,000 | OpenAI reasoning tier |
| Qwen 2.5 / Qwen 3 Coder | 131,072 | Open-weights long-context |
| GPT-4o / GPT-4o mini / GPT-4 Turbo | 128,000 | Predecessor OpenAI line; still callable on API |
| Llama 3.3 / 3.1 (all sizes) | 128,000 | Open-weights; quality degrades past ~32k in independent evals |
| Mistral Large | 128,000 | EU-hosted long-context |
| DeepSeek V3 / V3.1 / R1 | 128,000 | Frontier-class at low price |
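To make the ranking actionable, here is a minimal Python sketch that filters models by whether a prompt fits their window. The dictionary copies a subset of the token counts from the table above; the function name `models_that_fit` and the `reserved_output_tokens` parameter are illustrative, and the sketch assumes output tokens share the window with the prompt, which varies by provider.

```python
# Context windows in tokens, copied from a subset of the table above.
CONTEXT_WINDOWS = {
    "Gemini 2.5 Pro": 2_000_000,
    "Gemini 2.5 Flash": 1_000_000,
    "GPT-4.1": 1_000_000,
    "GPT-5": 400_000,
    "Claude Sonnet 4.6": 200_000,
    "o3": 200_000,
    "Qwen 2.5": 131_072,
    "GPT-4o": 128_000,
    "Llama 3.3": 128_000,
}


def models_that_fit(prompt_tokens: int, reserved_output_tokens: int = 0) -> list[str]:
    """Return models whose window holds prompt plus reserved output, largest window first.

    Assumption: the reply counts against the same window as the prompt.
    """
    needed = prompt_tokens + reserved_output_tokens
    fits = [(window, name) for name, window in CONTEXT_WINDOWS.items() if window >= needed]
    # Sort by window size descending, then by name for a stable order.
    fits.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in fits]
```

For example, `models_that_fit(130_000)` keeps Qwen 2.5 (131,072) but drops the 128,000-token models, which is the kind of edge case the advertised numbers hide.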
The gap between "context length" and "useful context length"
A 1M or 2M token window doesn't mean the model uses every token equally well. Independent evaluations consistently show:
- Quality degrades on retrieval tasks as context grows past ~32k for most models.
- Gemini 2.5 Pro maintains recall quality reasonably well across the full 2M window and is currently the strongest model at this.
- Claude Sonnet/Opus maintain quality well to ~100k, then drift.
- Llama 3.x in particular degrades sharply past 32k despite the 128k advertised window.
If your workload depends on the model finding a specific fact buried deep in a long context, test it with your actual prompts before committing to one model based on advertised window size.
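One way to run that test is a "needle in a haystack" check: plant a known fact at different depths in a long prompt and see whether the model retrieves it. The Python sketch below is a rough harness under stated assumptions. `ask_model` is a hypothetical callable you would wrap around your provider's client, length is measured in characters rather than tokens, and a substring match stands in for real answer grading.

```python
from typing import Callable


def build_haystack(filler: str, needle: str, depth: float, total_chars: int) -> str:
    """Repeat filler up to total_chars and plant needle at a relative depth (0.0 = start, 1.0 = end)."""
    if not filler:
        raise ValueError("filler must be non-empty")
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0.0 and 1.0")
    repeats = total_chars // len(filler) + 1
    body = (filler * repeats)[:total_chars]
    cut = int(len(body) * depth)
    return body[:cut] + "\n" + needle + "\n" + body[cut:]


def recall_by_depth(
    ask_model: Callable[[str], str],  # hypothetical: takes a prompt, returns the model's reply
    filler: str,
    needle: str,
    question: str,
    expected: str,
    depths: list[float],
    total_chars: int,
) -> dict[float, bool]:
    """Map each depth to whether the reply contains the expected answer."""
    results = {}
    for depth in depths:
        prompt = build_haystack(filler, needle, depth, total_chars) + "\n\nQuestion: " + question
        answer = ask_model(prompt)
        # Crude grading: case-insensitive substring match.
        results[depth] = expected.lower() in answer.lower()
    return results
```

In practice you would replace the repeated filler with your own documents and sweep both depth and total length, since degradation often depends on where the fact sits as well as how long the context is.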
When you actually need a long context window
- Loading entire codebases for refactoring or audit. Gemini 2.5 Pro is uniquely good here.
- Long-document Q&A without chunking and retrieval.
- Multi-document synthesis where retrieval would lose cross-document relationships.
- Multi-turn conversations with extensive history that you don't want to summarize.
For everything else (most chat, RAG, classification, and extraction), 32k-128k is plenty, and shorter contexts are cheaper to run.
A note on Llama 4
Meta released Llama 4 Maverick and Llama 4 Scout in April 2025. Llama 4 Scout was reported with a 10,000,000-token context window, which would be the largest in the industry. As of April 2026, Llama 4 is not listed on Together.ai's published pricing page, so we haven't added it to this calculator. Availability via other providers (self-hosted, Hugging Face, Replicate) varies. If you have access to Llama 4 Scout via a provider, the 10M window is real, but practical recall quality at that length has not been independently validated.
Get cost at your context length
Paste your full context into the counter. It will show exact token counts and per-call cost across every model, so you can see which fit your workload and what they'd cost.
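The arithmetic behind a per-call cost is simple enough to sketch in Python. The prices below are taken from the per-model list on this page and are assumed to be USD per 1,000,000 tokens in input/output order; that unit is an assumption, not something the list states. The function name `cost_per_call` is illustrative, and the sketch ignores any tiered long-context pricing, caching, or batch discounts a provider may apply.

```python
from decimal import Decimal

TOKENS_PER_PRICE_UNIT = 1_000_000  # assumption: prices are quoted per 1M tokens

# (input price, output price) in USD, from a subset of the list on this page.
PRICES = {
    "Gemini 2.5 Pro": (Decimal("1.25"), Decimal("10.00")),
    "GPT-4.1": (Decimal("2.00"), Decimal("8.00")),
    "Claude Sonnet 4.6": (Decimal("3.00"), Decimal("15.00")),
    "GPT-5": (Decimal("1.25"), Decimal("10.00")),
}


def cost_per_call(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Return the estimated USD cost of one call at flat per-token rates."""
    input_price, output_price = PRICES[model]  # raises KeyError for unknown models
    total = input_tokens * input_price + output_tokens * output_price
    return total / TOKENS_PER_PRICE_UNIT
```

Under those assumptions, a 500,000-token prompt with a 2,000-token reply on Gemini 2.5 Pro works out to 500,000 × 1.25 / 1,000,000 + 2,000 × 10 / 1,000,000 = 0.645 USD. At long context the input side dominates the bill.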
Try this on every model
- Claude Opus 4.8 $5.00/$25.00
- Claude Opus 4.8 (Fast Mode) $10.00/$50.00
- Claude Sonnet 4.6 $3.00/$15.00
- Claude Haiku 4.5 $1.00/$5.00
- GPT-5.5 $5.00/$30.00
- GPT-5.5 Pro $30.00/$180.00
- GPT-5.4 $2.50/$15.00
- GPT-5.4 Mini $0.75/$4.50
- GPT-5.4 Nano $0.20/$1.25
- GPT-5.4 Pro $30.00/$180.00
- GPT-5.3 $1.75/$14.00
- GPT-5.2 $1.75/$14.00
- GPT-5.2 Pro $21.00/$168.00
- GPT-5.1 $1.25/$10.00
- GPT-5 $1.25/$10.00
- GPT-5 Mini $0.25/$2.00
- GPT-5 Nano $0.05/$0.40
- GPT-5 Pro $15.00/$120.00
- GPT-4.1 $2.00/$8.00
- GPT-4.1 Mini $0.40/$1.60
- GPT-4.1 Nano $0.10/$0.40
- o3 $2.00/$8.00
- o3-mini $1.10/$4.40
- o3-pro $20.00/$80.00
- o4-mini $1.10/$4.40
- GPT-4o $2.50/$10.00
- GPT-4o mini $0.15/$0.60
- GPT-4 Turbo $10.00/$30.00
- Gemini 3.1 Pro $2.00/$12.00
- Gemini 3 Flash $0.50/$3.00
- Gemini 3.1 Flash-Lite $0.25/$1.50
- Gemini 2.5 Pro $1.25/$10.00
- Gemini 2.5 Flash $0.30/$2.50
- Gemini 2.5 Flash-Lite $0.10/$0.40
- Llama 3.3 70B $0.88/$0.88
- Llama 3.1 405B $3.50/$3.50
- Llama 3.1 70B $0.59/$0.79
- Llama 3.1 8B $0.18/$0.18
- Mistral Large $2.00/$6.00
- DeepSeek V3 $0.27/$1.10
- DeepSeek V3.1 $0.60/$1.70
- DeepSeek R1 $3.00/$7.00
- Qwen 2.5 72B $0.90/$0.90
- Qwen 2.5 Coder 32B $0.80/$0.80
- Qwen3 Coder 480B $2.00/$2.00
- GLM-5.1 $1.40/$4.40