How Many Tokens?

Which AI model has the longest context window?

The short answer

Gemini 2.5 Pro has the longest production context window at 2,000,000 tokens, by a wide margin. It's the only model in this calculator with a 2M window.

The next tier, 1,000,000 tokens, is wider than it used to be. As of April 2026, every other current Gemini model (3.1 Pro, 3 Flash, 3.1 Flash-Lite, 2.5 Flash, 2.5 Flash-Lite) and the entire GPT-4.1 family (GPT-4.1, mini, nano) handle 1M.

OpenAI's GPT-5 family caps at 400K. Claude Opus / Sonnet / Haiku sit at 200K. Open-weights models (Llama 3.x, Mistral, DeepSeek, Qwen) are mostly 128K, with Qwen slightly higher at 131,072.

Ranked by context window size

| Model | Context (tokens) | Practical use case |
| --- | --- | --- |
| Gemini 2.5 Pro | 2,000,000 | Entire codebases, long-doc Q&A without retrieval |
| Gemini 3.1 Pro Preview | 1,000,000 | Frontier reasoning at long context (Preview tier) |
| Gemini 3 Flash Preview | 1,000,000 | Pro-quality reasoning at Flash price |
| Gemini 3.1 Flash-Lite Preview | 1,000,000 | Cheapest Gemini 3 with 1M window |
| Gemini 2.5 Flash | 1,000,000 | GA Flash with 1M window |
| Gemini 2.5 Flash-Lite | 1,000,000 | Cheapest GA Gemini |
| GPT-4.1 | 1,000,000 | OpenAI's long-context flagship, still on API after Feb 2026 ChatGPT retirement |
| GPT-4.1 Mini / Nano | 1,000,000 | Long-context at lower cost |
| GPT-5 family (5, 5.1, 5.2, 5.3, 5.4, 5.5 incl. mini/nano/pro) | 400,000 | Wide context but not 1M-class |
| Claude Opus 4.8 / Sonnet 4.6 / Haiku 4.5 | 200,000 | Strong long-context reasoning at frontier quality |
| o3 / o3-mini / o3-pro / o4-mini | 200,000 | OpenAI reasoning tier |
| Qwen 2.5 / Qwen 3 Coder | 131,072 | Open-weights long-context |
| GPT-4o / GPT-4o mini / GPT-4 Turbo | 128,000 | Predecessor OpenAI line; still callable on API |
| Llama 3.3 / 3.1 (all sizes) | 128,000 | Open-weights; quality degrades past ~32k in independent evals |
| Mistral Large | 128,000 | EU-hosted long-context |
| DeepSeek V3 / V3.1 / R1 | 128,000 | Frontier-class at low price |
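If you already know your token count, the ranking above can be turned into a quick filter. The sketch below is illustrative only: the window sizes are copied from the table, while the function name `models_that_fit` and the `reserved_output_tokens` parameter are assumptions made for this example. Providers differ on whether output tokens count against the window, so treat the reserve as a conservative default rather than a rule.

```python
# Context window sizes in tokens, copied from the ranking table.
CONTEXT_WINDOWS = {
    "Gemini 2.5 Pro": 2_000_000,
    "Gemini 3.1 Pro Preview": 1_000_000,
    "Gemini 3 Flash Preview": 1_000_000,
    "Gemini 3.1 Flash-Lite Preview": 1_000_000,
    "Gemini 2.5 Flash": 1_000_000,
    "Gemini 2.5 Flash-Lite": 1_000_000,
    "GPT-4.1": 1_000_000,
    "GPT-4.1 Mini / Nano": 1_000_000,
    "GPT-5 family": 400_000,
    "Claude Opus 4.8 / Sonnet 4.6 / Haiku 4.5": 200_000,
    "o3 / o3-mini / o3-pro / o4-mini": 200_000,
    "Qwen 2.5 / Qwen 3 Coder": 131_072,
    "GPT-4o / GPT-4o mini / GPT-4 Turbo": 128_000,
    "Llama 3.3 / 3.1 (all sizes)": 128_000,
    "Mistral Large": 128_000,
    "DeepSeek V3 / V3.1 / R1": 128_000,
}


def models_that_fit(prompt_tokens, reserved_output_tokens=0):
    """Return (model, window) pairs whose window covers the request.

    Hypothetical helper: it assumes prompt and reserved output share one
    window, which may not match every provider's accounting.
    """
    needed = prompt_tokens + reserved_output_tokens
    fits = [
        (name, window)
        for name, window in CONTEXT_WINDOWS.items()
        if window >= needed
    ]
    # Smallest adequate window first, since a bigger window is not
    # automatically a better choice.
    return sorted(fits, key=lambda item: item[1])


if __name__ == "__main__":
    for name, window in models_that_fit(150_000, reserved_output_tokens=4_000):
        print(f"{window:>9,}  {name}")
```

Fitting inside the window only tells you the call will be accepted. It says nothing about how well the model recalls material at that length.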

The gap between "context length" and "useful context length"

A 1M or 2M token window doesn't mean the model uses every token equally well. Independent evaluations consistently show:

If your workload depends on the model finding a specific fact buried deep in a long context, test it with your actual prompts before committing to one model based on advertised window size.
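One way to run that test is a simple "needle at varying depth" check: plant a known fact at several positions in your own filler text and see whether the model retrieves it. The sketch below assumes a caller-supplied `ask_model` function that takes a prompt string and returns the model's answer as a string. That function, along with the names `build_needle_prompt` and `recall_by_depth`, is hypothetical and not part of any provider SDK.

```python
def build_needle_prompt(filler_paragraphs, needle, depth, question):
    """Place `needle` at a relative depth (0.0 = start, 1.0 = end)."""
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0.0 and 1.0")
    position = round(depth * len(filler_paragraphs))
    paragraphs = (
        filler_paragraphs[:position] + [needle] + filler_paragraphs[position:]
    )
    return "\n\n".join(paragraphs) + "\n\nQuestion: " + question


def recall_by_depth(ask_model, filler_paragraphs, needle, question, expected,
                    depths=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Return {depth: True/False} for whether the answer contains `expected`.

    `ask_model` is an assumed callable (prompt -> answer text) that you
    wire up to whichever model you are evaluating.
    """
    results = {}
    for depth in depths:
        prompt = build_needle_prompt(filler_paragraphs, needle, depth, question)
        answer = ask_model(prompt)
        # Substring matching is crude; swap in a stricter check if needed.
        results[depth] = expected.lower() in answer.lower()
    return results
```

Use paragraphs from your real documents as filler and a question shaped like your real queries. Synthetic filler tends to make the needle easier to spot than it would be in production.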

When you actually need a long context window

For everything else (most chat, RAG, classification, and extraction), 32k-128k is plenty, and shorter is cheaper to run.

A note on Llama 4

Meta released Llama 4 Maverick and Llama 4 Scout in April 2025. Llama 4 Scout was reported with a 10,000,000-token context window, which would be the largest in the industry. As of April 2026, Llama 4 is not listed on Together.ai's published pricing page, so we haven't added it to this calculator. Availability via other providers (self-hosted, Hugging Face, Replicate) varies. If you have access to Llama 4 Scout via a provider, the 10M window is real, but practical recall quality at that length has not been independently validated.

Get cost at your context length

Paste your full context into the counter. It will show exact token counts and per-call cost across every model, so you can see which fit your workload and what they'd cost.
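The arithmetic behind a per-call cost is simple enough to sketch. This assumes the common pricing shape of separate input and output rates quoted per million tokens. The function name `per_call_cost` is made up for illustration, and the prices in the usage lines are placeholders, not any provider's real rates.

```python
def per_call_cost(input_tokens, output_tokens,
                  input_price_per_million, output_price_per_million):
    """Estimate one call's cost from token counts and per-million rates.

    Assumes flat per-token pricing; tiered long-context pricing, caching
    discounts, and similar provider-specific rules are not modeled.
    """
    input_cost = (input_tokens / 1_000_000) * input_price_per_million
    output_cost = (output_tokens / 1_000_000) * output_price_per_million
    return input_cost + output_cost


if __name__ == "__main__":
    # PLACEHOLDER prices chosen only to show the calculation.
    cost = per_call_cost(
        input_tokens=200_000,
        output_tokens=1_000,
        input_price_per_million=2.00,
        output_price_per_million=8.00,
    )
    print(f"${cost:.3f} per call")
```

With these placeholder rates the input side dominates: 200,000 input tokens cost far more than 1,000 output tokens, which is why trimming context is usually the first lever for cutting per-call cost.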
