How Many Tokens?


Which AI model has the longest context window?

The short answer

Gemini 2.5 Pro has the longest production context window at 2,000,000 tokens — by a wide margin. Gemini 2.5 Flash follows at 1,000,000.

Most competitors top out at 128,000 tokens (the GPT-4o family, Llama 3.1, DeepSeek V3, Mistral Large). Claude Opus, Sonnet, and Haiku sit in between at 200,000.

Ranked by context window size

| Model | Context (tokens) | Practical use case |
|---|---|---|
| Gemini 2.5 Pro | 2,000,000 | Entire codebases, long-doc Q&A without retrieval |
| Gemini 2.5 Flash | 1,000,000 | Same use cases at lower cost, lower quality |
| Claude Opus 4.7 | 200,000 | Long-context reasoning at frontier quality |
| Claude Sonnet 4.6 | 200,000 | Long-context production workloads |
| Claude Haiku 4.5 | 200,000 | Long-context high-volume |
| Qwen 2.5 72B / Coder | 131,072 | Open-weights long-context |
| GPT-4o family | 128,000 | Standard long-context for OpenAI users |
| Llama 3.1 (all sizes) | 128,000 | Open-weights; quality degrades past ~32k |
| DeepSeek V3 | 128,000 | Frontier-class at low price |
| Mistral Large | 128,000 | EU-hosted long-context |
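The table above is easy to query programmatically. Here is a minimal sketch using the window sizes from the table; the function name and the default reply budget are illustrative choices, not anything a provider defines:

```python
# Context window sizes (tokens), taken from the table above.
CONTEXT_WINDOWS = {
    "Gemini 2.5 Pro": 2_000_000,
    "Gemini 2.5 Flash": 1_000_000,
    "Claude Opus 4.7": 200_000,
    "Claude Sonnet 4.6": 200_000,
    "Claude Haiku 4.5": 200_000,
    "Qwen 2.5 72B / Coder": 131_072,
    "GPT-4o family": 128_000,
    "Llama 3.1 (all sizes)": 128_000,
    "DeepSeek V3": 128_000,
    "Mistral Large": 128_000,
}

def models_that_fit(prompt_tokens: int, reply_budget: int = 4_096) -> list:
    """Return models whose window holds the prompt plus room for a reply.

    The window must cover both input and output tokens, so we reserve a
    reply budget on top of the prompt itself.
    """
    needed = prompt_tokens + reply_budget
    return [model for model, window in CONTEXT_WINDOWS.items() if window >= needed]
```

For example, a 500k-token prompt fits only the two Gemini models, while a 150k-token prompt also fits the three Claude models.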

The gap between "context length" and "useful context length"

A 1M-token window doesn't mean the model uses every token equally well. Independent long-context evaluations (needle-in-a-haystack style probes) consistently show that recall drops as a fact sits deeper in the prompt, especially in the middle of very long contexts.

If your workload depends on the model finding a specific fact buried deep in a long context, test it with your actual prompts before committing.
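One way to run that test is to plant a known fact at varying depths inside filler text and check whether the model can repeat it back. A minimal harness sketch; the filler sentence, the needle, and the depths are arbitrary choices, and `call_model` is a stub you would replace with your provider's SDK:

```python
FILLER = "The quick brown fox jumps over the lazy dog. "
NEEDLE = "The secret code is 7341."  # illustrative fact to retrieve

def build_haystack(total_chars: int, depth: float) -> str:
    """Place NEEDLE at `depth` (0.0 = start, 1.0 = end) inside filler text."""
    body = FILLER * (total_chars // len(FILLER))
    pos = int(len(body) * depth)
    return body[:pos] + NEEDLE + " " + body[pos:]

def run_probe(call_model, total_chars: int = 200_000) -> dict:
    """call_model(prompt) -> str. Returns pass/fail for each insertion depth."""
    results = {}
    for depth in (0.0, 0.25, 0.5, 0.75, 1.0):
        prompt = build_haystack(total_chars, depth) + "\n\nWhat is the secret code?"
        results[depth] = "7341" in call_model(prompt)
    return results
```

If accuracy falls off at mid-prompt depths with your real documents, a shorter window plus retrieval may serve you better than a longer window.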

When you actually need a long context window

You genuinely need 200k+ tokens when the whole input must be visible at once: analyzing an entire codebase, answering questions over book-length documents without retrieval, or synthesizing across many long files in a single pass.

For everything else — most chat, RAG, classification, extraction — 32k-128k is plenty, and shorter is cheaper to run.

Get cost at your context length

Paste your full context into the counter. It will show exact token counts and per-call cost across every model, so you can see which models fit your workload and what they'd cost.
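The counter's cost math can be sketched in a few lines. This uses the common ~4-characters-per-token heuristic rather than a real tokenizer (use one such as tiktoken for exact counts), and the per-million-token input prices are hypothetical placeholders for illustration only; check each provider's pricing page for current figures:

```python
def estimate_tokens(text: str) -> int:
    """Rough heuristic: ~4 characters per token for English text.
    A real tokenizer (e.g. tiktoken) gives exact counts."""
    return max(1, len(text) // 4)

# HYPOTHETICAL per-million-input-token prices (USD), for illustration only.
PRICE_PER_MTOK = {
    "Gemini 2.5 Flash": 0.15,
    "GPT-4o family": 2.50,
    "Claude Sonnet 4.6": 3.00,
}

def per_call_cost(text: str) -> dict:
    """Estimated input cost of one call to each model, in USD."""
    tokens = estimate_tokens(text)
    return {model: tokens * price / 1_000_000
            for model, price in PRICE_PER_MTOK.items()}
```

Even at these rough numbers, the spread matters: the same long prompt can differ in cost by an order of magnitude between the cheapest and most expensive models.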

Try the live counter →