GPT-5 vs Gemini 3.1 Pro
| Spec | GPT-5 | Gemini 3.1 Pro |
|---|---|---|
| Provider | OpenAI | Google |
| Input price (per 1M) | $1.25 | $2.00 |
| Output price (per 1M) | $10.00 | $12.00 |
| Context window | 400,000 | 1,000,000 |
| Tokenizer accuracy | exact (uses official tokenizer) | exact (uses official tokenizer) |
Cost per 1,000 calls across common workloads
| Workload | GPT-5 | Gemini 3.1 Pro | Winner |
|---|---|---|---|
| Short chat (200 in / 100 out) | $1.25 | $1.60 | GPT-5 22% cheaper |
| Medium chat (1,000 in / 500 out) | $6.25 | $8.00 | GPT-5 22% cheaper |
| Heavy generation (1,000 in / 2,000 out) | $21.25 | $26.00 | GPT-5 18% cheaper |
| Long context (8,000 in / 500 out) | $15.00 | $22.00 | GPT-5 32% cheaper |
| Code review (3,000 in / 600 out) | $9.75 | $13.20 | GPT-5 26% cheaper |
Costs are per 1,000 API calls. Multiply by 1,000 for per-million-calls.
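The per-workload figures follow from the per-token prices in the spec table. The sketch below shows one way to reproduce that arithmetic; the function and variable names are illustrative and do not come from any SDK, and the prices are simply the ones listed on this page.

```python
# Prices in USD per 1M tokens, copied from the spec table: (input, output).
PRICES = {
    "GPT-5": (1.25, 10.00),
    "Gemini 3.1 Pro": (2.00, 12.00),
}

# (input tokens, output tokens) per call, copied from the workload table.
WORKLOADS = {
    "Short chat": (200, 100),
    "Medium chat": (1_000, 500),
    "Heavy generation": (1_000, 2_000),
    "Long context": (8_000, 500),
    "Code review": (3_000, 600),
}


def cost_per_1k_calls(model: str, tokens_in: int, tokens_out: int) -> float:
    """USD cost of 1,000 calls with the given per-call token counts."""
    price_in, price_out = PRICES[model]
    per_call = (tokens_in * price_in + tokens_out * price_out) / 1_000_000
    return per_call * 1_000


def percent_cheaper(cheaper: float, pricier: float) -> int:
    """Whole-number percentage by which `cheaper` undercuts `pricier`."""
    return round(100 * (1 - cheaper / pricier))


for name, (t_in, t_out) in WORKLOADS.items():
    gpt = cost_per_1k_calls("GPT-5", t_in, t_out)
    gem = cost_per_1k_calls("Gemini 3.1 Pro", t_in, t_out)
    print(f"{name}: GPT-5 ${gpt:,.2f} vs Gemini ${gem:,.2f} "
          f"(GPT-5 {percent_cheaper(gpt, gem)}% cheaper)")
```

Swapping in your own token counts per call gives a quick estimate for workloads that are not in the table.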
Verdict
Gemini 3.1 Pro wins on context window and multimodal cost. GPT-5 wins on tool use and ecosystem maturity. Both sit in the same frontier pricing tier, with GPT-5 roughly 18-32% cheaper per call on the text workloads above, so the decision is usually less about cost than about which capability profile matches your workload.
Cost example
For a 1,000-token prompt with a 200-token reply:
GPT-5: 1000 × $1.25/M + 200 × $10/M = $0.00325 per call
Gemini 3.1 Pro: 1000 × $2.00/M + 200 × $12/M = $0.00440 per call
GPT-5 is about 26% cheaper at this prompt/output ratio.
For long-context workloads that exceed GPT-5's 400k-token window, Gemini's larger window means you can avoid chunking strategies that would multiply API call counts on GPT-5, so the effective economics tilt toward Gemini.
Context windows
- GPT-5: 400,000 tokens
- Gemini 3.1 Pro: 1,000,000 tokens
Gemini 3.1 Pro's 1M-token window is its standout feature in 2026. Whole-codebase analysis, entire textbooks, or multi-document research synthesis can run as one call instead of chunked retrieval. For these workloads, the cost savings come from fewer API round-trips, not lower per-token rates.
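To make the round-trip argument concrete, here is a rough sketch of how many calls a single large document needs under a simple sliding-window chunking scheme. The scheme itself, the `reserved_tokens` budget for instructions and output, and the `overlap_tokens` between chunks are all assumptions chosen for illustration, not provider recommendations; the window sizes passed in are the ones from the spec table.

```python
import math


def calls_needed(doc_tokens: int, context_window: int,
                 reserved_tokens: int = 8_000,
                 overlap_tokens: int = 2_000) -> int:
    """Estimate API calls needed to cover a document with overlapping chunks.

    reserved_tokens: window space held back for instructions and the reply
                     (hypothetical default).
    overlap_tokens:  tokens repeated between consecutive chunks so context
                     is not cut mid-thought (hypothetical default).
    """
    usable = context_window - reserved_tokens
    if usable <= overlap_tokens:
        raise ValueError("window too small for the chosen reserve and overlap")
    if doc_tokens <= usable:
        return 1
    # After the first chunk, each extra call advances by (usable - overlap).
    stride = usable - overlap_tokens
    return 1 + math.ceil((doc_tokens - usable) / stride)


doc = 900_000  # example document size in tokens
print("400k window:", calls_needed(doc, 400_000), "calls")
print("1M window:  ", calls_needed(doc, 1_000_000), "calls")
```

Each extra call also resends the overlap and any shared instructions, so the token bill grows slightly faster than the call count.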
Multimodal cost
Vision input cost differs sharply between the two:
- GPT-5 vision: Tile-based, 170 tokens per 512×512 tile + 85 base. A 1024×1024 image = 765 tokens.
- Gemini 3.1 Pro vision: ~258 tokens per image at standard resolutions, regardless of size up to the limit.
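For high-volume image workloads, Gemini uses about 3× fewer input tokens per 1024×1024 image (258 vs 765). Because its input price is higher ($2.00 vs $1.25 per 1M tokens), the dollar gap is smaller: roughly $0.0005 vs $0.0010 per image, or about 1.9× cheaper. At 100k images/month, that is about $52 vs $96 in image input cost.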
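The image-token arithmetic can be sketched as follows. This applies the tile formula exactly as stated above and ignores any resizing a provider may perform before tiling, which is a simplifying assumption; the function names are illustrative, and the input prices are the ones from the spec table.

```python
import math

# Input prices in USD per 1M tokens, copied from the spec table.
GPT5_INPUT_PRICE = 1.25
GEMINI_INPUT_PRICE = 2.00


def gpt5_image_tokens(width: int, height: int) -> int:
    """Tile-based count: 170 tokens per 512x512 tile plus an 85-token base.

    Assumes the image is tiled at its given size, with no prior resizing.
    """
    tiles = math.ceil(width / 512) * math.ceil(height / 512)
    return 85 + 170 * tiles


def gemini_image_tokens(width: int, height: int) -> int:
    """Flat per-image count (~258) at standard resolutions, per the text."""
    return 258


def image_cost(tokens: int, price_per_million: float) -> float:
    """USD input cost for one image with the given token count."""
    return tokens * price_per_million / 1_000_000


w, h = 1024, 1024
gpt_tokens = gpt5_image_tokens(w, h)
gem_tokens = gemini_image_tokens(w, h)
gpt_cost = image_cost(gpt_tokens, GPT5_INPUT_PRICE)
gem_cost = image_cost(gem_tokens, GEMINI_INPUT_PRICE)

print(f"tokens: GPT-5 {gpt_tokens}, Gemini {gem_tokens} "
      f"(ratio {gpt_tokens / gem_tokens:.2f}x)")
print(f"cost:   GPT-5 ${gpt_cost:.6f}, Gemini ${gem_cost:.6f} "
      f"(ratio {gpt_cost / gem_cost:.2f}x)")
```

Keeping the token ratio and the dollar ratio separate matters because the two models charge different input prices per token.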
Capability differences
Where GPT-5 leads:
- Tool use and function calling: more mature and more reliable in agent workflows
- Code generation on novel algorithmic problems
- Math and competition reasoning
- Ecosystem (OpenAI's API platform, SDKs, libraries are more battle-tested)
Where Gemini 3.1 Pro leads:
- Context window (2.5× larger)
- Multimodal input cost (about 3× fewer tokens per image)
- Multimodal reasoning quality on long videos and dense documents
- Native Google integration (Search grounding, YouTube transcript access)
Tokenizer notes
Both use BPE-family tokenizers; per-character token efficiency on English text is within ~5% of each other. So the per-token prices are roughly comparable in real-world cost-per-character terms, closer than the Claude vs GPT comparison.
When to choose each
Use GPT-5 when:
- You're building agent workflows with multi-step tool use
- You need the most reliable function calling
- You're already invested in OpenAI's SDK / ecosystem
- Your workloads fit comfortably under 400k tokens
Use Gemini 3.1 Pro when:
- You need to process documents >400k tokens in one call
- Your workload is multimodal, especially high-volume vision or video
- You can use Google's Search grounding or YouTube features
- Cost per image is a constraint