Which LLM providers offer batch-API discounts?
The short answer
OpenAI, Anthropic, and Google all offer ~50% off for batch-mode workloads, you submit a batch of requests, get results back within 24 hours instead of real-time. If your workload doesn't need synchronous responses, batching cuts your LLM bill in half.
Batch pricing across providers (April 2026)
| Provider | Discount | SLA | Notes |
|---|---|---|---|
| OpenAI (GPT-5, GPT-4.1, o-series) | 50% off | 24 hours | All models eligible; Pro tiers also discounted |
| Anthropic (Claude Opus, Sonnet, Haiku) | 50% off | 24 hours | Claude Batch API |
| Google Gemini (3.x, 2.5 family) | Varies | Up to 24h | Listed as "Batch" tier on each model's pricing page |
| DeepSeek | Not advertised | n/a | Direct API only; no batch tier |
| Together.ai (open weights) | Not currently | n/a | Real-time pricing only |
When batching makes sense
- Bulk classification or labeling, tagging support tickets, sentiment-analyzing review datasets, generating embeddings for archive content.
- Evaluation harnesses, running 10,000 test prompts through 5 models for comparison.
- Content generation pipelines, generating SEO meta descriptions for an entire product catalog overnight.
- Synthetic data generation for fine-tuning sets.
- Bulk extraction, pulling structured data from a corpus of documents.
When batching doesn't work
- Real-time chat / agents, by definition, synchronous.
- Interactive code generation, developer waiting for the response.
- Workloads needing tool use / function calling, batch APIs typically don't support tools.
- Sub-minute SLA requirements, batch SLA is hours to a day.
Cost example
10,000-prompt evaluation across GPT-5.5 at 1,000 input + 500 output tokens each:
Real-time:
- Input: 10,000 × 1,000 × $5/M = $50
- Output: 10,000 × 500 × $30/M = $150
- Total: $200
Batch:
- Input: 10,000 × 1,000 × $2.50/M = $25
- Output: 10,000 × 500 × $15/M = $75
- Total: $100
Savings: $100, 50%, at the cost of overnight latency.
At enterprise scale (millions of prompts), batch savings on a single eval campaign run into thousands of dollars.
Caveats
- Batch submission limits: typically 50,000 requests per batch on OpenAI; smaller on some providers. Plan around the limit if you're processing millions.
- Failure handling: a batch job can partially fail, your code needs to reconcile per-request status.
- Combined with caching: batch and prompt caching stack on some providers (effectively 75% off when both apply), but verify the exact discount math with the provider, some apply caching first, batch second.
- Pro tiers and reasoning models: o3-pro / o4-mini in batch mode still generate reasoning tokens that count toward output. The 50% discount is on the per-token rate, not on reasoning-token overhead.
How to batch in practice
OpenAI: upload a JSONL file via POST /v1/files, then POST /v1/batches. Check status, download results when complete.
Anthropic: POST /v1/messages/batches with an array of request objects. Up to 100k requests per batch on enterprise plans.
Google: each Gemini model's "Batch" pricing tier, use the Batch API mode flag in your request.
Get a cost estimate for your batch
Paste a representative prompt into the counter to get the per-call cost, then divide by 2 to estimate your batch-mode cost. For workloads above ~10k requests per month, the engineering investment in a batch pipeline pays back in the first month.
Try this on every model
- Claude Opus 4.8 $5.00/$25.00
- Claude Opus 4.8 (Fast Mode) $10.00/$50.00
- Claude Sonnet 4.6 $3.00/$15.00
- Claude Haiku 4.5 $1.00/$5.00
- GPT-5.5 $5.00/$30.00
- GPT-5.5 Pro $30.00/$180.00
- GPT-5.4 $2.50/$15.00
- GPT-5.4 Mini $0.75/$4.50
- GPT-5.4 Nano $0.20/$1.25
- GPT-5.4 Pro $30.00/$180.00
- GPT-5.3 $1.75/$14.00
- GPT-5.2 $1.75/$14.00
- GPT-5.2 Pro $21.00/$168.00
- GPT-5.1 $1.25/$10.00
- GPT-5 $1.25/$10.00
- GPT-5 Mini $0.25/$2.00
- GPT-5 Nano $0.05/$0.40
- GPT-5 Pro $15.00/$120.00
- GPT-4.1 $2.00/$8.00
- GPT-4.1 Mini $0.40/$1.60
- GPT-4.1 Nano $0.10/$0.40
- o3 $2.00/$8.00
- o3-mini $1.10/$4.40
- o3-pro $20.00/$80.00
- o4-mini $1.10/$4.40
- GPT-4o $2.50/$10.00
- GPT-4o mini $0.15/$0.60
- GPT-4 Turbo $10.00/$30.00
- Gemini 3.1 Pro $2.00/$12.00
- Gemini 3 Flash $0.50/$3.00
- Gemini 3.1 Flash-Lite $0.25/$1.50
- Gemini 2.5 Pro $1.25/$10.00
- Gemini 2.5 Flash $0.30/$2.50
- Gemini 2.5 Flash-Lite $0.10/$0.40
- Llama 3.3 70B $0.88/$0.88
- Llama 3.1 405B $3.50/$3.50
- Llama 3.1 70B $0.59/$0.79
- Llama 3.1 8B $0.18/$0.18
- Mistral Large $2.00/$6.00
- DeepSeek V3 $0.27/$1.10
- DeepSeek V3.1 $0.60/$1.70
- DeepSeek R1 $3.00/$7.00
- Qwen 2.5 72B $0.90/$0.90
- Qwen 2.5 Coder 32B $0.80/$0.80
- Qwen3 Coder 480B $2.00/$2.00
- GLM-5.1 $1.40/$4.40