Which LLM providers offer batch-API discounts?

Q: Which LLM providers offer batch-API discounts?

OpenAI, Anthropic, and Google all offer 50% off for async batch jobs with a 24-hour SLA. What qualifies, the trade-offs, and when batching actually pays back.

Updated 2026-05-31 · By Clinton Patrick · Methodology

The short answer

OpenAI, Anthropic, and Google all offer ~50% off for batch-mode workloads, you submit a batch of requests, get results back within 24 hours instead of real-time. If your workload doesn't need synchronous responses, batching cuts your LLM bill in half.

Batch pricing across providers (April 2026)

Provider	Discount	SLA	Notes
OpenAI (GPT-5, GPT-4.1, o-series)	50% off	24 hours	All models eligible; Pro tiers also discounted
Anthropic (Claude Opus, Sonnet, Haiku)	50% off	24 hours	Claude Batch API
Google Gemini (3.x, 2.5 family)	Varies	Up to 24h	Listed as "Batch" tier on each model's pricing page
DeepSeek	Not advertised	n/a	Direct API only; no batch tier
Together.ai (open weights)	Not currently	n/a	Real-time pricing only

When batching makes sense

Bulk classification or labeling, tagging support tickets, sentiment-analyzing review datasets, generating embeddings for archive content.
Evaluation harnesses, running 10,000 test prompts through 5 models for comparison.
Content generation pipelines, generating SEO meta descriptions for an entire product catalog overnight.
Synthetic data generation for fine-tuning sets.
Bulk extraction, pulling structured data from a corpus of documents.

When batching doesn't work

Real-time chat / agents, by definition, synchronous.
Interactive code generation, developer waiting for the response.
Workloads needing tool use / function calling, batch APIs typically don't support tools.
Sub-minute SLA requirements, batch SLA is hours to a day.

Cost example

10,000-prompt evaluation across GPT-5.5 at 1,000 input + 500 output tokens each:

Real-time:

Input: 10,000 × 1,000 × $5/M = $50
Output: 10,000 × 500 × $30/M = $150
Total: $200

Batch:

Input: 10,000 × 1,000 × $2.50/M = $25
Output: 10,000 × 500 × $15/M = $75
Total: $100

Savings: $100, 50%, at the cost of overnight latency.

At enterprise scale (millions of prompts), batch savings on a single eval campaign run into thousands of dollars.

Caveats

Batch submission limits: typically 50,000 requests per batch on OpenAI; smaller on some providers. Plan around the limit if you're processing millions.
Failure handling: a batch job can partially fail, your code needs to reconcile per-request status.
Combined with caching: batch and prompt caching stack on some providers (effectively 75% off when both apply), but verify the exact discount math with the provider, some apply caching first, batch second.
Pro tiers and reasoning models: o3-pro / o4-mini in batch mode still generate reasoning tokens that count toward output. The 50% discount is on the per-token rate, not on reasoning-token overhead.

How to batch in practice

OpenAI: upload a JSONL file via POST /v1/files, then POST /v1/batches. Check status, download results when complete.

Anthropic: POST /v1/messages/batches with an array of request objects. Up to 100k requests per batch on enterprise plans.

Google: each Gemini model's "Batch" pricing tier, use the Batch API mode flag in your request.

Get a cost estimate for your batch

Paste a representative prompt into the counter to get the per-call cost, then divide by 2 to estimate your batch-mode cost. For workloads above ~10k requests per month, the engineering investment in a batch pipeline pays back in the first month.

Try this on every model

Claude Opus 4.8 $5.00/$25.00
Claude Opus 4.8 (Fast Mode) $10.00/$50.00
Claude Sonnet 4.6 $3.00/$15.00
Claude Haiku 4.5 $1.00/$5.00
GPT-5.5 $5.00/$30.00
GPT-5.5 Pro $30.00/$180.00
GPT-5.4 $2.50/$15.00
GPT-5.4 Mini $0.75/$4.50
GPT-5.4 Nano $0.20/$1.25
GPT-5.4 Pro $30.00/$180.00
GPT-5.3 $1.75/$14.00
GPT-5.2 $1.75/$14.00
GPT-5.2 Pro $21.00/$168.00
GPT-5.1 $1.25/$10.00
GPT-5 $1.25/$10.00
GPT-5 Mini $0.25/$2.00
GPT-5 Nano $0.05/$0.40
GPT-5 Pro $15.00/$120.00
GPT-4.1 $2.00/$8.00
GPT-4.1 Mini $0.40/$1.60
GPT-4.1 Nano $0.10/$0.40
o3 $2.00/$8.00
o3-mini $1.10/$4.40
o3-pro $20.00/$80.00
o4-mini $1.10/$4.40
GPT-4o $2.50/$10.00
GPT-4o mini $0.15/$0.60
GPT-4 Turbo $10.00/$30.00
Gemini 3.1 Pro $2.00/$12.00
Gemini 3 Flash $0.50/$3.00
Gemini 3.1 Flash-Lite $0.25/$1.50
Gemini 2.5 Pro $1.25/$10.00
Gemini 2.5 Flash $0.30/$2.50
Gemini 2.5 Flash-Lite $0.10/$0.40
Llama 3.3 70B $0.88/$0.88
Llama 3.1 405B $3.50/$3.50
Llama 3.1 70B $0.59/$0.79
Llama 3.1 8B $0.18/$0.18
Mistral Large $2.00/$6.00
DeepSeek V3 $0.27/$1.10
DeepSeek V3.1 $0.60/$1.70
DeepSeek R1 $3.00/$7.00
Qwen 2.5 72B $0.90/$0.90
Qwen 2.5 Coder 32B $0.80/$0.80
Qwen3 Coder 480B $2.00/$2.00
GLM-5.1 $1.40/$4.40

Try the live counter →