#tHow Many Tokens?

← Back to counter

Which LLM providers offer batch-API discounts?

The short answer

OpenAI, Anthropic, and Google all offer ~50% off for batch-mode workloads, you submit a batch of requests, get results back within 24 hours instead of real-time. If your workload doesn't need synchronous responses, batching cuts your LLM bill in half.

Batch pricing across providers (April 2026)

ProviderDiscountSLANotes
OpenAI (GPT-5, GPT-4.1, o-series)50% off24 hoursAll models eligible; Pro tiers also discounted
Anthropic (Claude Opus, Sonnet, Haiku)50% off24 hoursClaude Batch API
Google Gemini (3.x, 2.5 family)VariesUp to 24hListed as "Batch" tier on each model's pricing page
DeepSeekNot advertisedn/aDirect API only; no batch tier
Together.ai (open weights)Not currentlyn/aReal-time pricing only

When batching makes sense

When batching doesn't work

Cost example

10,000-prompt evaluation across GPT-5.5 at 1,000 input + 500 output tokens each:

Real-time:

Batch:

Savings: $100, 50%, at the cost of overnight latency.

At enterprise scale (millions of prompts), batch savings on a single eval campaign run into thousands of dollars.

Caveats

How to batch in practice

OpenAI: upload a JSONL file via POST /v1/files, then POST /v1/batches. Check status, download results when complete.

Anthropic: POST /v1/messages/batches with an array of request objects. Up to 100k requests per batch on enterprise plans.

Google: each Gemini model's "Batch" pricing tier, use the Batch API mode flag in your request.

Get a cost estimate for your batch

Paste a representative prompt into the counter to get the per-call cost, then divide by 2 to estimate your batch-mode cost. For workloads above ~10k requests per month, the engineering investment in a batch pipeline pays back in the first month.

Try this on every model

Try the live counter →