
Llama 3.1 405B: token counter & pricing

Meta · approximate, within ±3% of reference · pricing as of 2026-04-26.

Provider: Meta
API model ID: meta-llama/llama-3.1-405b-instruct
Context window: 128,000 tokens
Input price: $3.50 per 1M tokens
Output price: $3.50 per 1M tokens
Tokenizer accuracy: approximate, within ±3% of reference
Pricing as of: 2026-04-26

Open the counter to count tokens for Llama 3.1 405B in real time.

What is Llama 3.1 405B?

Llama 3.1 405B is Meta's flagship open-weights model: 405 billion parameters and a 128,000-token context window. It is the largest openly released model that is competitive with frontier closed models on most benchmarks.

You don't run 405B yourself unless you have serious GPU infrastructure. Most teams access it via hosted providers (Together.ai, Fireworks, Replicate, Groq, Deepinfra) at varying price points.

How tokens are counted here

Llama 3.1 uses a BPE tokenizer with a vocabulary of roughly 128k tokens (tiktoken-based; the SentencePiece tokenizer was used in Llama 2). We approximate counts in your browser using a heuristic tuned to the Llama 3 family, accurate within roughly ±3% of the reference tokenizer for typical English text. Counts carrying this approximation are marked ≈±3% in the results table.
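As a rough illustration of how such a heuristic works (a naive sketch, not the actual family-tuned estimator; the 4-characters-per-token ratio is an assumed rule of thumb for English text):

```python
# Illustrative sketch only: NOT the site's actual estimator.
# Assumes ~4 characters per token for typical English text, a common
# rule of thumb that roughly matches Llama 3's ~128k-token vocabulary.
def approx_token_count(text: str, chars_per_token: float = 4.0) -> int:
    """Cheap approximation; exact counting needs the real tokenizer."""
    return max(1, round(len(text) / chars_per_token))

print(approx_token_count("How many tokens is this sentence?"))  # prints 8
```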

For exact counts, run transformers.AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-405B-Instruct") locally on your text.
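A minimal sketch of that, assuming you have been granted access to the gated meta-llama repo on Hugging Face and are authenticated:

```python
# Exact counting with the reference tokenizer: pip install transformers
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-405B-Instruct")

text = "How many tokens is this sentence?"
ids = tokenizer.encode(text, add_special_tokens=False)
print(len(ids))  # exact count for the raw text (no BOS or chat-template overhead)
```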

When to use Llama 405B

- You want quality competitive with frontier closed models but need open weights, for portability across hosted providers.
- You're serving through a hosted provider rather than standing up your own multi-GPU inference cluster.
- Smaller Llama 3.1 sizes fall short on your hardest reasoning or generation tasks.

When not to use it:

- You plan to self-host without serious GPU infrastructure; 70B or 8B are far cheaper to run and share the same tokenizer.
- You intend to fine-tune; full fine-tuning 405B is impractical for most teams (see the fine-tuning question below).
- A smaller Llama already handles your task well, so you'd be paying 405B rates for headroom you don't use.

Pricing notes

The price shown ($3.50 input / $3.50 output per million) is indicative — actual cost depends on which hosted provider you use. Together.ai is one of the more competitive providers for 405B. Verify current rates on the provider you select.
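To make the arithmetic concrete, a quick sketch at the indicative rate (swap in your provider's actual numbers):

```python
# Flat per-token pricing at the indicative $3.50 per 1M tokens shown above.
# Rates vary by provider; treat this as a sketch, not a quote.
PRICE_PER_MILLION_USD = 3.50

def estimate_cost(input_tokens: int, output_tokens: int,
                  price_per_million: float = PRICE_PER_MILLION_USD) -> float:
    # Input and output bill at the same rate, so only the total matters.
    return (input_tokens + output_tokens) / 1_000_000 * price_per_million

print(f"${estimate_cost(80_000, 20_000):.2f}")  # 100k tokens at an 80/20 split -> $0.35
```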

Common questions

Why does Llama 405B cost the same on input and output?

Open-weights models hosted by inference providers typically charge a single per-token rate rather than the split input/output rates OpenAI and Anthropic use. Shifting the input/output mix (for example, the calculator's default 80/20 split) therefore doesn't change total cost for these models.

Is the tokenizer the same across all Llama 3.1 sizes?

Yes — 405B, 70B, and 8B all share the Llama 3.1 tokenizer. Token counts in the calculator are identical across the three; only the per-token price changes.
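A quick way to verify this yourself, assuming you have access to both gated repos on Hugging Face:

```python
# The 8B and 405B tokenizers should produce identical ids for the same input.
from transformers import AutoTokenizer

tok_8b = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")
tok_405b = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-405B-Instruct")

text = "Token counts should match across Llama 3.1 sizes."
assert tok_8b.encode(text) == tok_405b.encode(text)
print("identical:", len(tok_8b.encode(text)), "tokens in both")
```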

Can I fine-tune 405B?

Technically yes, practically expensive: the bf16 weights alone are roughly 800 GB, so full fine-tuning needs a multi-node H100 cluster once optimizer state is included. LoRA fine-tuning is more realistic. Most teams fine-tune 70B or 8B instead and reserve 405B for inference.
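A sketch of what LoRA setup looks like with Hugging Face peft, shown on the 70B sibling that the answer suggests as the more realistic target (the rank, dropout, and target modules here are illustrative defaults, not recommendations):

```python
# pip install transformers peft accelerate -- assumes access to the gated repo.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-70B-Instruct",  # 405B would need a multi-node cluster
    device_map="auto",
)
lora = LoraConfig(
    r=16,                 # adapter rank: tiny next to the full weight matrices
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # only a small fraction of the base weights train
```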
