Llama 3.1 405B: token counter & pricing
Meta · approximate, within ±3% of reference · pricing as of 2026-04-26.
- Provider: Meta
- API model ID: `meta-llama/llama-3.1-405b-instruct`
- Context window: 128,000 tokens
- Input price: $3.50 per 1M tokens
- Output price: $3.50 per 1M tokens
- Tokenizer accuracy: approximate, within ±3% of reference
- Pricing as of: 2026-04-26
Open the counter to count tokens for Llama 3.1 405B in real time.
What is Llama 3.1 405B?
Llama 3.1 405B is Meta's flagship open-weights model — 405 billion parameters, 128k context. The largest open model that's competitive with frontier closed models on most benchmarks.
You don't run 405B yourself unless you have serious GPU infrastructure. Most teams access it via hosted providers (Together.ai, Fireworks, Replicate, Groq, Deepinfra) at varying price points.
How tokens are counted here
Llama uses a SentencePiece-based BPE tokenizer. We approximate counts in your browser using a family-tuned heuristic — accurate within roughly ±3% of the reference tokenizer for typical English text. Marked ≈±3% in the results table.
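The calculator's actual coefficients aren't shown here, but a character-ratio heuristic of this shape is a reasonable sketch (the 3.7 chars-per-token figure below is an assumed average for English prose, not the calculator's real tuning):

```python
def approx_llama_tokens(text: str, chars_per_token: float = 3.7) -> int:
    """Rough token estimate for Llama-family tokenizers.

    chars_per_token is an assumed English-prose average; a real
    family-tuned heuristic lands within about +/-3% of the
    reference tokenizer, and this simple ratio will be coarser.
    """
    if not text:
        return 0
    return max(1, round(len(text) / chars_per_token))

print(approx_llama_tokens("Count the tokens in this sentence."))
```

Code, non-English text, and heavy punctuation tokenize less predictably, which is why heuristics like this carry an explicit error bound.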
For exact counts, run transformers.AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-405B-Instruct") locally on your text.
When to use Llama 405B
- You need open weights. Compliance, on-premises deployment, fine-tuning rights, model portability across providers.
- You're already running Llama 70B and need a quality bump on hard prompts.
- You want frontier-class reasoning at non-frontier prices. ~$3.50/$3.50 per million via Together is roughly 3× cheaper than GPT-4o on output and over 4× cheaper than Claude Sonnet on output.
When not to use it:
- High-volume / latency-sensitive workloads. 405B is slow even on optimized inference stacks.
- Anywhere Llama 70B works. At $0.59/$0.79 per million, 70B is roughly 4-6× cheaper depending on your input/output mix.
Pricing notes
The price shown ($3.50 input / $3.50 output per million) is indicative — actual cost depends on which hosted provider you use. Together.ai is one of the more competitive providers for 405B. Verify current rates on the provider you select.
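To turn a token count into dollars, multiply by the per-million rate. A minimal helper, defaulting to the indicative $3.50 flat rate shown on this page (substitute your provider's actual prices):

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float = 3.50,
                 output_price_per_m: float = 3.50) -> float:
    """USD cost of one request, given per-1M-token prices."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000

# e.g. an 8,000-token prompt with a 2,000-token completion:
print(f"${request_cost(8_000, 2_000):.4f}")  # $0.0350 at the flat rate
```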
Common questions
Why does Llama 405B cost the same on input and output?
Open-weights models hosted by inference providers typically charge a single per-token rate, not the split input/output rates that OpenAI and Anthropic use. That means the calculator's assumed 80/20 input/output split has no effect on cost for these models: only total tokens matter.
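A quick sketch of why, using the indicative prices from this page: under a flat rate, any input/output split of the same total costs the same, while under split rates the mix changes the bill.

```python
def cost(tokens_in: int, tokens_out: int, p_in: float, p_out: float) -> float:
    """USD cost at per-1M-token input/output prices."""
    return (tokens_in * p_in + tokens_out * p_out) / 1_000_000

# Same 10,000-token total, two different input/output splits.
# Flat $3.50/$3.50 (Llama 405B): the split is irrelevant.
flat_80_20 = cost(8_000, 2_000, 3.50, 3.50)
flat_50_50 = cost(5_000, 5_000, 3.50, 3.50)
# Split $2.50/$10.00 (GPT-4o): the split changes the cost.
split_80_20 = cost(8_000, 2_000, 2.50, 10.00)
split_50_50 = cost(5_000, 5_000, 2.50, 10.00)

print(flat_80_20, flat_50_50)    # identical: 0.035 and 0.035
print(split_80_20, split_50_50)  # differ: 0.04 and 0.0625
```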
Is the tokenizer the same across all Llama 3.1 sizes?
Yes — 405B, 70B, and 8B all share the Llama 3.1 tokenizer. Token counts in the calculator are identical across the three; only the per-token price changes.
Can I fine-tune 405B?
Technically yes, practically expensive. Full fine-tuning needs a multi-node H100 cluster (the bf16 weights alone are roughly 800 GB, before optimizer state and activations). LoRA fine-tuning is more realistic. Most teams fine-tune 70B or 8B instead and reserve 405B for inference.
Compare Llama 3.1 405B to other models
- Llama 3.1 70B (Meta, $0.59/$0.79)
- Llama 3.1 8B (Meta, $0.18/$0.18)
- Claude Sonnet 4.6 (Anthropic, $3.00/$15.00)
- GPT-4o (OpenAI, $2.50/$10.00)
- Mistral Large (Mistral, $2.00/$6.00)