Llama 3.3 70B: token counter & pricing
Meta · exact (uses official tokenizer) · pricing as of 2026-05-31.
- Provider: Meta
- API model ID: meta-llama/Llama-3.3-70B-Instruct-Turbo
- Context window: 128,000 tokens
- Input price: $0.88 per 1M tokens
- Output price: $0.88 per 1M tokens
- Tokenizer accuracy: exact (uses official tokenizer)
- Pricing as of: 2026-05-31
Open the counter to count tokens for Llama 3.3 70B in real time.
What is Llama 3.3 70B?
Llama 3.3 70B is Meta's current flagship 70B-class open-weights model, as advertised on Together.ai's pricing page. It uses the same Llama tokenizer family as 3.1, with improved instruction-following and reasoning behavior. Pricing is single-rate at $0.88 per 1M tokens: input and output cost the same, a common pattern among open-weights inference providers.
How tokens are counted here
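Llama 3.x uses a byte-level BPE tokenizer built on tiktoken, with a vocabulary of roughly 128K tokens (Llama 2 and earlier used SentencePiece). We approximate in your browser using a family-tuned heuristic that is accurate to within roughly ±3% of the reference tokenizer for typical English text. These estimates are marked ≈±3% in the results table.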
For exact counts, load the reference tokenizer locally with AutoTokenizer.from_pretrained("meta-llama/Llama-3.3-70B-Instruct") (from the Hugging Face transformers library) and encode your text with it.
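A minimal sketch of the exact-count path, assuming the transformers package is installed and that you have access to the model repository on Hugging Face (it is likely gated behind Meta's license). The helper names load_reference_tokenizer, count_tokens, and relative_error are hypothetical and only illustrate the idea:

```python
def load_reference_tokenizer(model_id: str = "meta-llama/Llama-3.3-70B-Instruct"):
    """Load the reference tokenizer from Hugging Face.

    Assumption: `transformers` is installed and the repo is accessible
    with your credentials. Imported lazily so the helpers below can be
    used (and tested) without the dependency.
    """
    from transformers import AutoTokenizer
    return AutoTokenizer.from_pretrained(model_id)


def count_tokens(text: str, tokenizer) -> int:
    """Count tokens in `text` using any object with a HF-style encode()."""
    # add_special_tokens=False leaves out markers such as a begin-of-text
    # token, so the result reflects the text alone.
    return len(tokenizer.encode(text, add_special_tokens=False))


def relative_error(estimate: int, exact: int) -> float:
    """Signed relative error of a heuristic estimate against the exact count."""
    if exact <= 0:
        raise ValueError("exact count must be positive")
    return (estimate - exact) / exact
```

Calling count_tokens(text, load_reference_tokenizer()) gives the exact count for the raw text, and relative_error(browser_estimate, exact_count) shows whether the estimate falls inside the ±3% band for your content. Providers may bill for chat-template and special tokens as well, so treat the raw-text count as a lower bound.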
Pricing notes
$0.88 input / $0.88 output per 1M tokens (indicative Together.ai pricing; verify with your actual provider).
Single-rate pricing means output-heavy workloads (long generation) cost the same per token as input-heavy workloads (long RAG context). That's structurally different from OpenAI / Anthropic, where output typically costs 4-10× input.
For a call with 1,000 input tokens and 200 output tokens: 1,200 × $0.88 / 1,000,000 = $0.001056 per call (about $0.00106), or $1,056 per 1M calls.
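The per-call figure above can be reproduced with a few lines of arithmetic. This is a sketch using the $0.88 rates quoted on this page; the function name cost_per_call is hypothetical, and Decimal is used only to avoid floating-point rounding noise:

```python
from decimal import Decimal

# Indicative Together.ai rates from this page, in USD per 1M tokens.
INPUT_PRICE_PER_M = Decimal("0.88")
OUTPUT_PRICE_PER_M = Decimal("0.88")
TOKENS_PER_M = Decimal(1_000_000)


def cost_per_call(input_tokens: int, output_tokens: int) -> Decimal:
    """USD cost of one call at the rates above."""
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / TOKENS_PER_M


per_call = cost_per_call(1_000, 200)
per_million_calls = per_call * 1_000_000

print(f"per call: ${per_call:.6f}")
print(f"per 1M calls: ${per_million_calls:,.2f}")
```

Swap in your own provider's rates and token counts; the two constants are the only assumptions in the calculation.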
When to use Llama 3.3 70B
- Open-weights workloads where compliance, portability, or fine-tuning rights matter.
- Replacing Llama 3.1 70B in stable production pipelines: a drop-in swap with a minor quality lift.
- Output-heavy workloads: single-rate pricing makes Llama 3.3 70B competitive with proprietary models where output dominates the bill.
When not to use it:
- Cost-sensitive workloads: Llama 3 8B Instruct Lite at $0.10/$0.10 on Together is dramatically cheaper for routine tasks.
- Tasks where Claude Sonnet's instruction-following measurably wins: Sonnet at $3/$15 is more expensive but often the right call for nuanced workloads.
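To see how the output share of a workload changes the bill, here is a rough sketch of the blended price per 1M tokens. It uses only the rates quoted on this page ($0.88/$0.88 for Llama 3.3 70B, $3/$15 for Claude Sonnet); the function name blended_price_per_m and the chosen output shares are illustrative assumptions:

```python
def blended_price_per_m(input_price: float, output_price: float,
                        output_share: float) -> float:
    """Average USD price per 1M tokens when `output_share` of all
    tokens (0.0 to 1.0) are output tokens."""
    if not 0.0 <= output_share <= 1.0:
        raise ValueError("output_share must be between 0 and 1")
    return input_price * (1.0 - output_share) + output_price * output_share


# (input, output) USD per 1M tokens, as quoted on this page.
LLAMA_3_3_70B = (0.88, 0.88)
CLAUDE_SONNET = (3.00, 15.00)

for share in (0.1, 0.5, 0.9):
    llama = blended_price_per_m(*LLAMA_3_3_70B, share)
    sonnet = blended_price_per_m(*CLAUDE_SONNET, share)
    print(f"output share {share:.0%}: Llama ${llama:.2f}, Sonnet ${sonnet:.2f}")
```

With single-rate pricing the blended figure stays flat regardless of output share, while a split-rate model's blended figure climbs toward its output price as generation dominates.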
Common questions
Llama 3.3 70B vs Llama 3.1 70B?
3.3 is Together's current advertised flagship Meta model. 3.1 70B is no longer prominently listed on Together's pricing page (kept in this catalog for SEO and for users who still call it on other providers). For new work, default to 3.3.
Llama 3.3 vs Qwen 3 Coder 480B for code?
Qwen 3 Coder 480B ($2/$2 on Together) is specifically tuned for code generation. It outperforms Llama 3.3 on coding benchmarks but is more expensive. For general-purpose work, Llama 3.3 is the better default.
Where's Llama 4?
Not currently on Together's published pricing page as of April 2026. We don't ship Llama 4 entries to avoid pretending it's available where it isn't. If your provider offers Llama 4, the tokenizer is similar enough to 3.x that this counter's estimate will be close, but verify.
Compare Llama 3.3 70B to other models
- Llama 3.1 405B (Meta, $3.50/$3.50)
- Llama 3.1 70B (Meta, $0.59/$0.79)
- Llama 3.1 8B (Meta, $0.18/$0.18)
- Qwen 2.5 72B (Alibaba, $0.90/$0.90)
- Qwen 2.5 Coder 32B (Alibaba, $0.80/$0.80)
- Claude Haiku 4.5 (Anthropic, $1.00/$5.00)