How Many Tokens?

How do I count tokens for an AI prompt?

The fastest way

Paste your prompt into the counter on this site. It computes exact counts for OpenAI, Anthropic, and Google models, and approximates within ±3% for open-source models — all in one view.

Below is what's happening under the hood for each provider, in case you want to count programmatically.

OpenAI (GPT-4o, GPT-4o mini, GPT-4 Turbo)

OpenAI publishes its tokenizer as the open-source tiktoken library. Two encodings cover all current models: o200k_base for the GPT-4o family, and cl100k_base for GPT-4 Turbo and earlier GPT-4/GPT-3.5 models.

Python:

import tiktoken

enc = tiktoken.get_encoding("o200k_base")  # use "cl100k_base" for GPT-4 Turbo
tokens = enc.encode("your prompt here")
print(len(tokens))
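
If you'd rather not remember which encoding a model uses, tiktoken can resolve it from the model name:

enc = tiktoken.encoding_for_model("gpt-4o")  # resolves to o200k_base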

JavaScript: use js-tiktoken (pure JS, browser-safe) or @dqbd/tiktoken (WASM, faster but heavier).

Anthropic (Claude Opus, Sonnet, Haiku)

Anthropic does not publish its tokenizer. The official way to count tokens is the API endpoint:

curl https://api.anthropic.com/v1/messages/count_tokens \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{
    "model": "claude-sonnet-4-6",
    "messages": [{"role": "user", "content": "your prompt here"}]
  }'

Returns {"input_tokens": <number>}. The endpoint is free, separate from generation billing, and is the only authoritative source for Claude token counts.

Google (Gemini 2.5 Pro, Flash)

Google exposes a models.countTokens endpoint:

curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:countTokens?key=$GEMINI_API_KEY" \
  -H "content-type: application/json" \
  -d '{"contents":[{"parts":[{"text":"your prompt here"}]}]}'

Returns {"totalTokens": <number>}. Free.

Open-source (Llama, Mistral, DeepSeek, Qwen)

Each open-weights model ships its tokenizer on Hugging Face:

from transformers import AutoTokenizer

# Gated repo: you may need to accept the license on Hugging Face first.
tok = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-70B-Instruct")
print(len(tok.encode("your prompt here")))

This is the reference count. Browser-side approximations (used in this counter, marked ≈±3%) are typically within a few percent for English prose, and less accurate for code or non-English text.
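
One caveat: a raw encode counts only the prompt text. Chat endpoints wrap messages in a template (role markers, special tokens), so to count what a chat request will actually consume, apply the model's chat template first. A sketch, using the tokenizer loaded above:

messages = [{"role": "user", "content": "your prompt here"}]
ids = tok.apply_chat_template(messages, add_generation_prompt=True)
print(len(ids))  # a handful more tokens than tok.encode() alone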

Why this matters

Token count drives cost, latency, and context-window utilization. Estimating ahead of time lets you budget spend per request, anticipate latency, and confirm a prompt fits the model's context window.
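
As a back-of-envelope example (the rate here is hypothetical; check your provider's current pricing):

PRICE_PER_MILLION_INPUT_TOKENS = 3.00  # USD, assumed for illustration only
prompt_tokens = 1_200

cost = prompt_tokens / 1_000_000 * PRICE_PER_MILLION_INPUT_TOKENS
print(f"${cost:.4f}")  # $0.0036 per request, ~$3.60 per 1,000 requests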

Paste your prompt above to see all four counts in one view.

Try this on every model

Try the live counter →