How Many Tokens?

How do I count tokens for an AI prompt?

The fastest way

Paste your prompt into the counter on this site. It computes exact counts for OpenAI, Anthropic, and Google models, and approximates within ±3% for open-source models — all in one view.

Below is what's happening under the hood for each provider, in case you want to count programmatically.

OpenAI (GPT-4o, GPT-4o mini, GPT-4 Turbo)

OpenAI publishes its tokenizer as the open-source tiktoken library. Two encodings cover all current models: o200k_base for the GPT-4o family, and cl100k_base for GPT-4 Turbo and earlier GPT-4/GPT-3.5 models.

Python:

import tiktoken

enc = tiktoken.get_encoding("o200k_base")  # use "cl100k_base" for GPT-4 Turbo
tokens = enc.encode("your prompt here")
print(len(tokens))
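
If you'd rather not remember which encoding a model uses, tiktoken can resolve it from the model name:

enc = tiktoken.encoding_for_model("gpt-4o")  # resolves to o200k_base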

JavaScript: use js-tiktoken (pure JS, browser-safe) or @dqbd/tiktoken (WASM, faster but heavier).

Anthropic (Claude Opus, Sonnet, Haiku)

Anthropic does not publish its tokenizer. The official way to count tokens is the API endpoint:

curl https://api.anthropic.com/v1/messages/count_tokens \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{
    "model": "claude-sonnet-4-6",
    "messages": [{"role": "user", "content": "your prompt here"}]
  }'

Returns {"input_tokens": <number>}. The endpoint is free, separate from generation billing, and is the only authoritative source for Claude token counts.

Google (Gemini 2.5 Pro, Flash)

Google exposes a models.countTokens endpoint:

curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:countTokens?key=$GEMINI_API_KEY" \
  -H "content-type: application/json" \
  -d '{"contents":[{"parts":[{"text":"your prompt here"}]}]}'

Returns {"totalTokens": <number>}. Free.

Open-source (Llama, Mistral, DeepSeek, Qwen)

Each open-weights model ships its tokenizer on Hugging Face:

from transformers import AutoTokenizer

# Gated repo: you may need to accept the license on Hugging Face first.
tok = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-70B-Instruct")
print(len(tok.encode("your prompt here")))

This is the reference count. Browser-side approximations (used in this counter, marked ≈±3%) are typically within a few percent for English prose, and less accurate for code or non-English text.
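
One caveat: a raw encode counts only the prompt text. Chat endpoints wrap messages in a template (role markers, special tokens), so to count what a chat request will actually consume, apply the model's chat template first. A sketch, using the tokenizer loaded above:

messages = [{"role": "user", "content": "your prompt here"}]
ids = tok.apply_chat_template(messages, add_generation_prompt=True)
print(len(ids))  # a handful more tokens than tok.encode() alone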

Why this matters

Token count drives cost, latency, and context-window utilization. Estimating ahead of time lets you budget spend per request, anticipate latency, and confirm a prompt fits the model's context window.
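
As a back-of-envelope example (the rate here is hypothetical; check your provider's current pricing):

PRICE_PER_MILLION_INPUT_TOKENS = 3.00  # USD, assumed for illustration only
prompt_tokens = 1_200

cost = prompt_tokens / 1_000_000 * PRICE_PER_MILLION_INPUT_TOKENS
print(f"${cost:.4f}")  # $0.0036 per request, ~$3.60 per 1,000 requests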

Paste your prompt above to see all four counts in one view.

Try this on every model

Try the live counter →