GLM-5.1: token counter & pricing
zhipu · approximate · pricing as of 2026-05-31.
- Provider
- zhipu
- API model ID
- zai-org/GLM-5.1-Air
- Context window
- 128,000 tokens
- Input price
- $1.40 per 1M tokens
- Output price
- $4.40 per 1M tokens
- Tokenizer accuracy
- approximate
- Pricing as of
- 2026-05-31
Open the counter to count tokens for GLM-5.1 in real time.
What is GLM-5.1?
GLM-5.1 is Zhipu AI's current flagship and the latest model in the ChatGLM family, designed to be competitive with frontier-class models on multilingual workloads, especially Chinese. It costs $1.40 input / $4.40 output per 1M tokens via Together.ai.
It is strong on Chinese-language workloads where US-trained models often underperform, and is worth evaluating if your audience or content is primarily CJK.
How tokens are counted here
GLM models use a custom BPE tokenizer designed for efficient Chinese / English handling. We currently approximate it using the Llama-family BPE as a proxy, marked ≈±10%. That is looser than the ≈±3% confidence we ship for Llama itself; we'll tighten it once we ship the real ChatGLM tokenizer JSON.
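As a rough illustration of what the ≈±10% label means in practice, the sketch below turns a proxy token count into a low/high range. The function name `token_range` and its integer-percent parameter are assumptions made for this example, not part of the counter itself.

```python
def token_range(proxy_count: int, tolerance_pct: int = 10) -> tuple[int, int]:
    """Return (low, high) bounds around a proxy token count.

    Integer arithmetic is used so the bounds do not pick up float
    rounding noise (for example 1000 * 1.1 is not exactly 1100 in floats).
    """
    if proxy_count < 0:
        raise ValueError("proxy_count must be non-negative")
    if not 0 <= tolerance_pct <= 100:
        raise ValueError("tolerance_pct must be between 0 and 100")
    low = proxy_count * (100 - tolerance_pct) // 100        # rounded down
    high = -(-proxy_count * (100 + tolerance_pct) // 100)   # rounded up
    return low, high


if __name__ == "__main__":
    print(token_range(1000))       # bounds at the page's ±10% label
    print(token_range(1000, 3))    # bounds at the ±3% label used for Llama
```

A 1,000-token proxy count maps to a range of 900 to 1,100 tokens at ±10%. Budgeting against the upper bound is the safer choice when a prompt sits near the 128,000-token context limit.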
For exact counts, use Zhipu's official tokenizer via Hugging Face: AutoTokenizer.from_pretrained("zai-org/GLM-5.1-Air").
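A minimal sketch of that exact-count path, assuming the `transformers` package is installed and that the model ID above resolves on the Hugging Face Hub. The helper names `load_glm_tokenizer` and `count_tokens` are hypothetical. Earlier ChatGLM checkpoints required `trust_remote_code=True` when loading, so that flag may be needed here as well.

```python
def load_glm_tokenizer(model_id: str = "zai-org/GLM-5.1-Air"):
    """Load the tokenizer from the Hugging Face Hub (needs network access)."""
    # Imported lazily so count_tokens() stays usable without transformers.
    from transformers import AutoTokenizer

    return AutoTokenizer.from_pretrained(model_id)


def count_tokens(text: str, tokenizer) -> int:
    """Count tokens in text, excluding special tokens such as BOS/EOS."""
    return len(tokenizer.encode(text, add_special_tokens=False))


if __name__ == "__main__":
    tok = load_glm_tokenizer()
    print(count_tokens("你好，世界", tok))
    print(count_tokens("Hello, world", tok))
```

Special tokens are excluded so the count reflects the text alone. A chat template applied by the API will add a few tokens per message on top of this.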
Pricing notes
$1.40 input / $4.40 output per 1M tokens (Together.ai indicative).
For 1,000 input + 200 output tokens: $0.00228 per call, $2,280 per 1M calls.
128K context window.
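The per-call figure above can be reproduced with a few lines of arithmetic. This is a sketch using the prices listed on this page. The function name `cost_per_call` is hypothetical, and the check that prompt and completion together fit in the context window is an assumption about how the limit is applied.

```python
from decimal import Decimal

INPUT_PRICE_PER_M = Decimal("1.40")    # USD per 1M input tokens (this page)
OUTPUT_PRICE_PER_M = Decimal("4.40")   # USD per 1M output tokens (this page)
CONTEXT_WINDOW = 128_000               # tokens


def cost_per_call(input_tokens: int, output_tokens: int) -> Decimal:
    """Return the USD cost of one call at the listed prices."""
    # Assumption: input and output tokens share the same context window.
    if input_tokens + output_tokens > CONTEXT_WINDOW:
        raise ValueError("request exceeds the 128,000-token context window")
    total = INPUT_PRICE_PER_M * input_tokens + OUTPUT_PRICE_PER_M * output_tokens
    return total / Decimal(1_000_000)


if __name__ == "__main__":
    per_call = cost_per_call(1_000, 200)
    print(per_call)               # cost of a single call in USD
    print(per_call * 1_000_000)   # cost of one million such calls in USD
```

`Decimal` is used instead of floats so that small per-call amounts do not accumulate rounding error when scaled up to millions of calls.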
When to use GLM-5.1
- Chinese-language workloads: significantly better tokenization efficiency than GPT or Claude on Chinese text (~2× tokens per Chinese character vs Latin alphabet).
- Multilingual customer-facing applications in greater China region.
- Differentiation from US-centric model families where you want a Chinese-trained perspective.
When not to use it:
- Pure English workloads: Llama 3.3 70B ($0.88/$0.88) or GPT-5.2 ($1.75/$14) are better-known and competitively priced.
- Production with regulatory concerns about cross-border data flow to Chinese-affiliated providers.
- Workloads requiring the most mature function-calling and structured outputs.
Common questions
How does GLM-5.1 compare to Qwen for Chinese text?
Qwen 2.5 72B is priced at $0.90/$0.90 per 1M tokens, substantially cheaper than GLM-5.1's $1.40/$4.40. Both are Chinese-trained with strong CJK tokenization. Run your own evals on your specific Chinese task, since performance varies by domain.
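To put "substantially cheaper" in numbers for a given workload, a small comparison can help. The sketch below uses only the per-1M prices quoted on this page. The `PRICES` table and both function names are hypothetical, and the token counts are treated as identical across models, which ignores tokenizer differences.

```python
from decimal import Decimal

# (input, output) USD per 1M tokens, as quoted on this page.
PRICES = {
    "GLM-5.1": (Decimal("1.40"), Decimal("4.40")),
    "Qwen 2.5 72B": (Decimal("0.90"), Decimal("0.90")),
}


def call_cost(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Return the USD cost of one call for the named model."""
    input_price, output_price = PRICES[model]
    total = input_price * input_tokens + output_price * output_tokens
    return total / Decimal(1_000_000)


def cost_ratio(model_a: str, model_b: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Return how many times more model_a costs than model_b for the same call."""
    return call_cost(model_a, input_tokens, output_tokens) / call_cost(
        model_b, input_tokens, output_tokens
    )


if __name__ == "__main__":
    ratio = cost_ratio("GLM-5.1", "Qwen 2.5 72B", 1_000, 200)
    print(f"GLM-5.1 costs about {ratio:.2f}x Qwen 2.5 72B for this call")
```

Because GLM-5.1's output price is much higher than its input price, the gap widens as the share of output tokens grows.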
Why is the tokenizer accuracy ≈±10% instead of ≈±3%?
We don't yet ship the real ChatGLM tokenizer JSON. For now we approximate using Llama BPE as a proxy because the two share a similar vocabulary structure. Real ChatGLM tokenizer support is planned; once it ships, this label will update to ≈±3%.
Self-hosting GLM-5.1?
GLM-5.1 Air (the variant Together hosts) is the lighter member of the GLM-5 line and runs on more accessible hardware than full GLM-5. The full GLM-5 is positioned for larger-scale enterprise deployments.
Compare GLM-5.1 to other models
- GPT-5.1 (OpenAI, $1.25/$10.00)
- GPT-5 (OpenAI, $1.25/$10.00)
- Gemini 2.5 Pro (Google, $1.25/$10.00)