GLM-5.1: token counter & pricing

zhipu · approximate · pricing as of 2026-07-26.

Updated 2026-07-26 · By Clinton Patrick · Methodology

Provider: zhipu
API model ID: zai-org/GLM-5.1-Air
Context window: 128,000 tokens
Input price: $1.40 per 1M tokens
Output price: $4.40 per 1M tokens
Tokenizer accuracy: approximate
Pricing as of: 2026-07-26

Open the counter to count tokens for GLM-5.1 in real time.

What is GLM-5.1?

GLM-5.1 is Zhipu AI's current flagship and the latest in the ChatGLM family, designed to compete with frontier-class models on multilingual workloads, especially Chinese. It is priced at $1.40 input / $4.40 output per 1M tokens via Together.ai.

It is strong on Chinese-language workloads, where US-trained models often underperform, and is worth evaluating if your audience or content is primarily CJK.

How tokens are counted here

GLM models use a custom BPE tokenizer designed for efficient Chinese / English handling. We currently approximate it using the Llama-family BPE as a proxy, marked ≈±10%. That is looser than the ≈±3% confidence we ship for Llama itself; we'll tighten it once we ship the real ChatGLM tokenizer JSON.
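To make the ≈±10% label concrete, here is a minimal sketch that turns an approximate count into a plausible range. The function name token_range and its integer-percent parameter are illustrative assumptions, not part of the counter itself.

```python
def token_range(approx_count: int, tolerance_pct: int = 10) -> tuple[int, int]:
    """Return (low, high) bounds for an approximate token count.

    tolerance_pct is the accuracy label as a whole percent (10 for the
    GLM proxy, 3 for Llama). Integer math avoids float rounding surprises.
    """
    low = approx_count * (100 - tolerance_pct) // 100          # round down
    high = -(-approx_count * (100 + tolerance_pct) // 100)     # round up
    return low, high


if __name__ == "__main__":
    # A prompt the counter reports as about 1,000 tokens
    print(token_range(1000))      # bounds at the ±10% label
    print(token_range(1000, 3))   # bounds at the ±3% label
```

When budgeting, plan against the upper bound rather than the reported count.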

For exact counts, use Zhipu's official tokenizer via Hugging Face: AutoTokenizer.from_pretrained("zai-org/GLM-5.1-Air").
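A minimal sketch of that exact-count path, assuming the transformers package is installed and the model repository is reachable from your machine. The helper names load_glm_tokenizer and count_tokens are hypothetical, and whether the repository needs extra options (for example a license acceptance) is not covered here.

```python
def load_glm_tokenizer(model_id: str = "zai-org/GLM-5.1-Air"):
    """Download the tokenizer from Hugging Face (needs network access)."""
    # Imported lazily so count_tokens stays usable without transformers.
    from transformers import AutoTokenizer
    return AutoTokenizer.from_pretrained(model_id)


def count_tokens(text: str, tokenizer) -> int:
    """Count tokens in raw text, excluding special tokens.

    Works with any object exposing a Hugging Face style encode() method.
    """
    return len(tokenizer.encode(text, add_special_tokens=False))


if __name__ == "__main__":
    tok = load_glm_tokenizer()
    print(count_tokens("你好，世界", tok))
```

This counts the raw text only; a chat template or system prompt applied by the API adds tokens on top, so billed counts can be somewhat higher.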

Pricing notes

$1.40 input / $4.40 output per 1M tokens (Together.ai indicative).

For 1,000 input + 200 output tokens: 1,000 × $1.40/1M + 200 × $4.40/1M = $0.00140 + $0.00088 = $0.00228 per call, or $2,280 per 1M calls.
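The per-call arithmetic can be sketched as a small function. The prices are the ones listed on this page; the function name cost_per_call is an illustrative assumption, and Decimal is used only to keep the dollar amounts exact.

```python
from decimal import Decimal

INPUT_PRICE_PER_1M = Decimal("1.40")   # USD per 1M input tokens (this page)
OUTPUT_PRICE_PER_1M = Decimal("4.40")  # USD per 1M output tokens (this page)
ONE_MILLION = Decimal(1_000_000)


def cost_per_call(input_tokens: int, output_tokens: int) -> Decimal:
    """Return the USD cost of one call at the listed per-1M-token prices."""
    input_cost = Decimal(input_tokens) * INPUT_PRICE_PER_1M / ONE_MILLION
    output_cost = Decimal(output_tokens) * OUTPUT_PRICE_PER_1M / ONE_MILLION
    return input_cost + output_cost


if __name__ == "__main__":
    per_call = cost_per_call(1_000, 200)
    print(per_call)                # cost of one call in USD
    print(per_call * ONE_MILLION)  # cost of 1M identical calls in USD
```

Because the token counts here are approximate, treat the result as an estimate rather than a bill.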

128K context window.
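A rough fit check against the 128,000-token window, as a sketch under two assumptions: that the output budget counts against the same window as the prompt, and that the prompt count carries the ≈±10% proxy error. The name fits_context is hypothetical.

```python
CONTEXT_WINDOW = 128_000  # tokens, as listed on this page


def fits_context(approx_prompt_tokens: int, max_output_tokens: int,
                 tolerance_pct: int = 10) -> bool:
    """Check whether a request fits, using the worst-case prompt size.

    Assumes prompt and output share one window; verify this against the
    provider's documentation before relying on it.
    """
    # Round the worst-case prompt size up using integer math.
    worst_case_prompt = -(-approx_prompt_tokens * (100 + tolerance_pct) // 100)
    return worst_case_prompt + max_output_tokens <= CONTEXT_WINDOW


if __name__ == "__main__":
    print(fits_context(100_000, 4_000))  # worst case 110,000 + 4,000
    print(fits_context(115_000, 4_000))  # worst case 126,500 + 4,000
```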

When to use GLM-5.1

Chinese-language workloads: significantly better tokenization efficiency than GPT or Claude on Chinese text (~2× tokens per Chinese character vs Latin alphabet).
Multilingual customer-facing applications in the Greater China region.
Differentiation from US-centric model families where you want a Chinese-trained perspective.

When not to use it:

Pure English workloads: Llama 3.3 70B ($0.88/$0.88) or GPT-5.2 ($1.75/$14) are better known and competitively priced.
Production with regulatory concerns about cross-border data flow to Chinese-affiliated providers.
Workloads requiring the most mature function-calling and structured outputs.

Common questions

How does GLM-5.1 compare to Qwen for Chinese text?

Qwen 2.5 72B costs $0.90/$0.90 per 1M tokens, substantially cheaper than GLM-5.1's $1.40/$4.40. Both are Chinese-trained with strong CJK tokenization. Run your own evals for the specific Chinese task; performance varies by domain.

Why is the tokenizer accuracy ≈±10% instead of ≈±3%?

We don't yet ship the real ChatGLM tokenizer JSON, so we approximate with the Llama BPE as a proxy because the two share a similar vocabulary structure. Real ChatGLM tokenizer support is planned; once it ships, this label will update to ≈±3%.

Can I self-host GLM-5.1?

GLM-5.1 Air (the variant Together hosts) is the lighter member of the GLM-5 line and runs on more accessible hardware than the full GLM-5, which is positioned for larger-scale enterprise deployments.

Compare GLM-5.1 to other models

GPT-5.1 (OpenAI, $1.25/$10.00)
GPT-5 (OpenAI, $1.25/$10.00)
Gemini 2.5 Pro (Google, $1.25/$10.00)