How much does GLM-4.7 Flash cost?

GLM-4.7 Flash costs $0.50 per 1M input tokens and $2.00 per 1M output tokens on CallMissed (₹50 and ₹200 respectively). 1 credit = ₹1 = $0.01 USD.

How do I use GLM-4.7 Flash via API?

Send a POST request to /v1/chat/completions with the model set to "glm-4.7-flash" and your API key in the Authorization header. CallMissed uses the OpenAI-compatible format, so existing OpenAI-style client code only needs a different base URL and model field.

What is the context window of GLM-4.7 Flash?

GLM-4.7 Flash supports a 128K token context window with up to 8K output tokens.
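A minimal sketch of how a client might budget tokens against these limits. It assumes that "128K" and "8K" mean 128,000 and 8,000 tokens and that output tokens count toward the context window; neither detail is confirmed on this page, and plan_max_tokens is a hypothetical helper, not part of any CallMissed SDK.

```python
CONTEXT_WINDOW = 128_000  # assumption: "128K" read as 128,000 tokens
MAX_OUTPUT = 8_000        # assumption: "8K" read as 8,000 tokens


def plan_max_tokens(prompt_tokens: int, requested_output: int) -> int:
    """Return an output budget that respects both limits.

    Raises ValueError if the prompt alone already fills the context window.
    """
    if prompt_tokens < 0 or requested_output < 0:
        raise ValueError("token counts must be non-negative")
    remaining = CONTEXT_WINDOW - prompt_tokens
    if remaining <= 0:
        raise ValueError("prompt does not fit in the context window")
    # The budget is capped by the request, the output limit, and the space left.
    return min(requested_output, MAX_OUTPUT, remaining)
```

Under these assumptions, a 125,000-token prompt leaves room for at most 3,000 output tokens, even though the model's output cap is higher.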

LLM Chat · fast

GLM-4.7 Flash

by Z.ai · Released 2025

Zhipu AI's (Z.ai) fast inference model from the GLM-4 family. It is built on the General Language Model architecture, has strong bilingual (Chinese/English) capabilities, and is optimized for speed, with reliable tool calling and concise responses.

Context Window: 128K
Parameters: 32B
Max Output: 8K

Overview

GLM-4.7 Flash is Zhipu AI's (Z.ai) fast inference model from the GLM-4 family, featuring a 32B parameter dense architecture optimized for speed and reliability. Built on the General Language Model (GLM) architecture, it delivers strong bilingual Chinese/English performance with particularly reliable tool-calling and function-calling capabilities.

The model is designed for production workloads that need fast, concise responses and reliable structured output. It follows function-calling instructions precisely, scoring 94.2% on the Tool-Call Accuracy benchmark reported on this page, which makes it a good fit for AI agents that interact with external APIs and services. The 128K context window accommodates long documents and conversations.
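To make the agent workflow concrete, here is a hedged sketch of a tool-calling round trip in the OpenAI-compatible format. It assumes the gateway accepts the standard `tools` field and returns `tool_calls` on the assistant message, which this page does not confirm. The `get_weather` tool, the registry, and the sample values are hypothetical.

```python
import json


def get_weather(city: str) -> dict:
    """Hypothetical local tool; a real one would query a weather service."""
    return {"city": city, "forecast": "sunny"}


# Tool schema in the OpenAI-compatible "tools" format (JSON Schema parameters).
TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the forecast for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

REGISTRY = {"get_weather": get_weather}


def build_payload(user_text: str) -> dict:
    """Request body for POST /v1/chat/completions with tools attached."""
    return {
        "model": "glm-4.7-flash",
        "messages": [{"role": "user", "content": user_text}],
        "tools": TOOLS,
    }


def run_tool_calls(assistant_message: dict, registry: dict = REGISTRY) -> list:
    """Execute each requested tool and return the "tool" role messages to send back."""
    results = []
    for call in assistant_message.get("tool_calls") or []:
        function = call["function"]
        arguments = json.loads(function["arguments"])  # arguments arrive as a JSON string
        output = registry[function["name"]](**arguments)
        results.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(output),
        })
    return results
```

The messages returned by run_tool_calls would be appended to the conversation, after the assistant message that requested them, and sent in a follow-up request so the model can produce its final answer.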

GLM-4.7 Flash is open-source and available on HuggingFace, and it is among the stronger open bilingual models. It is particularly well-suited to Chinese-language applications, bilingual customer support, and workflows that require reliable tool use with fast inference. Its concise response style keeps output token usage down, which matters in production deployments where token costs add up.

Pricing

Metric	Price
Input /1M tokens	₹50.0000
Output /1M tokens	₹200.0000

1 credit = ₹1 = $0.01 USD. Prices shown are from the provider; CallMissed passes them through with a ~35% markup.
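As a worked example of the arithmetic, the sketch below estimates the cost of a single request from the rates in the table. It assumes the listed rates are the ones actually billed (the page does not say whether the markup is already included), and estimate_cost is a hypothetical helper.

```python
INPUT_RATE = 50.0      # credits (₹) per 1M input tokens, from the pricing table
OUTPUT_RATE = 200.0    # credits (₹) per 1M output tokens, from the pricing table
USD_PER_CREDIT = 0.01  # 1 credit = ₹1 = $0.01 USD


def estimate_cost(input_tokens: int, output_tokens: int) -> tuple:
    """Return (credits, usd) for one request at the listed per-million rates."""
    credits = (input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE) / 1_000_000
    return credits, credits * USD_PER_CREDIT
```

For instance, a request with 2,000 input tokens and 500 output tokens comes to 0.2 credits, or about $0.002.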

Key Highlights

Fast inference with concise, natural responses
Strong bilingual Chinese/English performance
Reliable tool-call and function-calling support
Open-source model available on HuggingFace

Benchmarks

Benchmark	Score	Notes
C-Eval	82.3%	Chinese language evaluation
MMLU	78.5%	General knowledge
HumanEval	82.1%	Code generation
GSM-8K	88.7%	Math reasoning
Tool-Call Accuracy	94.2%	Function calling reliability

Technical Details

Architecture: General Language Model (GLM) with 32B dense parameters
Strong bilingual Chinese/English performance
Reliable tool-calling and function-calling implementation
Context window: 128K tokens
Open-source: available on HuggingFace under a permissive license
Optimized for fast inference with concise response style
Available via Zhipu AI API and CallMissed unified gateway

Strengths

Best-in-class bilingual Chinese/English performance at 32B scale
Exceptionally reliable tool-calling and function-calling
Open-source on HuggingFace — can be self-hosted and fine-tuned
Fast inference with concise, efficient responses

Limitations

Primarily optimized for Chinese/English — weaker on other languages
32B dense model is less efficient than MoE architectures at similar quality
Smaller community and ecosystem compared to Llama or Qwen families

Use Cases

Bilingual applications
Tool-calling agents
Fast code generation
Chinese language tasks

API Example

curl https://api.callmissed.com/v1/chat/completions \
  -H "Authorization: Bearer cm_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "glm-4.7-flash", "messages": [{"role": "user", "content": "Write a Python function to parse JSON"}]}'

Endpoint: POST /v1/chat/completions · Model ID: glm-4.7-flash
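For readers who prefer Python, the following is a rough standard-library equivalent of the curl request, with an explicit JSON Content-Type header. The build_request and send helpers are hypothetical names, and the response shape noted in the usage comment assumes the OpenAI chat-completions format rather than anything documented here.

```python
import json
import urllib.request

BASE_URL = "https://api.callmissed.com/v1"


def build_request(api_key: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a chat completion request for glm-4.7-flash."""
    body = {
        "model": "glm-4.7-flash",
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Usage (performs a real network call, so it is left commented out):
# reply = send(build_request("cm_YOUR_KEY", "Write a Python function to parse JSON"))
# print(reply["choices"][0]["message"]["content"])  # assumes the OpenAI response shape
```

Keeping request construction separate from sending makes it possible to inspect the URL, headers, and body without spending credits.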

Try GLM-4.7 Flash now

Get 1000 free API credits on signup. No credit card required.
