GLM 5.2
by Z.ai · Released 2026
Zhipu AI's (Z.ai) flagship agentic coding model from the GLM-5 family. A 262K-context model purpose-built for long-horizon software engineering — multi-turn tool calling, native reasoning, and reliable structured output across large codebases.
Powered by Z.ai · General Language Model (GLM), Mixture-of-Experts
| Specification | Value |
|---|---|
| Context Window | 262K |
| Parameters | MoE |
| Max Output | 8K |
| Category | LLM Chat |
Overview
GLM 5.2 is Zhipu AI's (Z.ai) flagship agentic coding model, the most capable entry in the GLM-5 family. It pairs a very large 262,144-token context window with native reasoning and robust multi-turn function calling, making it well-suited for autonomous coding agents that plan changes across many files, call tools to read and edit code, run tests, and iterate — all while keeping the full project context in a single window.
The model is tuned for agentic coding workflows: it follows tool-calling instructions precisely, emits reliable structured output for tool payloads, and uses a `reasoning_effort` thinking toggle (low/medium/high) to trade latency for depth on harder problems. Its bilingual Chinese/English heritage from the GLM family carries through, so it remains strong on multilingual technical content.
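As a rough illustration of the effort toggle, the sketch below builds a chat-completions request body that carries a `reasoning_effort` value. It assumes the field sits at the top level of the JSON body, as in other OpenAI-compatible APIs; that placement and the helper name `build_request` are assumptions, not something this page confirms.

```python
import json

# The three effort levels named in the text above.
ALLOWED_EFFORTS = ("low", "medium", "high")


def build_request(prompt: str, effort: str = "medium") -> dict:
    """Build a chat-completions body for glm-5.2 (hypothetical helper).

    Assumption: `reasoning_effort` is a top-level request field.
    """
    if effort not in ALLOWED_EFFORTS:
        raise ValueError(f"effort must be one of {ALLOWED_EFFORTS}, got {effort!r}")
    return {
        "model": "glm-5.2",
        "reasoning_effort": effort,
        "messages": [{"role": "user", "content": prompt}],
    }


if __name__ == "__main__":
    # Harder problems get the deeper (and slower) setting.
    body = build_request("Find the race condition in this queue", effort="high")
    print(json.dumps(body, indent=2))
```

Validating the value on the client keeps a typo such as `"max"` from reaching the gateway, where the resulting behavior is not documented here.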
On CallMissed, GLM 5.2 is fully OpenAI-compatible on `/v1/chat/completions` with streaming, tool calling, and the reasoning toggle. The 262K context handles large repositories, long design documents, and extended agent transcripts in one pass — pair it with a planner/executor loop for repository-scale refactors, or use it directly for complex single-shot coding and analysis tasks.
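For the streaming mode mentioned above, a client typically reassembles the reply from incremental chunks. The sketch below assumes the gateway follows the OpenAI server-sent-events convention (`data: <json>` lines ending with `data: [DONE]`). The function name `iter_stream_text` is hypothetical, and the sample events are hand-written rather than captured gateway output.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from OpenAI-style SSE lines.

    Assumption: each event is `data: <json chunk>` and the stream ends
    with `data: [DONE]`.
    """
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text


# Hand-written sample events for illustration only.
sample = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "def add(a, b):"}}]}',
    'data: {"choices": [{"delta": {"content": " return a + b"}}]}',
    "data: [DONE]",
]
print("".join(iter_stream_text(sample)))
```

In a real client the `lines` argument would be the decoded lines of the HTTP response body. The parsing is kept separate from the network call so it can be exercised offline.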
Pricing
| Metric | Price |
|---|---|
| Input /1M tokens | ₹189.0000 |
| Output /1M tokens | ₹594.0000 |
1 credit = ₹1 = $0.01 USD. Prices shown are from the provider; CallMissed passes them through with a ~35% markup.
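To make the per-million-token prices concrete, the sketch below estimates the cost of a single request from its token counts. It uses the listed prices as-is and does not try to model the markup, since the page does not say whether the listed figures already include it. The function name `estimate_cost_inr` is hypothetical.

```python
from decimal import Decimal

# Per-million-token prices from the table above, in rupees (1 credit = ₹1).
INPUT_PRICE_PER_M = Decimal("189")
OUTPUT_PRICE_PER_M = Decimal("594")
TOKENS_PER_M = Decimal(1_000_000)


def estimate_cost_inr(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the cost of one request in rupees (equivalently, credits)."""
    input_cost = Decimal(input_tokens) * INPUT_PRICE_PER_M / TOKENS_PER_M
    output_cost = Decimal(output_tokens) * OUTPUT_PRICE_PER_M / TOKENS_PER_M
    return input_cost + output_cost


# Example: a 200,000-token prompt with an 8,000-token reply.
cost = estimate_cost_inr(200_000, 8_000)
print(cost)
```

For that example the input side is 0.2 × ₹189 = ₹37.80 and the output side is 0.008 × ₹594 = ₹4.752, about ₹42.55 in total. `Decimal` is used instead of floats so the billing arithmetic stays exact.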
Key Highlights
- Flagship agentic coding model — built for long-horizon engineering
- 262K context for repository-scale tasks in a single pass
- Native reasoning with a low/medium/high effort toggle
- Reliable multi-turn tool calling and structured output
Benchmarks
| Capability | Value |
|---|---|
| Context | 262K |
| Tool Calling | Yes |
| Reasoning | Yes |
Technical Details
- Architecture: General Language Model (GLM) mixture-of-experts
- Context window: 262,144 tokens
- Native reasoning with `reasoning_effort` control
- Multi-turn + parallel tool/function calling
- OpenAI-compatible on the CallMissed gateway with streaming
- Bilingual Chinese/English strength from the GLM family
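The multi-turn and parallel tool calling listed above can be sketched from the client side: the agent runs each requested tool and sends one `tool` message back per call. This is a minimal sketch assuming the OpenAI-style message shape (`tool_calls` entries with an `id` and a JSON-string `arguments` field). The `read_file` tool, the `TOOLS` registry, and `run_tool_calls` are hypothetical names, and the sample assistant turn is hand-written.

```python
import json
from typing import Callable, Dict, List


def read_file(path: str) -> str:
    """Hypothetical tool for illustration; a real agent would read from disk."""
    return f"<contents of {path}>"


# Registry mapping tool names to local callables.
TOOLS: Dict[str, Callable[..., str]] = {"read_file": read_file}


def run_tool_calls(assistant_message: dict) -> List[dict]:
    """Execute every tool call in one assistant turn and build the replies."""
    replies = []
    for call in assistant_message.get("tool_calls") or []:
        name = call["function"]["name"]
        # Arguments arrive as a JSON-encoded string, not a dict.
        args = json.loads(call["function"]["arguments"])
        if name in TOOLS:
            result = TOOLS[name](**args)
        else:
            result = f"error: unknown tool {name!r}"
        replies.append({"role": "tool", "tool_call_id": call["id"], "content": result})
    return replies


# Hand-written assistant turn with two parallel calls (not real model output).
turn = {
    "role": "assistant",
    "tool_calls": [
        {"id": "call_1", "type": "function",
         "function": {"name": "read_file", "arguments": '{"path": "app/main.py"}'}},
        {"id": "call_2", "type": "function",
         "function": {"name": "read_file", "arguments": '{"path": "tests/test_main.py"}'}},
    ],
}
for reply in run_tool_calls(turn):
    print(reply)
```

In an agent loop, the assistant turn and these replies would be appended to `messages` and sent back, repeating until the model answers without requesting further tools. Returning an error string for an unknown tool, rather than raising, gives the model a chance to correct itself.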
Strengths
- Purpose-built for agentic coding and long-horizon tool use
- Very large 262K context for whole-codebase reasoning
- Reasoning toggle balances latency vs depth per request
- Reliable structured output keeps tool-call loops stable
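Before sending a whole codebase in one request, it can help to check roughly whether it fits. The sketch below is a back-of-the-envelope budget check under three stated assumptions: the listed "8K" max output means 8,192 tokens, output tokens count against the same 262,144-token window, and about four characters make one token. The last is a crude heuristic, not the model's actual tokenizer, and all names are hypothetical.

```python
from typing import List

CONTEXT_WINDOW = 262_144  # tokens, from the technical details above
MAX_OUTPUT = 8_192        # assumption: the listed "8K" means 8,192 tokens
CHARS_PER_TOKEN = 4       # rough heuristic, not the model's real tokenizer


def estimate_tokens(text: str) -> int:
    """Approximate token count, rounded up so the estimate errs high."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits_in_context(texts: List[str], reserve_output: int = MAX_OUTPUT) -> bool:
    """Return True if the texts plus reserved output fit in the window."""
    total = sum(estimate_tokens(t) for t in texts)
    return total + reserve_output <= CONTEXT_WINDOW


if __name__ == "__main__":
    files = ["x" * 400_000, "y" * 500_000]  # stand-ins for source files
    print(fits_in_context(files))
```

When the check fails, the agent has to select files or summarize instead of sending everything. A production check would use real token counts rather than this estimate.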
Limitations
- Premium pricing relative to the fast GLM 4.7 Flash tier
- Reasoning mode increases time-to-first-token on hard prompts
- Coding-optimized; general chat workloads may be better served by a cheaper model
Use Cases
API Example
curl https://api.callmissed.com/v1/chat/completions \
  -H "Authorization: Bearer cm_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "glm-5.2", "messages": [{"role": "user", "content": "Refactor this module and add tests"}]}'
Endpoint: POST /v1/chat/completions · Model ID: glm-5.2