GLM 5.2
unsloth/GLM-5.2-GGUF
published Jun 2026 · updated Jun 2026
GLM 5.2 is a text-generation model optimized for long-horizon tasks with a solid 1M-token context, advanced coding capabilities, and flexible thinking effort levels.
specs
| Task | Text Generation |
| Architecture | Mixture of Experts (MoE) with IndexShare sparse attention |
| Parameters | 744B total, 40B active |
| Context Length | 1,048,576 tokens |
| License | MIT |
about this model
GLM-5.2 is a text-generation model designed for long-horizon tasks, supporting a solid 1M-token context with advanced reasoning and coding capabilities.
Key Strengths
- 1M-token context that stably sustains long-horizon work.
- IndexShare architecture reuses the same indexer across every four sparse attention layers, reducing per-token FLOPs by 2.9× at 1M context length.
- Improved MTP layer for speculative decoding increases acceptance length by up to 20%.
- Three thinking modes: Non-thinking, Thinking High, and Thinking Max, enabling flexible effort trade-offs.
- Mixture-of-Experts with 744B total parameters and 40B active parameters, released under MIT license.
Benchmark Performance
GLM-5.2 achieves competitive scores across reasoning, coding, and agentic benchmarks. Notable results include HLE 40.5, HLE with Tools 54.7, AIME 2026 99.2, SWE-bench Pro 62.1, and Terminal Bench 2.1 81.0.
Quantization and Hardware
- Dynamic 1-bit (UD-IQ1_M) reaches ~76.2% top-1 accuracy while being 86% smaller; dynamic 2-bit ~82% accuracy at 84% smaller; dynamic 4-bit and 5-bit are described as “mostly lossless.”
- Quantized model sizes range from 223 GB (1-bit) to 810 GB (8-bit).
- 2-bit quant fits on a 256 GB unified memory Mac; works with MoE offloading on 1×24 GB GPU + 256 GB RAM.
Full documentation and analysis and technical report are available.
best for
- ·Long-horizon coding and software engineering tasks
- ·Agentic workflows requiring 1M-token context
- ·Complex reasoning and math problem solving
FAQ
GLM 5.2 supports a solid 1,048,576-token context window.
Three modes: Non-thinking, Thinking High, and Thinking Max.
It is released under the MIT open-source license with no regional limits.
It has 744B total parameters with 40B active parameters (Mixture of Experts).
Use the gigarouter OpenAI-compatible endpoint with your API key.
We're benchmarking and onboarding GLM 5.2 as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.