GLM 5.2

unsloth/GLM-5.2-GGUF

published Jun 2026 · updated Jun 2026

GLM 5.2 is a text-generation model optimized for long-horizon tasks with a solid 1M-token context, advanced coding capabilities, and flexible thinking effort levels.

status

coming soon

API providers

downloads / mo

264.6K

license

mit

specs

Task	Text Generation
Architecture	Mixture of Experts (MoE) with IndexShare sparse attention
Parameters	744B total, 40B active
Context Length	1,048,576 tokens
License	MIT

about this model

GLM-5.2 is a text-generation model designed for long-horizon tasks, supporting a solid 1M-token context with advanced reasoning and coding capabilities.

Key Strengths

1M-token context that stably sustains long-horizon work.
IndexShare architecture reuses the same indexer across every four sparse attention layers, reducing per-token FLOPs by 2.9× at 1M context length.
Improved MTP layer for speculative decoding increases acceptance length by up to 20%.
Three thinking modes: Non-thinking, Thinking High, and Thinking Max, enabling flexible effort trade-offs.
Mixture-of-Experts with 744B total parameters and 40B active parameters, released under MIT license.

Benchmark Performance

GLM-5.2 achieves competitive scores across reasoning, coding, and agentic benchmarks. Notable results include HLE 40.5, HLE with Tools 54.7, AIME 2026 99.2, SWE-bench Pro 62.1, and Terminal Bench 2.1 81.0.

Quantization and Hardware

Dynamic 1-bit (UD-IQ1_M) reaches ~76.2% top-1 accuracy while being 86% smaller; dynamic 2-bit ~82% accuracy at 84% smaller; dynamic 4-bit and 5-bit are described as “mostly lossless.”
Quantized model sizes range from 223 GB (1-bit) to 810 GB (8-bit).
2-bit quant fits on a 256 GB unified memory Mac; works with MoE offloading on 1×24 GB GPU + 256 GB RAM.

Full documentation and analysis and technical report are available.

best for

·Long-horizon coding and software engineering tasks
·Agentic workflows requiring 1M-token context
·Complex reasoning and math problem solving

FAQ

What is the context length of GLM 5.2?

GLM 5.2 supports a solid 1,048,576-token context window.

What are the thinking modes available?

Three modes: Non-thinking, Thinking High, and Thinking Max.

What is the license for GLM 5.2?

It is released under the MIT open-source license with no regional limits.

How many parameters does GLM 5.2 have?

It has 744B total parameters with 40B active parameters (Mixture of Experts).

How can I call GLM 5.2 via the API?

Use the gigarouter OpenAI-compatible endpoint with your API key.

not yet live

We're benchmarking and onboarding GLM 5.2 as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.

related text generation models

tiny-Qwen2ForCausalLM-2.5

9.2M dl/mo

deepseek-v4-gguf

6.4M dl/mo

Qwen3.6-35B-A3B-NVFP4

6.2M dl/mo

gemma-3-270m

5.1M dl/mo