skip to content
gigarouter gigarouter
models / text generation · coming soon

GLM 5.2

unsloth/GLM-5.2-GGUF

published Jun 2026 · updated Jun 2026

GLM 5.2 is a text-generation model optimized for long-horizon tasks with a solid 1M-token context, advanced coding capabilities, and flexible thinking effort levels.

status
coming soon
API providers
0
downloads / mo
264.6K
license
mit

specs

TaskText Generation
ArchitectureMixture of Experts (MoE) with IndexShare sparse attention
Parameters744B total, 40B active
Context Length1,048,576 tokens
LicenseMIT

about this model

GLM-5.2 is a text-generation model designed for long-horizon tasks, supporting a solid 1M-token context with advanced reasoning and coding capabilities.

Key Strengths

  • 1M-token context that stably sustains long-horizon work.
  • IndexShare architecture reuses the same indexer across every four sparse attention layers, reducing per-token FLOPs by 2.9× at 1M context length.
  • Improved MTP layer for speculative decoding increases acceptance length by up to 20%.
  • Three thinking modes: Non-thinking, Thinking High, and Thinking Max, enabling flexible effort trade-offs.
  • Mixture-of-Experts with 744B total parameters and 40B active parameters, released under MIT license.

Benchmark Performance

GLM-5.2 achieves competitive scores across reasoning, coding, and agentic benchmarks. Notable results include HLE 40.5, HLE with Tools 54.7, AIME 2026 99.2, SWE-bench Pro 62.1, and Terminal Bench 2.1 81.0.

GLM-5.2 benchmark comparison chart

Quantization and Hardware

  • Dynamic 1-bit (UD-IQ1_M) reaches ~76.2% top-1 accuracy while being 86% smaller; dynamic 2-bit ~82% accuracy at 84% smaller; dynamic 4-bit and 5-bit are described as “mostly lossless.”
  • Quantized model sizes range from 223 GB (1-bit) to 810 GB (8-bit).
  • 2-bit quant fits on a 256 GB unified memory Mac; works with MoE offloading on 1×24 GB GPU + 256 GB RAM.

Full documentation and analysis and technical report are available.

best for

FAQ

What is the context length of GLM 5.2?

GLM 5.2 supports a solid 1,048,576-token context window.

What are the thinking modes available?

Three modes: Non-thinking, Thinking High, and Thinking Max.

What is the license for GLM 5.2?

It is released under the MIT open-source license with no regional limits.

How many parameters does GLM 5.2 have?

It has 744B total parameters with 40B active parameters (Mixture of Experts).

How can I call GLM 5.2 via the API?

Use the gigarouter OpenAI-compatible endpoint with your API key.

not yet live

We're benchmarking and onboarding GLM 5.2 as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.

related text generation models

compare all →