GLM-5.2 is Z.ai’s new open model, delivering SOTA performance across long-horizon coding, reasoning, and agentic tasks. With 744B total parameters (40B active) and a 1M context window, it can now be run locally using GGUFs. GLM-5.2 is the strongest open model to date, performing on par with Claude 4.8 Opus, GPT-5.5, and Gemini 3.1 Pro on Artificial Analysis and many other benchmarks.
top-1 accuracy while being 86% smaller. The dynamic 2-bit quant reaches ~82% accuracy while being 84% smaller. In other words, the model is not 86% worse despite being 86% smaller; it is only ~24% less accurate than the full 1.5TB model. Thanks to Z.ai for giving Unsloth day-zero access.
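To make the size arithmetic concrete, here is a minimal sketch that turns a "percent smaller" figure into an approximate size. It assumes the full model is 1.5TB = 1500GB (decimal units); the function name `quantized_size_gb` is hypothetical and only for illustration.

```python
def quantized_size_gb(full_size_gb, percent_smaller):
    """Approximate size of a quant that is `percent_smaller` % smaller than the full model."""
    return full_size_gb * (100 - percent_smaller) / 100


FULL_MODEL_GB = 1500  # assumption: 1.5TB taken as 1500GB

print(quantized_size_gb(FULL_MODEL_GB, 86))  # 210.0
print(quantized_size_gb(FULL_MODEL_GB, 84))  # 240.0
```

These are rough estimates from rounded percentages, so they will not match the published file sizes exactly.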
⚙️ Usage Guide
The 2-bit dynamic quant UD-IQ2_M uses 239GB of disk space, so it fits directly on a Mac with 256GB of unified memory and also works well on a single 24GB GPU with 256GB of RAM using MoE offloading. The 1-bit quant fits in 223GB of RAM, and the 8-bit quant requires 810GB of RAM.
Table: Inference hardware requirements (units = total memory: RAM + VRAM, or unified memory)
| 1-bit | 2-bit | 3-bit | 4-bit | 5-bit | 8-bit |
| --- | --- | --- | --- | --- | --- |
| 223 GB | 245 GB | 290-360 GB | 372-475 GB | 570 GB | 810 GB |
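As a rough aid for reading the table above, the following sketch picks the largest quant whose requirement fits a given memory budget. The numbers are copied from the table; using the lower bound for the 3-bit and 4-bit ranges is an assumption, and the helper name `largest_quant_that_fits` is hypothetical.

```python
# Minimum total memory in GB (RAM + VRAM, or unified memory), taken from the table.
# For rows given as a range, the lower bound is used (an assumption).
MIN_MEMORY_GB = {
    "1-bit": 223,
    "2-bit": 245,
    "3-bit": 290,
    "4-bit": 372,
    "5-bit": 570,
    "8-bit": 810,
}


def largest_quant_that_fits(total_memory_gb):
    """Return the highest-memory quant that fits, or None if none fit."""
    fitting = [(need, name) for name, need in MIN_MEMORY_GB.items()
               if need <= total_memory_gb]
    return max(fitting)[1] if fitting else None


print(largest_quant_that_fits(256))       # 256GB unified memory Mac -> 2-bit
print(largest_quant_that_fits(24 + 256))  # 24GB GPU + 256GB RAM -> 2-bit
print(largest_quant_that_fits(128))       # too little memory -> None
```

Treat the result as a starting point only: context length and other running software also consume memory, so leaving headroom is sensible.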
Recommended Settings
GLM-5.2 has three modes: non-thinking, plus thinking at two levels, High and Max. Use Max Thinking for complicated tasks. You can toggle between High Thinking, Max Thinking, and non-thinking with a UI.
Use these settings for most use cases:
... continue reading