Unsloth GLM-5.2 – How to Run Locally

GLM-5.2 is Z.ai’s new open model, delivering SOTA performance across long-horizon coding, reasoning, and agentic tasks. With 744B parameters, 40B active parameters, and a 1M context window, it can now be run locally using GGUFs. GLM-5.2 is the strongest open model to date, performing on par with Claude 4.8 Opus, GPT-5.5, and Gemini 3.1 Pro across Artificial Analysis and many other benchmarks.

The full model requires 1.51TB of disk space, while the Unsloth Dynamic 2-bit GGUF reduces this to 239GB (-84% size) by upcasting important layers to 8 or 16-bit. Dynamic 1-bit reduces it further to 217GB (-86%). Thanks to Z.ai for giving Unsloth day-zero access.
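As a quick sanity check on the size reductions quoted above, the percentages can be recomputed from the stated sizes. This is a minimal sketch that assumes decimal units (1.51TB = 1510GB); the function name is illustrative and not part of any Unsloth tooling.

```python
# Full-precision model size from the text, assuming decimal units (1.51 TB = 1510 GB).
FULL_MODEL_GB = 1510


def size_reduction_percent(quant_gb, full_gb=FULL_MODEL_GB):
    """Return the size reduction of a quant relative to the full model, in whole percent."""
    return round((1 - quant_gb / full_gb) * 100)


# Quant sizes as stated in the text.
for name, quant_gb in {"Dynamic 2-bit": 239, "Dynamic 1-bit": 217}.items():
    print(f"{name}: {quant_gb} GB (-{size_reduction_percent(quant_gb)}%)")
```

Under that assumption, both results agree with the -84% and -86% figures given in the text.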


⚙️ Usage Guide

The 2-bit dynamic quant UD-IQ2_M uses 239GB of disk space, so it fits directly on a Mac with 256GB of unified memory. It also works well on a single 24GB GPU paired with 256GB of RAM when using MoE offloading. The 1-bit quant fits in 223GB of RAM, and the 8-bit quant requires 810GB of RAM.

Table: Inference hardware requirements (units = total memory: RAM + VRAM, or unified memory)

| 1-bit | 2-bit | 3-bit | 4-bit | 5-bit | 8-bit |
| --- | --- | --- | --- | --- | --- |
| 223 GB | 245 GB | 290-360 GB | 372-475 GB | 570 GB | 810 GB |
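To make the table easier to apply, here is a small sketch that picks the largest quant whose memory requirement fits a given machine. The helper name and the choice to use the lower bound for the ranged 3-bit and 4-bit entries are assumptions made for illustration; larger variants at those bit widths need more memory, up to the upper bound in the table.

```python
# Minimum total memory in GB (RAM + VRAM, or unified memory), taken from the table above.
# Assumption: for ranged entries (3-bit, 4-bit) the lower bound is used,
# i.e. the smallest variant at that bit width.
MIN_MEMORY_GB = {
    "1-bit": 223,
    "2-bit": 245,
    "3-bit": 290,
    "4-bit": 372,
    "5-bit": 570,
    "8-bit": 810,
}


def largest_quant_that_fits(ram_gb, vram_gb=0):
    """Return the highest-memory quant that fits in ram_gb + vram_gb, or None if none fit."""
    total_gb = ram_gb + vram_gb
    fitting = [(need_gb, name) for name, need_gb in MIN_MEMORY_GB.items() if need_gb <= total_gb]
    if not fitting:
        return None
    # The entry with the largest requirement that still fits is the highest-precision choice.
    return max(fitting)[1]


print(largest_quant_that_fits(ram_gb=256, vram_gb=24))  # 24GB GPU + 256GB RAM
print(largest_quant_that_fits(ram_gb=256))              # 256GB unified memory
```

This only compares totals against the table; it does not account for context length, KV cache size, or operating system overhead, so treat the result as an upper bound on what is practical.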

Recommended Settings

GLM-5.2 has three thinking modes: non-thinking, and thinking at two levels, High and Max. Use Max Thinking for complicated tasks. You can also toggle between High Thinking, Max Thinking, and non-thinking through a UI.

Use these settings for most use cases:

... continue reading