Unsloth GLM-5.2 – How to Run Locally

Why This Matters

The release of GLM-5.2 marks a significant advancement in open-source AI models, offering state-of-the-art performance comparable to proprietary models while enabling local deployment. Its efficient quantization techniques drastically reduce storage requirements, making high-performance AI more accessible to consumers and developers alike. This development could democratize AI usage, fostering innovation and broader adoption across the tech industry.

Key Takeaways

GLM-5.2 is Z.ai’s new open model, delivering SOTA performance across long-horizon coding, reasoning, and agentic tasks. With 744B parameters, 40B active parameters, and a 1M context window, it can now be run locally using GGUFs. GLM-5.2 is the strongest open model to date, performing on par with Claude 4.8 Opus, GPT-5.5, and Gemini 3.1 Pro across Artificial Analysis and many other benchmarks.

The full model requires 1.51TB of disk space. The Unsloth Dynamic 2-bit GGUF reduces this to 239GB (-84% size) by upcasting important layers to 8 or 16-bit, and the Dynamic 1-bit GGUF lowers it further to 217GB (-86%). Thanks to Z.ai for giving Unsloth day-zero access.
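The size reductions quoted above can be checked with a few lines of arithmetic. This is a minimal sketch that assumes decimal units (1TB = 1000GB); the helper name `size_reduction_pct` is hypothetical and not part of any Unsloth tooling.

```python
def size_reduction_pct(full_gb: float, quant_gb: float) -> float:
    """Percent of disk space saved relative to the full model."""
    return (1 - quant_gb / full_gb) * 100


# Assumption: 1.51TB is read as 1510GB (decimal units).
FULL_GB = 1510

for name, quant_gb in [("Dynamic 2-bit", 239), ("Dynamic 1-bit", 217)]:
    saved = size_reduction_pct(FULL_GB, quant_gb)
    print(f"{name}: {quant_gb}GB on disk, -{saved:.0f}% vs. full model")
```

Rounded to whole percentages, the results agree with the -84% and -86% figures in the text.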

⚙️ Usage Guide

The 2-bit dynamic quant UD-IQ2_M uses 239GB of disk space. It fits directly on a 256GB unified memory Mac and also works well on a single 24GB GPU with 256GB of RAM using MoE offloading. The 1-bit quant fits in 223GB of RAM, and the 8-bit quant requires 810GB of RAM.

Table: Inference hardware requirements (units = total memory: RAM + VRAM, or unified memory)

Quant          1-bit    2-bit    3-bit        4-bit        5-bit    8-bit
Total memory   223 GB   245 GB   290-360 GB   372-475 GB   570 GB   810 GB
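As a rough illustration of how to read the table, the sketch below picks the largest quant whose listed requirement fits within a machine's total memory (RAM + VRAM, or unified memory). The function name and the choice to use the lower bound of each range are assumptions made for this example; real headroom also depends on context length and other runtime overhead that the table does not break down.

```python
from typing import Optional

# Minimum total memory in GB per quant, taken from the table above.
# Assumption: where the table gives a range, the lower bound is used.
MIN_MEMORY_GB = {
    "1-bit": 223,
    "2-bit": 245,
    "3-bit": 290,
    "4-bit": 372,
    "5-bit": 570,
    "8-bit": 810,
}


def largest_quant_that_fits(ram_gb: float, vram_gb: float = 0) -> Optional[str]:
    """Return the largest quant whose listed requirement fits, or None."""
    total_gb = ram_gb + vram_gb  # the table counts RAM + VRAM together
    fitting = [q for q, need in MIN_MEMORY_GB.items() if need <= total_gb]
    if not fitting:
        return None
    return max(fitting, key=MIN_MEMORY_GB.get)


print(largest_quant_that_fits(ram_gb=256, vram_gb=24))  # 24GB GPU + 256GB RAM
print(largest_quant_that_fits(ram_gb=256))              # 256GB unified memory
print(largest_quant_that_fits(ram_gb=128))              # below the 1-bit minimum
```

Both 256GB configurations mentioned in the usage guide resolve to the 2-bit quant, while a 128GB machine falls below the 1-bit minimum and returns `None`.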

Recommended Settings

GLM-5.2 has 3 thinking modes: non-thinking, plus two thinking levels, High and Max. Use Max Thinking for complicated tasks. You can easily toggle between High Thinking, Max Thinking, and non-thinking with a UI.

Use these settings for most use cases:

... continue reading