
GLM-4.7-Flash


👋 Join our Discord community.

📖 Check out the GLM-4.7 technical blog and the technical report (GLM-4.5).

📍 Use GLM-4.7-Flash API services on Z.ai API Platform.

👉 One click to try GLM-4.7.

Introduction

GLM-4.7-Flash is a Mixture-of-Experts (MoE) model with 30B total parameters, of which 3B are active per token (30B-A3B). As the strongest model in the 30B class, GLM-4.7-Flash offers a new option for lightweight deployment that balances performance and efficiency.
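To make the 30B-A3B idea concrete, here is a toy sketch of top-k expert routing in PyTorch: a router scores each token and only a few of the feed-forward experts run for it, so the compute per token tracks the active parameters rather than the total. All sizes and routing details below are illustrative assumptions, not GLM-4.7-Flash's actual architecture.

```python
import torch
import torch.nn as nn

class TinyMoE(nn.Module):
    """Toy top-k MoE layer; sizes are illustrative, not GLM's."""

    def __init__(self, d_model=64, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, 4 * d_model),
                nn.GELU(),
                nn.Linear(4 * d_model, d_model),
            )
            for _ in range(n_experts)
        )

    def forward(self, x):
        # x: (num_tokens, d_model)
        scores = self.router(x)                        # (num_tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = weights.softmax(dim=-1)              # normalize over chosen experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e                  # tokens routed to expert e
                if mask.any():                         # only chosen experts run
                    out[mask] += weights[mask, k].unsqueeze(1) * expert(x[mask])
        return out

moe = TinyMoE()
print(moe(torch.randn(10, 64)).shape)  # torch.Size([10, 64])
```

Here only 2 of 8 experts fire per token, so roughly a quarter of the expert parameters are exercised on any forward pass; the same principle is what lets a 30B-parameter model run with 3B-parameter compute.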

Performance on Benchmarks

| Benchmark | GLM-4.7-Flash | Qwen3-30B-A3B-Thinking-2507 | GPT-OSS-20B |
|---|---|---|---|
| AIME 25 | 91.6 | 85.0 | 91.7 |
| GPQA | 75.2 | 73.4 | 71.5 |
| LCB v6 | 64.0 | 66.0 | 61.0 |
| HLE | 14.4 | 9.8 | 10.9 |
| SWE-bench Verified | 59.2 | 22.0 | 34.0 |
| τ²-Bench | 79.5 | 49.0 | 47.7 |
| BrowseComp | 42.8 | 2.29 | 28.3 |

Serve GLM-4.7-Flash Locally

For local deployment, GLM-4.7-Flash supports inference frameworks including vLLM and SGLang. Comprehensive deployment instructions are available in the official GitHub repository; a minimal sketch follows below.
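As a minimal sketch, the model could be loaded through vLLM's offline Python API roughly as follows. The Hugging Face model ID `zai-org/GLM-4.7-Flash` and the parallelism setting are assumptions based on earlier GLM releases; confirm the exact identifier and recommended launch flags in the official repository.

```python
from vllm import LLM, SamplingParams

# Hedged sketch of local inference with vLLM's offline API.
llm = LLM(
    model="zai-org/GLM-4.7-Flash",  # assumed Hugging Face identifier
    tensor_parallel_size=2,         # split weights across 2 GPUs; adjust to your hardware
)

params = SamplingParams(temperature=0.7, max_tokens=512)
outputs = llm.generate(
    ["Explain what a 30B-A3B mixture-of-experts model is."],
    params,
)
print(outputs[0].outputs[0].text)
```

The same checkpoint can instead be served as an OpenAI-compatible endpoint (for example via `vllm serve` or SGLang's launcher) when you need network access rather than in-process generation.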
