Run the new Qwen3.5 LLMs on your local device, including the Medium models Qwen3.5-35B-A3B, 27B and 122B-A10B, the Small models Qwen3.5-0.8B, 2B, 4B and 9B, and the 397B-A17B flagship!
Qwen3.5 is Alibaba’s new model family, including Qwen3.5-35B-A3B, 27B, 122B-A10B and 397B-A17B, plus the new Small series: Qwen3.5-0.8B, 2B, 4B and 9B. These multimodal hybrid-reasoning LLMs deliver the strongest performance for their sizes. They support 256K context across 201 languages, offer both thinking and non-thinking modes, and excel at agentic coding, vision, chat, and long-context tasks. The 35B and 27B models run on a device with 22GB of RAM or unified memory, such as a Mac.
All uploads use Unsloth for SOTA quantization performance, so 4-bit quants keep important layers upcast to 8- or 16-bit. Thank you to Qwen for providing Unsloth with day-zero access. You can also fine-tune Qwen3.5 with Unsloth.
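The upcasting trade-off above can be sketched with a quick bits-per-weight calculation. The 10%/5% split below is purely illustrative and is not Unsloth's actual layer selection:

```python
# Rough effect of mixed quantization: most weights stay at the base bit
# width, while a small fraction of sensitive layers is upcast to 8/16-bit.
def effective_bits(base_bits: float, upcast: dict[float, float]) -> float:
    """Average bits per weight given {bits: fraction_of_weights} upcasts."""
    upcast_frac = sum(upcast.values())
    return base_bits * (1.0 - upcast_frac) + sum(b * f for b, f in upcast.items())

# e.g. 85% of weights at 4-bit, 10% at 8-bit, 5% at 16-bit:
avg = effective_bits(4, {8: 0.10, 16: 0.05})
print(f"{avg:.2f} bits/weight")  # 4*0.85 + 8*0.10 + 16*0.05 = 5.00
```

So a "4-bit" upload with upcast layers costs somewhat more memory than a uniform 4-bit quant, in exchange for preserving the layers that hurt quality most when quantized.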
To enable or disable thinking, see the usage guide below. Qwen3.5 Small models disable thinking by default. Also see how to enable the Think toggle.
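For Qwen3, thinking could be toggled per turn with a soft-switch tag in the user message (a hard switch also exists via the chat template's `enable_thinking` argument). Assuming Qwen3.5 keeps the same convention, a minimal sketch looks like this; check the model card for the exact tags:

```python
# Qwen3-style soft switch: append "/think" or "/no_think" to the last
# user turn to toggle reasoning for that turn. The tag names follow
# Qwen3's convention and are assumed to carry over to Qwen3.5.
def set_thinking(messages: list[dict], enable: bool) -> list[dict]:
    """Return a copy of the chat with a thinking toggle on the last user turn."""
    tag = "/think" if enable else "/no_think"
    out = [dict(m) for m in messages]
    for m in reversed(out):
        if m["role"] == "user":
            m["content"] = f'{m["content"]} {tag}'
            break
    return out

chat = [{"role": "user", "content": "Summarize this file."}]
print(set_thinking(chat, enable=False))
# [{'role': 'user', 'content': 'Summarize this file. /no_think'}]
```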
⚙️ Usage Guide
Table: Inference hardware requirements (units = total memory: RAM + VRAM, or unified memory)

| Qwen3.5 (by size, ascending) | 3-bit | 4-bit | 6-bit | 8-bit | BF16 |
|---|---|---|---|---|---|
| | 3 GB | 3.5 GB | 5 GB | 7.5 GB | 9 GB |
| | 4.5 GB | 5.5 GB | 7 GB | 10 GB | 14 GB |
| | 5.5 GB | 6.5 GB | 9 GB | 13 GB | 19 GB |
| | 14 GB | 17 GB | 24 GB | 30 GB | 54 GB |
| | 17 GB | 22 GB | 30 GB | 38 GB | 70 GB |
| | 60 GB | 70 GB | 106 GB | 132 GB | 245 GB |
| | 180 GB | 214 GB | 340 GB | 512 GB | 810 GB |
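As a sanity check on figures like these, quantized weight size is roughly `params × bits / 8`, plus headroom for the KV cache and runtime. The 20% overhead factor below is an assumption for illustration, not an Unsloth number:

```python
# Back-of-envelope memory estimate for running a quantized model.
def approx_total_gb(params_b: float, bits: float, overhead: float = 1.2) -> float:
    """Rough total memory (GB) for a params_b-billion-param model at `bits`.

    overhead=1.2 is an assumed 20% allowance for KV cache and runtime.
    """
    weights_gb = params_b * bits / 8  # 1B params at 8-bit ≈ 1 GB
    return weights_gb * overhead

print(f"{approx_total_gb(27, 4):.1f} GB")  # 27B at 4-bit ≈ 16.2 GB
```

Real requirements also depend on context length (the KV cache grows with it), so treat the table values as the practical minimums.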