It ran for ~5 hours and got 1 tok/s. Another ~3 hours of optimizing and it's at 4.74 tok/s using 5.9GB RAM
Mar 17, 2026 · 4:16 PM UTC
Qwen3.5-397B running at 4.74 tokens per second in 5.9GB of RAM, up from 1 tok/s after roughly 3 more hours of optimization, shows how much throughput tuning alone can recover for a very large model on a small memory budget. Results like this could make large models cheaper and more accessible to run for both industry and consumers, and further optimization may bring them closer to real-time use.