Show HN: AutoThink – Boosts local LLM performance with adaptive reasoning

Published on: 2025-06-18 13:39:11

I built AutoThink, a technique that makes local LLMs reason more efficiently by adaptively allocating computational resources based on query complexity. The core idea: instead of giving every query the same "thinking time," classify queries as HIGH or LOW complexity and allocate thinking tokens accordingly. Complex reasoning gets 70-90% of the tokens; simple queries get 20-40% (a budget sketch follows below).

I also implemented steering vectors derived from Pivotal Token Search (originally from Microsoft's Phi-4 paper) that guide the model's reasoning patterns during generation. These vectors encourage behaviors like numerical accuracy, self-correction, and thorough exploration (see the steering sketch below).

Results on DeepSeek-R1-Distill-Qwen-1.5B:

- GPQA-Diamond: 31.06% vs. 21.72% baseline (+43% relative improvement)
- MMLU-Pro: 26.38% vs. 25.58% baseline
- Uses fewer tokens than baseline approaches

Works with any local reasoning model (DeepSeek, Qwen, custom fine-tuned models), with no API dependencies.

The technique builds on two things I develo ...
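To make the budget idea concrete, here is a minimal sketch of complexity-based token allocation. It is not the AutoThink implementation: `classify_complexity` is a placeholder heuristic standing in for a trained classifier, and the budget fractions are midpoints of the 70-90% and 20-40% ranges from the post.

```python
def classify_complexity(query: str) -> str:
    """Placeholder heuristic; AutoThink uses a trained complexity classifier."""
    reasoning_cues = ("prove", "derive", "why", "step by step", "optimize")
    return "HIGH" if any(cue in query.lower() for cue in reasoning_cues) else "LOW"


def allocate_thinking_budget(query: str, max_tokens: int = 4096) -> int:
    """Assign a thinking-token budget from the query's complexity class.

    HIGH-complexity queries get ~80% of the budget (midpoint of 70-90%),
    LOW-complexity queries get ~30% (midpoint of 20-40%).
    """
    fraction = 0.8 if classify_complexity(query) == "HIGH" else 0.3
    return int(max_tokens * fraction)


print(allocate_thinking_budget("Prove that the sum of two even numbers is even"))  # 3276
print(allocate_thinking_budget("What is the capital of France?"))                  # 1228
```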
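For the steering part, a common way to apply a steering vector during generation with a Hugging Face transformers model is a forward hook that adds the vector to a layer's hidden states. The sketch below assumes that mechanism; the hook helper, layer index, and strength are illustrative, and the vector itself would come from Pivotal Token Search rather than `torch.randn`.

```python
import torch

def make_steering_hook(steering_vector: torch.Tensor, strength: float = 2.0):
    """Return a forward hook that adds a scaled steering vector to a layer's output."""
    def hook(module, inputs, output):
        # Decoder layers typically return a tuple whose first element is the hidden states.
        if isinstance(output, tuple):
            hidden = output[0]
            hidden = hidden + strength * steering_vector.to(hidden.device, hidden.dtype)
            return (hidden,) + output[1:]
        return output + strength * steering_vector.to(output.device, output.dtype)
    return hook

# Usage, assuming a loaded Qwen/DeepSeek-style causal LM (layer index is illustrative):
# layer = model.model.layers[10]
# vec = torch.randn(model.config.hidden_size)   # stand-in for a PTS-derived vector
# handle = layer.register_forward_hook(make_steering_hook(vec))
# ... run model.generate(...) with steering active ...
# handle.remove()
```

Removing the hook after generation restores the unmodified model, so different steering vectors (e.g. for self-correction vs. numerical accuracy) can be swapped in per query.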