With model devs pushing more aggressive rate limits, raising prices, or even abandoning subscriptions for usage-based pricing, that vibe-coded hobby project is about to get a whole lot more expensive. Fortunately, you're not without cost-saving options.
Over the past few weeks, we've seen Anthropic toy with dropping Claude Code from its most affordable plans, while Microsoft has skipped testing the waters and moved GitHub Copilot to a purely usage-based model. The whole debacle got us thinking: do we even need Anthropic or OpenAI's top models, or can we get away with a smaller local model? Sure, it might be slower, less capable, and a little more frustrating to work with, but you can't beat the price of free... well, assuming you've already got the hardware, that is.
It just so happens that Alibaba recently dropped Qwen3.6-27B, which the cloud and e-commerce giant boasts packs "flagship coding power" into a package small enough to run on a 32 GB M-series Mac or 24 GB GPU.
What's changed
This isn't the first time we've looked at local code assistants. Previously, we explored using Continue's VS Code extension for tasks such as code completion and generation.
At the time, the models and software stack were quite immature: useful tools, but not good enough to compete with larger frontier models. Since then, model architectures and agent harnesses have improved dramatically.
"Reasoning" capabilities let small models make up for their size by "thinking" for longer; mixture-of-experts models mean you don't need terabytes per second of memory bandwidth for an interactive experience; and vastly improved function and tool calling means these models can actually interact with codebases, shell environments, and the web.
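To see why sparse activation matters for bandwidth, here's a rough back-of-the-envelope calculation. The 4-bit quantization, the 20 tokens/second target, and the 3B-active-parameter figure are all our assumptions for illustration, not specifications from Alibaba:

```python
# Back-of-the-envelope estimate, assuming generating one token requires
# streaming every active parameter from memory once.
BYTES_PER_PARAM = 0.5   # assumed 4-bit quantized weights
TOKENS_PER_SEC = 20     # assumed comfortable interactive speed

def bandwidth_gbs(active_params_billions: float) -> float:
    """GB/s of memory bandwidth needed to stream the active weights."""
    gb_per_token = active_params_billions * BYTES_PER_PARAM
    return gb_per_token * TOKENS_PER_SEC

# Dense 27B model: every parameter is read for every token.
print(f"dense 27B: ~{bandwidth_gbs(27):.0f} GB/s")      # ~270 GB/s
# Hypothetical MoE with ~3B parameters active per token.
print(f"MoE, 3B active: ~{bandwidth_gbs(3):.0f} GB/s")  # ~30 GB/s
```

A dense model that size would want bandwidth in the high-end discrete GPU range, while the sparse version fits comfortably within what an M-series Mac's unified memory can deliver.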
All vibes, no rate limits
In this hands-on, we'll look at how to deploy and configure local models like Qwen3.6-27B for coding on your computer, and explore some of the agent frameworks you can use with them.
What you'll need: