If you’re on one of the cheaper Anthropic plans like me, it’s pretty common to hit a daily or weekly quota limit while you’re deep into coding an idea with Claude. If you want to keep going, you can connect Claude Code to a local open source model instead of Anthropic’s API. To monitor your current quota, type: /usage
Type /usage to monitor how much quota you have left and how quickly you burn through it.
The best open source model changes frequently, but at the time of writing I recommend GLM-4.7-Flash from Z.AI or Qwen3-Coder-Next. If you want or need to save disk space and GPU memory, try a smaller quantized version, which will load and run faster at some cost to quality. I’ll save how to find the best open source model for your task and machine constraints for another detailed post.
Method 1: LM Studio
Accessing open source models in LM Studio
If you haven’t used LM Studio before, it’s an accessible way to find and run open source LLMs and vision models locally on your machine. In version 0.4.1, they introduced support for connecting to Claude Code (CC). See here: https://lmstudio.ai/blog/claudecode or follow the instructions below:
1. Install and run LM Studio.
2. Find the model search button to install a model (see image above). LM Studio recommends running the model with a context of > 25K.
3. Open a new terminal session to:
a. start the server: lms server start --port 1234
b. configure environment variables to point CC at LM Studio:
export ANTHROPIC_BASE_URL=http://localhost:1234
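Put together, a terminal session for the steps above might look like the sketch below. This is a setup fragment, not a script you’d run blindly: it assumes the `lms` CLI and the `claude` CLI are on your PATH, and any extra environment variables beyond ANTHROPIC_BASE_URL (the post is truncated here) are not shown.

```shell
# Start the LM Studio local server (default port shown; change if it's taken)
lms server start --port 1234

# Point Claude Code at the local LM Studio server instead of Anthropic's API
export ANTHROPIC_BASE_URL=http://localhost:1234

# Launch Claude Code in this same shell so it inherits the variable;
# it will now route requests to the locally loaded model
claude
```

Note that environment variables only apply to the shell session where you export them, so start Claude Code from that same terminal (or add the export to your shell profile to make it permanent).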