How to run a local AI chatbot on your iPhone

For a lot of people, the most appealing reason to use a local chatbot will be the amount of money you can save. Right now, running a local model on your iPhone involves, at most, a one-time purchase of $5.

Compare that to a subscription from any of the big AI labs. For instance, if you want to use ChatGPT without ads, you'll need to spend at least $20 per month on OpenAI's Plus plan. You could get away with the more affordable Go tier or even stick with the free offering if you plan to use ChatGPT only sporadically, but then you also need to consider rate limits. Similarly, Google AI plans start at $8 per month, but you could spend as much as $100 every month on its Ultra subscription. As a power user, you're very likely to hit your daily usage limit with ChatGPT, Claude or Gemini if you don't pony up. When you run an AI chatbot off your iPhone, by contrast, you can use it as much as you want.
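If you want to see how quickly that gap adds up, here's a rough sketch in Python using only the prices quoted above. The plan labels and function names are my own, and the numbers ignore taxes, regional pricing and any future price changes, so treat it as back-of-the-envelope math, not a definitive comparison.

```python
# Prices quoted in this article, in US dollars. These are illustrative
# inputs, not live pricing; adjust them to whatever you actually pay.
LOCAL_APP_ONE_TIME = 5

MONTHLY_PLANS = {
    "ChatGPT Plus": 20,
    "Google AI (entry tier)": 8,
    "Google AI Ultra": 100,
}


def subscription_cost(monthly_price, months):
    """Total paid for a subscription over the given number of months."""
    return monthly_price * months


def extra_cost_vs_local(monthly_price, months, one_time=LOCAL_APP_ONE_TIME):
    """How much more a subscription costs than a one-time app purchase."""
    return subscription_cost(monthly_price, months) - one_time


if __name__ == "__main__":
    months = 12
    for name, price in MONTHLY_PLANS.items():
        total = subscription_cost(price, months)
        extra = extra_cost_vs_local(price, months)
        print(f"{name}: ${total} over {months} months, ${extra} more than the local app")
```

Over 12 months, the Plus plan works out to $240, or $235 more than the one-time purchase.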

For the privacy-minded, local chatbots offer another advantage. None of the options I'll be recommending in this article require a login or ask you to share your data with the labs that trained the models you want to run. The app developers also say they don't collect any usage information. With proprietary models, you should assume your prompts, along with any information, images, audio or video you share, will be used to train future models. There are rare exceptions. Proton's Lumo chatbot, for example, is fully private by default. For most chatbots, including ChatGPT, you'll need to do some digging to opt out of sharing your data for model training.

You also can't use ChatGPT, Claude or Gemini without an internet connection, whereas local chatbots can run even if you're offline.

That said, there are a few drawbacks worth noting. As capable as the latest open-weight models are, they're not as sophisticated as the latest proprietary models from Anthropic, OpenAI and other for-profit AI labs. For instance, closed models, thanks to the powerful cloud hardware they run on, tend to offer longer context windows that allow them to reference information from earlier in a conversation. In practice, that translates to chatbots that feel more intelligent and conversational, since you won't need to repeat yourself often, if ever.

What's more, both ChatGPT and Claude offer robust "memory" features that allow them to personalize their responses to each user. My version of ChatGPT knows my main axe is a 1993 Fender Stratocaster, and will frequently reference that fact when I ask it guitar-related questions. For some people, this is something that can make using a chatbot addictive, since it feels like the system wants to know them.

If you need a chatbot that can provide timely information, a local model probably won't cut it. All LLMs have a knowledge cutoff. That's the point in time beyond which their training data doesn't extend. In the case of GPT-5.5 Instant, for example, it won't be able to reference events past August 2024. For Llama 3.2, meanwhile, that date is December 2023.

To answer questions beyond its knowledge cutoff, a model will ideally turn to a robust web search tool. Proprietary models offer two advantages when it comes to timeliness. First, the current pace at which companies like OpenAI are releasing new models means those systems inherently incorporate more recent data since they're newer. Second, since you need an internet connection to use ChatGPT, Claude or Gemini, those chatbots can easily search the web to augment their answers. Open-weight models can use web search tools, but not without third-party extensions.