This week, OpenAI released its long-awaited open weight model called gpt-oss. Part of the appeal of gpt-oss is that you can run it locally on your own hardware, including Macs with Apple silicon. Here’s how to get started and what to expect.
Models and Macs
First, gpt-oss comes in two flavors: gpt-oss-20b and gpt-oss-120b. The former is described as a medium open weight model, while the latter is considered a heavy open weight model.
The medium model is what Apple silicon Macs with enough resources can expect to run locally. The difference? Expect the smaller model to hallucinate more than the much larger one, since it has far fewer parameters to draw on. That's the tradeoff for a faster model that's actually capable of running on high-end Macs.
Still, the smaller model is a neat tool that’s freely available if you have a Mac with enough resources and a curiosity about running large language models locally.
You should also be aware of how running a local model differs from, say, ChatGPT. By default, the open weight local model lacks a lot of the modern chatbot features that make ChatGPT useful. For example, responses can't draw on web search results, which often help limit hallucinations.
OpenAI recommends at least 16GB RAM to run gpt-oss-20b, but Macs with more RAM will obviously perform better. Based on early user feedback, 16GB RAM is really the floor for what’s needed to just experiment. (AI is a big reason that Apple stopped selling Macs with 8GB RAM not that long ago — with one value exception.)
Setup and use
Preamble aside, actually getting started is super simple.
First, install Ollama on your Mac. This is basically the window for interfacing with gpt-oss-20b. You can find the app at ollama.com/download.
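Once Ollama is installed, you can grab the model and chat with it right in the app. If you'd rather script the same interaction, here's a minimal sketch using Ollama's official Python client (`pip install ollama`); the `gpt-oss:20b` model tag and the sample prompt are assumptions for illustration, so check Ollama's model library for the current name.

```python
# Minimal sketch: talk to a locally running Ollama server.
# Assumes Ollama is installed and running, and that the model
# is published under the "gpt-oss:20b" tag in Ollama's library.
#   pip install ollama
import ollama

# Download the model weights if they aren't on disk yet (a multi-gigabyte pull).
ollama.pull("gpt-oss:20b")

# Send a prompt and stream the reply back token by token, like a chat window.
stream = ollama.chat(
    model="gpt-oss:20b",
    messages=[{"role": "user", "content": "In one paragraph, what is Apple silicon?"}],
    stream=True,
)
for chunk in stream:
    print(chunk["message"]["content"], end="", flush=True)
print()
```

Most people can skip the scripting entirely and just type into the Ollama app, but it's the same local model answering either way.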