Local AI models now run faster on Ollama on Apple silicon Macs
One of the best tools to run AI models locally on a Mac just got even better. Here’s why, and how to run it.
If you’re not familiar with Ollama, it’s a Mac, Linux, and Windows app that lets you run AI models locally on your computer.
Unlike cloud-based apps such as ChatGPT, whose models run remotely and require an internet connection, Ollama loads and runs models directly on your machine.
These models can be downloaded from open-source communities such as Hugging Face, or even directly from the model provider, as we covered here.
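Once a model is installed, Ollama exposes it through a local REST API on port 11434. Here is a minimal sketch of a one-shot request against that API using only the Python standard library; the model name is a placeholder for whatever model you have actually pulled, and the script assumes the Ollama app is running locally.

```python
import json
import urllib.request

# Placeholder model name for illustration; substitute any model
# you have already downloaded through Ollama.
MODEL = "llama3.2"
HOST = "http://localhost:11434"


def build_payload(model: str, prompt: str) -> dict:
    """Build a non-streaming request body for Ollama's /api/generate endpoint."""
    return {"model": model, "prompt": prompt, "stream": False}


def generate(prompt: str, model: str = MODEL, host: str = HOST) -> str:
    """Send the prompt to the locally running Ollama server and
    return the model's full response text."""
    data = json.dumps(build_payload(model, prompt)).encode()
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]


if __name__ == "__main__":
    # Requires the Ollama server (and the chosen model) to be available locally.
    try:
        print(generate("Why is the sky blue?"))
    except OSError as exc:
        print(f"Could not reach a local Ollama server: {exc}")
```

Everything runs on-device: the request never leaves localhost, which is the whole point of the app.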
However, running an LLM locally can be quite challenging, as even small and lightweight LLMs tend to gobble up substantial RAM and GPU memory.
To try to counter that, Ollama has released a preview version (Ollama 0.19) of its app that “is now built on top of Apple’s machine learning framework, MLX, to take advantage of its unified memory architecture,” making local AI models run faster on Apple silicon Macs.
Here’s Ollama:
This results in a large speedup of Ollama on all Apple Silicon devices. On Apple’s M5, M5 Pro and M5 Max chips, Ollama leverages the new GPU Neural Accelerators to accelerate both time to first token (TTFT) and generation speed (tokens per second).
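The two metrics Ollama cites can be computed from the timing fields its API returns with each response (`eval_count`, `eval_duration`, `load_duration`, and `prompt_eval_duration`, with durations in nanoseconds). A small sketch, assuming TTFT is approximated as model-load time plus prompt-processing time:

```python
def tokens_per_second(eval_count: int, eval_duration_ns: int) -> float:
    """Generation speed: tokens produced divided by generation time.
    Ollama reports `eval_count` (tokens) and `eval_duration` (nanoseconds)."""
    return eval_count / eval_duration_ns * 1e9


def ttft_seconds(load_duration_ns: int, prompt_eval_duration_ns: int) -> float:
    """Rough time to first token: model load time plus the time spent
    processing the prompt, both reported in nanoseconds."""
    return (load_duration_ns + prompt_eval_duration_ns) / 1e9


# Example with made-up numbers: 200 tokens generated in 4 seconds,
# 0.5 s model load plus 1.5 s prompt processing.
print(tokens_per_second(200, 4_000_000_000))        # → 50.0 tokens/s
print(ttft_seconds(500_000_000, 1_500_000_000))     # → 2.0 s
```

Speedups to either number show up directly in how responsive a local model feels: TTFT governs the initial pause, tokens per second the pace of the reply.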
With this update, Ollama says it is now faster to run personal assistants such as OpenClaw, as well as coding agents “like Claude Code, OpenCode, or Codex.”