Tech News

Stop Using Ollama

Why This Matters

This article highlights the importance of transparency and integrity in the rapidly evolving local LLM ecosystem. As Ollama shifts away from its original open-source roots and obscures its reliance on llama.cpp, it raises concerns about trust, licensing compliance, and the true open-source spirit that drives innovation in the industry.

Key Takeaways

Ollama is the most popular way to run local LLMs. It shouldn’t be. It gained that position by being first: the first tool that made llama.cpp accessible to people who didn’t want to compile C++ or write their own server configs. That was a real contribution, briefly. But the project has since spent years systematically obscuring where its actual technology comes from, misleading users about what they’re running, and drifting from the local-first mission that earned it trust in the first place. All while taking venture capital money.

This isn’t a “both sides” piece. I’ve used Ollama. I’ve moved on. Here’s why you should too.

A llama.cpp Wrapper With Amnesia

Ollama’s entire inference capability comes from llama.cpp, the C++ inference engine created by Georgi Gerganov in March 2023. Gerganov’s project is what made it possible to run LLaMA models on consumer laptops at all; he hacked together the first version in an evening, and it kicked off the entire local LLM movement. Today llama.cpp has over 100,000 stars on GitHub, 450+ contributors, and is the foundation that nearly every GGUF-based tool depends on.

Ollama was founded in 2021 by Jeffrey Morgan and Michael Chiang, both previously behind Kitematic, a Docker GUI that was acquired by Docker Inc. They went through Y Combinator’s Winter 2021 batch, raised pre-seed funding, and launched publicly in 2023. From day one, the pitch was “Docker for LLMs”, a convenient wrapper that downloads and runs models with a single command. Under the hood, it was llama.cpp doing all the work.

For over a year, Ollama made no mention of llama.cpp anywhere: not in the README, not on the website, not in its marketing materials. The project’s binary distributions didn’t include the required MIT license notice for the llama.cpp code they were shipping. This isn’t merely a matter of open-source etiquette; the MIT license has exactly one major requirement: include the copyright notice. Ollama didn’t.

The community noticed. GitHub issue #3185 was opened in early 2024 requesting license compliance. It went over 400 days without a response from maintainers. When issue #3697 was opened in April 2024 specifically requesting llama.cpp acknowledgment, community PR #3700 followed within hours. Ollama’s co-founder Michael Chiang eventually added a single line to the bottom of the README: “llama.cpp project founded by Georgi Gerganov.”

The response to the PR was revealing. Ollama’s team wrote: “We spend a large chunk of time fixing and patching it up to ensure a smooth experience for Ollama users… Overtime, we will be transitioning to more systematically built engines.” Translation: we’re not going to give llama.cpp prominent credit, and we plan to distance ourselves from it anyway.

As one Hacker News commenter put it: “I’m continually puzzled by their approach, it’s such self-inflicted negative PR. Building on llama is perfectly valid and they’re adding value on ease of use here. Just give the llama team proper credit.” Another: “The fact that Ollama has been downplaying their reliance on llama.cpp has been known in the local LLM community for a long time.”

The Fork That Made Things Worse
