Social Animus

May 28th, 2026 @ justine's web page

recent photo by @thepsyence

The most difficult challenge of working in open source is that there's no institutional screening process, since the goal is to just let people organize themselves and build things. This has meant that many of the people who get involved have never had the opportunity to work with the most exemplary members of each group the world has to offer. During the culture wars of the 2010s, the first person who tried to solve the problem of how to include these uncommon individuals was Coraline Ada Ehmke, who wrote the Contributor Covenant. I always thought her solution went too far, since I found a much easier answer for my own project, which has been to never accept anonymous contributions and to not merge a single line of code until the contributor sends an email promising to assign me copyright.

I always thought my security posture was too paranoid, so when llama.cpp came out in 2023, I found the code Gerganov wrote to be so beautiful that I did the one thing that I promised myself I would never do, which was collaborate with an anonymous developer from his team named Slaren. This was the first time in five years that I wrote a change with someone on a project that wasn't my own. After we submitted our work, he went on 4chan and accused me of plagiarism, saying that even my changes were his own. The way the community reacted is an interesting case study in the guile some developers have learned since the culture war, because the locus of thought for llama.cpp has always been on 4chan. They were the ones who originally leaked the Meta LLaMA v1 weights. You can map the way developers talk on that board to their anonymous accounts on GitHub. I actually developed migraines for the first time in my life and ended up in the hospital (since I didn't have health insurance and had to wait in the ER) due to the eye strain of reading unfiltered thoughts about me for months. It's unusual because the community originally reacted positively towards my work, until one of its members felt threatened by me, and since they're all anonymous there's not much proof it wasn't just a few guys. This was the reason Wendy Hanamura cited when she canceled my invitation to speak at the Internet Archive.

In any case, I'm really happy that these back channels exist, because the greatest competitive advantage I've ever had was to monitor which pull requests people on 4chan complained about, and then merge them into llamafile before Gerganov could. This is how my Mozilla Builders project shipped support for new models like Gemma 2 before any other grassroots project. I got hundreds of thousands of downloads on Hugging Face. There were so many downloads that Mozilla couldn't believe it, because so few people showed up on our issue tracker. Mozilla was sponsoring my work because they want to support the community, and as far as anyone could tell, there wasn't one. I always thought this happened because my code was just that good. In a past life, when I was originally trained to write kiosk software for reverse vending machines in Java, no one ever contacted the vendor unless there was something wrong, and since llamafile is an ex nihilo project that I worked on for six years, beginning with an empty file and an assembler, I had plenty of time to pin down most of the bugs on my own.

I even wrote a blog post giving Slaren more credit, because it instilled in him a false sense of confidence that led him to tackle harder problems, like multiplying three-dimensional numbers. To fix the performance issues with mixture of experts models that this caused, I tried to upstream my tinyBLAS tensor multiplication code in PR #6840, and it's a great example of what it's like to work with me. Gerganov's doctoral advisor was Iwan Kawrakow, who was the power behind the throne on that project. He invented the "K" quantization formats many people use to compress their weights. He was curious about my change and I told him that he'd be able to build better matrix multiplication kernels than me if he used my block tiling technique with his quants.

llamafile ended up receiving an avalanche of pull requests from Iwan that were licensed Apache 2.0 so that Gerganov couldn't use them. This enabled us to have faster CPU inference than any other project. That meant consumers and businesses stood a better chance of being able to use LLMs without needing to purchase expensive GPUs. We made that happen, even though the llama.cpp team had more than a million dollars of funding and were successfully acquired by Hugging Face after Iwan had moved on to start his own project.
