gemma3.c is a from‑scratch CPU inference engine for the Gemma 3 4B IT model. It proves that modern LLMs can run without Python, PyTorch, or GPUs.
✨ Highlights
⚙️ 100% Pure C (C11) – zero external dependencies
🧠 Full Gemma 3 architecture – GQA, hybrid attention, SwiGLU
🗺️ Memory‑mapped weights – BF16 SafeTensors via mmap
🔤 Native SentencePiece tokenizer – 262K vocab
🌊 Streaming output – token‑by‑token callbacks
💬 Interactive chat mode
📦 CLI + Library API
🐧 Linux/macOS native, 🪟 Windows via WSL (recommended) or MinGW
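
To make the memory‑mapped BF16 weights concrete, here is a minimal sketch of how BF16 values can be read directly from a mapped buffer and widened to float32 on the fly. The helper names `bf16_to_f32` and `dot_bf16` are hypothetical and are not taken from the gemma3.c source; the only fact relied on is that a BF16 value is the upper 16 bits of an IEEE‑754 float32.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from gemma3.c): widen one BF16 value to float32.
 * BF16 keeps the sign, the 8 exponent bits, and the top 7 mantissa bits of a
 * float32, so shifting it into the high half of a 32-bit word is sufficient. */
static float bf16_to_f32(uint16_t h) {
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof f); /* type-pun without violating aliasing rules */
    return f;
}

/* Hypothetical helper: dot product of a BF16 weight row with float32
 * activations. `w` could point straight into an mmap'ed weight file, so no
 * float32 copy of the weights is ever materialized. */
static float dot_bf16(const uint16_t *w, const float *x, size_t n) {
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++) {
        acc += bf16_to_f32(w[i]) * x[i];
    }
    return acc;
}
```

Converting per element like this trades a little arithmetic for memory: the weights stay at 2 bytes each in the page cache instead of being expanded to 4 bytes in RAM.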