I wrote Gemma 3 inference in pure C

gemma3.c is a from‑scratch CPU inference engine for the Gemma 3 4B IT model. It proves that modern LLMs can run without Python, PyTorch, or GPUs.

✨ Highlights

⚙️ 100% Pure C (C11) – zero external dependencies

🧠 Full Gemma 3 architecture – GQA, hybrid attention, SwiGLU

🗺️ Memory‑mapped weights – BF16 SafeTensors via mmap

🔤 Native SentencePiece tokenizer – 262K vocab

🌊 Streaming output – token‑by‑token callbacks

💬 Interactive chat mode

📦 CLI + Library API

🐧 Linux/macOS native, 🪟 Windows via WSL (recommended) or MinGW
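
To make the memory‑mapped BF16 point concrete, here is a minimal sketch of how BF16 values can be consumed directly from a mapped buffer without first converting the whole file. The function names `bf16_to_f32` and `dot_bf16_f32` are hypothetical and are not taken from gemma3.c; the sketch also assumes a little‑endian host, which matches how SafeTensors stores tensor data.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* BF16 is the upper 16 bits of an IEEE-754 float32, so widening it
 * is a 16-bit left shift followed by a bit-for-bit reinterpretation.
 * (Hypothetical helper, not the project's actual API.) */
static float bf16_to_f32(uint16_t h) {
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof f); /* avoids strict-aliasing problems */
    return f;
}

/* Dot product of one BF16 weight row with float32 activations.
 * `w` may point straight into an mmap'd region, so each weight is
 * widened on the fly and no float32 copy of the tensor is needed. */
static float dot_bf16_f32(const uint16_t *w, const float *x, size_t n) {
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++) {
        acc += bf16_to_f32(w[i]) * x[i];
    }
    return acc;
}
```

Widening per element like this keeps resident memory close to the size of the BF16 file, at the cost of a shift on every multiply; an engine could equally choose to convert hot tensors once up front.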