What's in a GGUF, besides the weights - and what's still missing?
GGUF is the file format that llama.cpp uses for language models.
The really neat thing about GGUF is that it's just one file. Compare this to a typical safetensors repo on huggingface, where there's a pile of necessary JSON files scattered around - or to a typical ollama model, which is an OCI with layers json, go templates, etc inside.
The contents are roughly the same, but GGUF makes it more ergonomic by keeping all this stuff in a single file.
But what is this stuff, and does it cover everything needed?
Chat Templates
Conversational language models are trained on sequences that follow a specific format, that sort of look like a conversation.
For instance, Gemma4's format looks like this:
<|turn>user Hi there!<turn|> <|turn>model Hi there, how can I help you today?<turn|>
...and LFM2's format template looks like this:
... continue reading