This article shows how to teach Llamafile to handle structured outputs such as JSON. If you're already familiar with LangChain, you'll know that popular hosted models like OpenAI's include their own implementation of with_structured_output. Using it is straightforward: all we need to do is derive a new class from Pydantic's BaseModel, and the rest happens transparently. You don't need to teach the LLM anything.

Using Llamafile

This isn't currently possible with Llamafile, which I'm using in my local environment. In case you don't know what Llamafile is, here's a quick rundown before we dive into structured outputs.

A Llamafile is an executable LLM that you can run locally. Technically it's a combination of llama.cpp and Cosmopolitan Libc, which makes it executable on a wide range of architectures. By default it starts a local LLM instance that you can access in your browser at http://localhost:8080.

I'll be using Llama-3.2-1B-Instruct-Q8_0.llamafile because I'm writing this on a relatively weak machine (an old MacBook Air from 2018 with 8 GB of RAM). After downloading, just make the file executable with chmod +x. Windows users can also run Llamafiles but need to take care when running files larger than 4 GB. To run Llamafile as a server, use the --server --nobrowser --unsecure flags. Now you can test the server by opening http://localhost:8080 in your browser; you should see the Llamafile web UI.

Structured Outputs with Llamafile

Now we can teach Llamafile to produce structured output. Since it lacks a with_structured_output method, we import JsonOutputParser and PromptTemplate from LangChain's libraries.

First we define an Answer class that represents the JSON output we expect the LLM to return. To make the example more realistic, I've added a few extra properties to the Answer class. Next we provide our new answer type to LangChain's JsonOutputParser. To complete the setup we define a PromptTemplate, injecting the parser's format instructions into it. The final step is chaining the three Runnable interface implementations: prompt, llm, and parser.

The rest of the code is straightforward:

Invoke the chain.
Print out the answer (either as raw JSON or by using the utility function display_answer).
In error cases, call prompt and llm only, ignoring the parser.

Depending on the LLM you use the output will vary, but it should be JSON that matches the Answer schema. Minimal code sketches of all of these steps follow at the end of this article.

You can find the sources in this repository. Have fun with Llamafile 🙂
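
For reference, here is a minimal sketch of the with_structured_output approach mentioned in the introduction, assuming an OpenAI chat model via the langchain-openai package. The Answer fields here are purely illustrative, not the ones from my repository.

```python
# Sketch: structured output with a hosted model that supports
# with_structured_output (e.g. OpenAI via langchain-openai).
from pydantic import BaseModel, Field
from langchain_openai import ChatOpenAI


class Answer(BaseModel):
    """The JSON shape we want the model to return (fields are illustrative)."""
    answer: str = Field(description="The answer to the user's question")
    confidence: float = Field(description="Self-reported confidence between 0 and 1")


llm = ChatOpenAI(model="gpt-4o-mini")          # requires OPENAI_API_KEY
structured_llm = llm.with_structured_output(Answer)

result = structured_llm.invoke("What is a llamafile?")
print(result)  # an Answer instance, not a plain string
```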
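
Starting the local server looks roughly like this, using the filename and flags from above (flag behavior can differ between Llamafile versions):

```sh
# Make the downloaded llamafile executable, then start it as a headless server.
chmod +x Llama-3.2-1B-Instruct-Q8_0.llamafile
./Llama-3.2-1B-Instruct-Q8_0.llamafile --server --nobrowser --unsecure
```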
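
Next, a sketch of the Answer class and the JsonOutputParser. I'm assuming the Llamafile wrapper from langchain_community to talk to the local server; the author may use a different client (the server also exposes an OpenAI-compatible endpoint), and the extra fields are illustrative.

```python
from pydantic import BaseModel, Field
from langchain_community.llms.llamafile import Llamafile
from langchain_core.output_parsers import JsonOutputParser


class Answer(BaseModel):
    """Shape of the JSON we expect back from the model (fields are illustrative)."""
    answer: str = Field(description="The answer to the question")
    topic: str = Field(description="A short topic label for the question")
    confidence: float = Field(description="Self-reported confidence between 0 and 1")


# Talks to the llamafile server started above (defaults to http://localhost:8080).
llm = Llamafile()

# The parser knows the target schema and can generate format instructions from it.
parser = JsonOutputParser(pydantic_object=Answer)
```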
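
Then the PromptTemplate, with the parser's format instructions injected as a partial variable. The wording of the template itself is just an example:

```python
from langchain_core.prompts import PromptTemplate

prompt = PromptTemplate(
    template="Answer the user's question.\n{format_instructions}\nQuestion: {question}\n",
    input_variables=["question"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)
```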
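
Finally, a sketch of chaining prompt, llm, and parser, invoking the chain, and falling back to prompt | llm when parsing fails. The display_answer utility from the repository isn't reproduced here; this sketch just prints the result, and the commented output is illustrative only.

```python
# Chain the three Runnables: prompt -> llm -> parser.
chain = prompt | llm | parser

question = "What is a llamafile?"

try:
    answer = chain.invoke({"question": question})
    # `answer` is a plain dict parsed from the model's JSON, roughly like:
    # {"answer": "...", "topic": "...", "confidence": 0.8}
    print(answer)
except Exception:
    # If the model's output can't be parsed as JSON, run only prompt | llm
    # and show the raw text, skipping the parser.
    raw = (prompt | llm).invoke({"question": question})
    print(raw)
```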