
Structured outputs create false confidence


Update (Dec 21): this post is now on the Hacker News front page! We've updated this post to be more precise about our claims and have also added some clarifications at the end. You can see the original version of this post here.

If you use LLMs, you've probably heard about structured outputs. You might think they're the greatest thing since sliced bread. Unfortunately, structured outputs also often degrade response quality.

Specifically, if you use an LLM provider's structured outputs API, you're likely to get a lower quality response than if you use their normal text output API:

⚠️ you're more likely to make mistakes when extracting data, even in simple cases;

⚠️ you're probably not modeling errors correctly;

⚠️ it's harder to use techniques like chain-of-thought reasoning; and

⚠️ in the extreme case, it can be easier to steal your customer data using prompt injection.

These are very contentious claims, so let's start with an example: extracting data from a receipt.

If I use an LLM to extract the receipt entries, it should be able to tell me that one of the items is (name="banana", quantity=0.46), right?

Well, OpenAI's structured outputs API with gpt-5.2 (released literally this week!) will claim that the banana quantity is 1.0:
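To make the setup concrete, here is a minimal sketch of what such an extraction call looks like using the OpenAI Python SDK's structured outputs support with Pydantic models. The schema fields, prompt wording, and sample receipt text are illustrative assumptions, not the exact code behind this test:

```python
from pydantic import BaseModel
from openai import OpenAI


# Illustrative schema: one entry per receipt line item.
class ReceiptItem(BaseModel):
    name: str
    quantity: float  # e.g. 0.46 for items sold by weight


class Receipt(BaseModel):
    items: list[ReceiptItem]


client = OpenAI()

# Hypothetical receipt text (in practice, OCR output or a pasted receipt).
receipt_text = """
BANANA          0.46 kg    $0.72
MILK 2L         1          $3.49
"""

# Structured outputs: the response is constrained to the Receipt schema.
completion = client.beta.chat.completions.parse(
    model="gpt-5.2",
    messages=[
        {"role": "system", "content": "Extract every line item from this receipt."},
        {"role": "user", "content": receipt_text},
    ],
    response_format=Receipt,
)

items = completion.choices[0].message.parsed.items
```

With this setup, the parsed result comes back with quantity=1.0 for the banana, not the 0.46 printed on the receipt.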
