If you use LLMs, you've probably heard about structured outputs. You might think they're the greatest thing since sliced bread. Unfortunately, structured outputs also degrade response quality.
Specifically, if you use an LLM provider's structured outputs API, you will get a lower-quality response than if you use their normal text output API:
⚠️ you're more likely to make mistakes when extracting data, even in simple cases;
⚠️ you're probably not modeling errors correctly;
⚠️ it's harder to use techniques like chain-of-thought reasoning; and
⚠️ in the extreme case, it can be easier to steal your customer data using prompt injection.
These are very contentious claims, so let's start with an example: extracting data from a receipt.
If I use an LLM to extract the receipt entries, it should be able to tell me that one of the items is (name="banana", quantity=0.46), right?
Well, using OpenAI's structured outputs API with gpt-5.2 (released literally this week!), it will claim that the banana quantity is 1.0:
{ "establishment_name": "PC Market of Choice", "date": "2007-01-20", "total": 0.32, "currency": "USD", "items": [ { "name": "Bananas", "price": 0.32, "quantity": 1 } ] }