
The "confident idiot" problem: Why AI needs hard rules, not vibe checks


The Lie

We have all been there. You build an agent. It works perfectly in the demo. You deploy it. And then, on a Tuesday at 3 PM, it decides that the URL for the API documentation is api.stripe.com/v1/users (a 404), but it looks so plausible that you waste 20 minutes debugging network errors.

Worse, it says this with 100% confidence.

When we try to fix this today, the industry tells us to use “LLM-as-a-Judge.” We are told to ask GPT-4o to grade GPT-3.5. We are told to fix the “vibes.”

But this creates a dangerous circular dependency: the Judge is an LLM too. If the underlying models suffer from sycophancy (agreeing with the user) or hallucination, the Judge inherits those same flaws and often hallucinates a passing grade.

We are trying to fix probability with more probability. That is a losing game.

Code > Vibes

I believe we need to stop treating Agents like magic boxes and start treating them like software. Software has assertions. Software has unit tests. Software has return False.

We need to re-introduce Determinism into the stack.
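As a minimal sketch of what "return False" looks like in practice: a deterministic guard that checks an agent's output against a structural contract, instead of asking a judge model whether it "looks right." The function name and the JSON contract here are hypothetical, just to illustrate the pattern.

```python
import json

def validate_agent_output(raw: str) -> bool:
    """Deterministic check: return False (never a vibe) if the output
    breaks the contract. Hypothetical contract: a JSON object with an
    https 'url' string field."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return False
    if not isinstance(data, dict):
        return False
    url = data.get("url")
    if not isinstance(url, str) or not url.startswith("https://"):
        return False
    return True

print(validate_agent_output('{"url": "https://example.com/docs"}'))  # True
print(validate_agent_output('not even json'))                        # False
```

The point is that this check cannot be sweet-talked: it either passes or it does not, regardless of how plausible the output sounds.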

Don’t ask an LLM if a URL is valid. It will hallucinate a 200 OK. Run requests.get().
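Concretely, that check is a few lines with the requests library (third-party, `pip install requests`); the helper name and timeout are my own choices, not anything prescribed by the article:

```python
import requests  # third-party: pip install requests

def url_exists(url: str, timeout: float = 5.0) -> bool:
    """Deterministic check: issue a real GET and trust the status
    code, not the model's confidence."""
    try:
        resp = requests.get(url, timeout=timeout, allow_redirects=True)
    except requests.RequestException:
        # Bad scheme, DNS failure, timeout, refused connection, etc.
        return False
    return resp.status_code == 200
```

If the agent proposes api.stripe.com/v1/users as a documentation URL, this returns False in seconds, instead of costing 20 minutes of debugging downstream.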
