How We Broke Top AI Agent Benchmarks: And What Comes Next
(news.ycombinator.com)
Even GPT-5.2 Can't Count to Five: Zero-Error Horizons in Trustworthy LLMs
(news.ycombinator.com)