AI Agent Benchmarks Are Broken
(news.ycombinator.com)
Koala: A benchmark suite for performance-oriented shell-optimization research
(news.ycombinator.com)