Why SWE-bench Verified no longer measures frontier coding capabilities
(news.ycombinator.com)
Many SWE-bench-Passing PRs would not be merged
(news.ycombinator.com)