N-Day-Bench – Can LLMs find real vulnerabilities in real codebases?
(news.ycombinator.com)
32.
Android now stops you sharing your location in photos
(news.ycombinator.com)
33.
Why we spent 50+ hours retesting Intel’s Core Ultra 270K Plus and 250K Plus
(tomshardware.com)
34.
Exploiting the most prominent AI agent benchmarks
(news.ycombinator.com)
35.
How We Broke Top AI Agent Benchmarks: And What Comes Next
(news.ycombinator.com)
36.
AI models are terrible at betting on soccer—especially xAI Grok
(arstechnica.com)
37.
Nubia defends the ethics of REDMAGIC 11 Pro benchmark manipulation
(androidauthority.com)
46.
AWS Engineer Reports PostgreSQL Perf Halved by Linux 7.0, Fix May Not Be Easy
(news.ycombinator.com)
47.
The Download: gig workers training humanoids, and better AI benchmarks
(technologyreview.com)
48.
Analyzing Geekbench 6 under Intel's BOT
(news.ycombinator.com)
50.
AI benchmarks are broken. Here’s what we need instead.
(technologyreview.com)
55.
$500 GPU outperforms Claude Sonnet on coding benchmarks
(news.ycombinator.com)
56.
A top AI researcher explains the limitations of current models
(feeds.feedburner.com)
59.
ARC-AGI-3 benchmark is out now
(news.ycombinator.com)