ProgramBench: Can language models rebuild programs from scratch?
(news.ycombinator.com)
5.
DeepClaude – Claude Code agent loop with DeepSeek V4 Pro
(news.ycombinator.com)
8.
How fast is a macOS VM, and how small could it be?
(news.ycombinator.com)
9.
Discovering hard disk physical geometry through microbenchmarking (2019)
(news.ycombinator.com)
10.
Show HN: A new benchmark for testing LLMs for deterministic outputs
(news.ycombinator.com)
11.
A Decade of AMD Ryzen: 10 Years of CPUs Tested
(techspot.com)
13.
A good AGENTS.md is a model upgrade. A bad one is worse than no docs at all
(news.ycombinator.com)
14.
Show HN: OSS Agent I built topped the TerminalBench on Gemini-3-flash-preview
(news.ycombinator.com)
16.
The predictable failure of the QDay Prize
(news.ycombinator.com)
17.
SWE-bench Verified no longer measures frontier coding capabilities
(news.ycombinator.com)
18.
Why SWE-bench Verified no longer measures frontier coding capabilities
(news.ycombinator.com)
20.
Lambda Calculus Benchmark for AI
(news.ycombinator.com)
21.
Linux 7.1 Removes Drivers for Bus Mouse Support
(news.ycombinator.com)
23.
AMD Ryzen 9 9950X3D2 review: More cache, more cash
(tomshardware.com)
24.
Kimi vendor verifier – verify accuracy of inference providers
(news.ycombinator.com)
25.
Arc Prize Foundation (YC W26) Is Hiring a Platform Engineer for ARC-AGI-4
(news.ycombinator.com)
26.
Experience vs specs: Our readers have spoken, and benchmarks aren’t everything
(androidauthority.com)
27.
Qwen3.6-35B-A3B on my laptop drew me a better pelican than Claude Opus 4.7
(news.ycombinator.com)
30.
N-Day-Bench – Can LLMs find real vulnerabilities in real codebases?
(news.ycombinator.com)