
Show HN: Auto-Architecture: Karpathy's Loop, pointed at a CPU

Why This Matters

This article shows how an autonomous research loop, first demonstrated in familiar domains like Python and gradient descent, can be extended to hardware design by pointing it at a CPU. If the approach generalizes, AI-driven loops could optimize complex hardware systems and meaningfully shorten development time.

Auto-Architecture: Karpathy's Loop, Pointed at a CPU

What happens when you take an autonomous research loop out of its comfort zone and aim it at a domain it has no business being good at? Andrej Karpathy's autoresearch showed that a coding agent, given two days and a single-GPU nanochat run, found 20 training-time optimizations on its own. The recipe is general — propose, implement, measure, keep the wins — but the demonstration stayed inside the agent's home turf: Python, gradient descent, well-known knobs.
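The propose–implement–measure–keep cycle described above can be sketched in a few lines. This is a generic hill-climbing skeleton under my own naming, not Karpathy's actual code; `propose`, `implement`, and `measure` are hypothetical callbacks standing in for the agent, the implementation step, and the eval gate.

```python
import random

def autoresearch_loop(baseline, propose, implement, measure, rounds=20):
    """Generic propose -> implement -> measure -> keep-the-wins loop.

    `baseline` is the current best artifact; `measure` returns a scalar
    fitness (higher is better). Only improvements replace the baseline.
    """
    best = baseline
    best_score = measure(best)
    for _ in range(rounds):
        hypothesis = propose(best)            # e.g. "nudge a knob up or down"
        candidate = implement(best, hypothesis)
        score = measure(candidate)
        if score > best_score:                # keep the wins, discard the rest
            best, best_score = candidate, score
    return best, best_score

# Toy demo: the "architecture" is a single integer knob whose fitness
# peaks at 7; the loop climbs toward it by accepting only improvements.
random.seed(0)  # deterministic demo
demo = autoresearch_loop(
    baseline=0,
    propose=lambda b: random.choice([-1, +1]),
    implement=lambda b, h: b + h,
    measure=lambda x: -(x - 7) ** 2,
    rounds=200,
)
```

The only structural requirement is a scalar fitness and a gate that rejects regressions; everything domain-specific lives in the three callbacks, which is why the same loop can be re-aimed from training code at RTL.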

I wanted to know if it generalized. So I pointed it at a CPU.

The setup

auto-arch-tournament is a 5-stage in-order RV32IM core in SystemVerilog — the textbook pipeline you'd write in a graduate architecture class. No caches, no branch predictor, no multi-issue on day one. Those are research-loop hypotheses, not features.

The orchestrator is hardcoded. The LLM never edits it. Each round, three slots run in parallel:

1. The agent proposes a microarchitectural hypothesis as YAML, schema-checked against schemas/hypothesis.schema.json.
2. An implementation agent edits files under rtl/ in an isolated git worktree.
3. The eval gate runs:
   - riscv-formal — 53 symbolic BMC checks (decode, traps, ordering, liveness, M-ext)
   - Verilator cosim — RVFI byte-identical against a Python ISS, ~22% random bus stalls
   - 3-seed nextpnr P&R on a Gowin GW2A-LV18 (Tang Nano 20K) — median Fmax × CoreMark iter/cycle = fitness
   - CoreMark CRC validation — the same 4 CRCs VexRiscv reports

An improvement is merged into the trunk and becomes the new baseline. A regression, broken build, or failed placement destroys the worktree.
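A minimal sketch of the fitness-and-gate logic described above — the function names and the Fmax/cycle numbers are illustrative assumptions, not taken from the repo:

```python
from statistics import median

def fitness(fmax_mhz_by_seed, coremark_iters, cycles):
    """Fitness = median Fmax across P&R seeds x CoreMark iterations per cycle.

    Taking the median over 3 nextpnr seeds damps placement noise, so a
    hypothesis can't win on one lucky seed.
    """
    return median(fmax_mhz_by_seed) * (coremark_iters / cycles)

def gate(candidate_fitness, baseline_fitness, checks_pass, crcs_match):
    """Merge only if every correctness check passes AND fitness improves."""
    if not (checks_pass and crcs_match):
        return "destroy worktree"
    if candidate_fitness > baseline_fitness:
        return "merge to trunk"
    return "destroy worktree"

# Three P&R seeds with made-up Fmax values (MHz) and a made-up cycle count:
f = fitness([52.1, 49.8, 51.3], coremark_iters=1, cycles=400_000)
```

Note that correctness (riscv-formal, cosim, CRCs) is a hard gate, not part of the score: a faster-but-wrong core is destroyed, never traded off.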
