
LLM Neuroanatomy II: Modern LLM Hacking and Hints of a Universal Language?

In Part 1, I described how duplicating a block of seven middle layers in Qwen2-72B — no weight changes, no training — produced the #1 model on the HuggingFace Open LLM Leaderboard. The method, which I called RYS (Repeat Your Self), was discovered using nothing but hard math probes and EQ-Bench on a pair of RTX 4090s.
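The duplication itself is mechanically simple. Below is a minimal sketch of the pattern using toy `nn.Linear` layers in place of Qwen2-72B's decoder blocks; `repeat_middle_layers` is a hypothetical helper written for illustration, not the code used in the post.

```python
import copy

import torch
import torch.nn as nn

def repeat_middle_layers(layers, start, end):
    """Return a new stack where layers[start:end] runs twice in sequence.

    The block is copied with deepcopy, so no weights are modified and no
    training happens -- the same block is simply executed a second time.
    """
    block = [copy.deepcopy(layer) for layer in layers[start:end]]
    return nn.ModuleList(list(layers[:end]) + block + list(layers[end:]))

# Toy stand-in: a 12-layer stack instead of a 72B model.
layers = nn.ModuleList(nn.Linear(4, 4) for _ in range(12))
expanded = repeat_middle_layers(layers, 2, 9)  # duplicate seven middle layers
```

After the call, the stack has 19 layers: the original 12 plus a second pass over layers 2 through 8, with weights identical to the originals.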

That was mid-2024. Since then, a flood of strong open-source models has arrived — Qwen3.5, MiniMax, GLM-4.7, and others — and I finally have enough compute at home to scan them properly.

So the question driving this post is simple: was RYS a fluke of Qwen2-72B, or is it a general property of Transformers?

More specifically:

- Does relayering still help on stronger modern models?
- Which modifications actually earn their extra layers?
- If two good motifs help independently, do they stack?

The short answer is yes, relayering survives. The longer answer took 3,024 beam search candidates, a surrogate model scoring 2 million configurations, and a unified validation sweep to work out properly. Along the way, I also released the scanning code and a set of new RYS models.
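For context on those numbers: beam search keeps only the best few partial configurations at each expansion step, which is how a few thousand directly evaluated candidates can stand in for a vastly larger search space. Here is a generic sketch of the idea; the post's actual candidate expansion, scoring function, and beam width are not specified here, and the toy example below is purely illustrative.

```python
def beam_search(initial, expand, score, width=4, steps=3):
    """Generic beam search: keep only the `width` best candidates per step."""
    beam = [initial]
    for _ in range(steps):
        candidates = [c for b in beam for c in expand(b)]
        if not candidates:
            break
        beam = sorted(candidates, key=score, reverse=True)[:width]
    return max(beam, key=score)

# Toy example: grow a tuple of layer indices, scored by a dummy heuristic.
best = beam_search(
    initial=(),
    expand=lambda cfg: [cfg + (i,) for i in range(5)],
    score=lambda cfg: sum(cfg),
    width=3,
    steps=2,
)
```

In a relayering search, each candidate would instead be a layer configuration and the scorer a benchmark run (or, at the 2-million-configuration scale mentioned above, a cheap surrogate model standing in for the real evaluation).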

Let’s get into it!

Why Qwen3.5-27B

The Qwen3.5 family dropped around Chinese New Year 2026 and immediately became the darling of the LocalLLaMA crowd. Strong benchmarks, good vibes, well-engineered.
