I replicated Ng's RYS method and found that duplicating 3 specific layers in Qwen2.5-32B boosts reasoning by 17% and duplicating layers 12-14 in Devstral-24B improves logical deduction from 0.22→0.76 on BBH — no training, no weight changes, just routing hidden states through the same circuit twice. Tools included. Two AMD GPUs, one evening.
Duplicate 3 layers. No training. Logical deduction goes from 0.22 → 0.76.
This toolkit finds and exploits "reasoning circuits" hidden inside transformer models. The idea: certain contiguous blocks of layers act as indivisible cognitive units. Duplicate them in the forward pass — same weights, no training, no merging — and the model gets measurably smarter on specific capabilities.
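The mechanics are easy to sketch. Below is a minimal, hedged illustration in PyTorch/transformers (not the toolkit's actual code): re-insert the same decoder modules into the model's layer list so the forward pass runs that block twice. The model ID, the layer indices, and the assumption that the decoder blocks live in `model.model.layers` are placeholders of mine, not details from the post.

```python
# A minimal sketch, assuming a HuggingFace Llama/Mistral-style architecture whose
# decoder blocks live in model.model.layers (an nn.ModuleList). Re-inserting the
# same module objects repeats the block in the forward pass: no new weights, no
# training, no merging.
import torch
from torch import nn
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your/model-id"   # placeholder, not the exact checkpoint from the post
start, stop = 12, 15         # duplicate layers 12-14 as one contiguous block

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

layers = list(model.model.layers)
block = layers[start:stop]                        # the candidate "reasoning circuit"
expanded = layers[:stop] + block + layers[stop:]  # ...11, 12, 13, 14, 12, 13, 14, 15...

model.model.layers = nn.ModuleList(expanded)
model.config.num_hidden_layers = len(expanded)

# Caveat: with KV caching, transformers indexes the cache by each attention
# module's layer_idx; the repeated block shares those indices, so disable the
# cache (or patch the indices) when generating.
prompt = "Q: If A is taller than B and B is taller than C, who is tallest?\nA:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=32, use_cache=False)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```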
Built on David Ng's RYS method and extended with new findings. Everything here was discovered on two AMD consumer GPUs (RX 7900 XT + RX 6950 XT) in one evening.
Results
Devstral-Small-2-24B: Layers 12, 13, 14 duplicated once
Validated on standard benchmarks via lm-evaluation-harness with n=50 examples per task (a reproduction sketch follows the table):
| Benchmark | Base | +3 layers | Change |
|---|---|---|---|
| BBH Logical Deduction | 0.22 | 0.76 | +245% |
| GSM8K (strict) | 0.48 | 0.64 | +33% |
| MBPP (code gen) | 0.72 | 0.78 | +8% |
| GSM8K (flexible) | 0.82 | 0.86 | +5% |
| BBH Navigate | 0.96 | 0.98 | +2% |
| BBH Date Understanding | 0.82 | 0.84 | +2% |
| BBH Causal Judgement | 0.66 | 0.66 | — |
| IFEval (strict) | 0.68 | 0.68 | — |
Average improvement: +8% across all metrics. Nothing degraded.
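On the benchmarking side, a run like the table above could be reproduced with lm-evaluation-harness's Python API roughly as follows. This is a sketch under assumptions: that the post's n=50 corresponds to `limit=50`, that the patched `model` and `tokenizer` from the earlier sketch are reused, and that your harness version exposes the BBH/IFEval/MBPP subtasks under its own names (only gsm8k is shown here).

```python
# A hedged sketch of scoring the layer-duplicated model with lm-evaluation-harness.
# `model` and `tokenizer` are the patched objects from the previous sketch.
import lm_eval
from lm_eval.models.huggingface import HFLM

wrapped = HFLM(pretrained=model, tokenizer=tokenizer, batch_size=4)

results = lm_eval.simple_evaluate(
    model=wrapped,
    tasks=["gsm8k"],  # BBH/IFEval/MBPP task names vary by harness version
    limit=50,         # 50 examples per task, matching the post's n=50
)
print(results["results"])
```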
Qwen2.5-Coder-32B: Layers 7, 8, 9 duplicated once