
Show HN: Duplicate 3 layers in a 24B LLM, logical deduction .22→.76. No training

Why This Matters

Duplicating specific layers inside a large language model can measurably improve reasoning and logical deduction with no additional training and no weight modifications. That makes it a cheap lever for model optimization: a capability boost that needs no retraining and can be applied directly in the forward pass.

Key Takeaways

I replicated Ng's RYS method and found that duplicating 3 specific layers in Qwen2.5-32B boosts reasoning by 17% and duplicating layers 12-14 in Devstral-24B improves logical deduction from 0.22→0.76 on BBH — no training, no weight changes, just routing hidden states through the same circuit twice. Tools included. Two AMD GPUs, one evening.

Duplicate 3 layers. No training. Logical deduction goes from 0.22 → 0.76.

This toolkit finds and exploits "reasoning circuits" hidden inside transformer models. The idea: certain contiguous blocks of layers act as indivisible cognitive units. Duplicate them in the forward pass — same weights, no training, no merging — and the model gets measurably smarter on specific capabilities.
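The duplication itself is just list surgery on the decoder stack: the entries are references to the same modules, so there are no copied weights and essentially no extra memory. A minimal sketch of the idea (the `duplicate_layers` helper and the toy demo are illustrative, not the author's toolkit):

```python
def duplicate_layers(layers, start, end, times=1):
    """Return a new layer order in which the contiguous block
    layers[start..end] (inclusive) is traversed `times` extra passes.
    Entries are references, not copies: same weights, no training."""
    block = list(layers[start:end + 1])
    return list(layers[:end + 1]) + block * times + list(layers[end + 1:])

# Toy demo on layer indices: duplicating layers 12-14 once means the
# forward pass visits ... 12, 13, 14, 12, 13, 14, 15 ...
order = duplicate_layers(list(range(20)), 12, 14)
print(order[12:19])  # [12, 13, 14, 12, 13, 14, 15]
```

In a Hugging Face model the decoder blocks typically live in `model.model.layers`; you would wrap the reordered list back into `torch.nn.ModuleList` and assign it there. One assumed caveat: duplicated blocks share a `layer_idx`, so KV caching may need care (e.g. evaluating with `use_cache=False`), depending on the architecture.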

Built on David Ng's RYS method and extended with new findings. Everything here was discovered on two AMD consumer GPUs (RX 7900 XT + RX 6950 XT) in one evening.

Results

Devstral-Small-2-24B: Layers 12, 13, 14 duplicated once

Validated on standard benchmarks via lm-evaluation-harness at n=50:

| Benchmark | Base | +3 layers | Change |
|---|---|---|---|
| BBH Logical Deduction | 0.22 | 0.76 | +245% |
| GSM8K (strict) | 0.48 | 0.64 | +33% |
| MBPP (code gen) | 0.72 | 0.78 | +8% |
| GSM8K (flexible) | 0.82 | 0.86 | +5% |
| BBH Navigate | 0.96 | 0.98 | +2% |
| BBH Date Understanding | 0.82 | 0.84 | +2% |
| BBH Causal Judgement | 0.66 | 0.66 | — |
| IFEval (strict) | 0.68 | 0.68 | — |

Average improvement: +8% across all metrics. Nothing degraded.
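For reproduction, the numbers above are the kind lm-evaluation-harness reports when run with a sample limit. A sketch of the invocation, assuming a locally saved patched checkpoint (the path is hypothetical, and exact task names vary between harness versions, so check them with `lm_eval --tasks list`):

```shell
# Evaluate a layer-duplicated checkpoint at 50 samples per task (n=50).
# Verify task names against your harness version: lm_eval --tasks list
lm_eval --model hf \
  --model_args pretrained=./devstral-24b-dup-12-14,dtype=bfloat16 \
  --tasks gsm8k,mbpp,ifeval \
  --limit 50 \
  --batch_size auto
```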

Qwen2.5-Coder-32B: Layers 7, 8, 9 duplicated once
