Research · 14d ago

Show HN: Duplicate 3 layers in a 24B LLM, logical deduction .22→.76. No training

What it is

Picture a transformer model as a factory assembly line with 32 stations. The researchers found that stations 12-15 form a complete "reasoning module": duplicate that block and each product passes through it twice, coming out with better quality. The RYS method does the equivalent for already-trained models. It copies specific blocks of layers so the model re-processes its intermediate representations through the same reasoning circuit a second time.
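In code terms, the duplication just changes the order in which layers run. The minimal Python sketch below assumes a model can be treated as a plain list of layer functions applied in sequence. `duplicated_order`, `run_stack`, and the toy layers (which only record their own index) are hypothetical illustrations and are not part of the RYS work. Indices are 0-based and the block bounds are treated as inclusive.

```python
from typing import Callable, List

Layer = Callable[[list], list]


def duplicated_order(num_layers: int, first: int, last: int) -> List[int]:
    """Hypothetical helper: execution order of layer indices with the
    inclusive block first..last run twice, back to back."""
    if not 0 <= first <= last < num_layers:
        raise ValueError("block must lie inside the layer stack")
    prefix = list(range(0, last + 1))           # layers 0..last, run once
    repeat = list(range(first, last + 1))       # the same block again
    suffix = list(range(last + 1, num_layers))  # the remaining layers
    return prefix + repeat + suffix


def run_stack(layers: List[Layer], order: List[int], x: list) -> list:
    """Apply layers in the given order. The repeated block reuses the same
    layer objects (same weights), so nothing new is trained."""
    for idx in order:
        x = layers[idx](x)
    return x


def make_toy_layer(i: int) -> Layer:
    # Toy stand-in for a transformer layer: appends its own index to a trace.
    return lambda trace: trace + [i]


layers = [make_toy_layer(i) for i in range(32)]
order = duplicated_order(32, 12, 15)
trace = run_stack(layers, order, [])
print(len(trace))    # 36
print(trace[10:22])  # [10, 11, 12, 13, 14, 15, 12, 13, 14, 15, 16, 17]
```

With a 32-layer stack and block 12-15 duplicated, the model makes 36 layer passes per token instead of 32. Per-token compute therefore grows roughly in proportion to the block length, even though no new weights are learned.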

Why it matters

You can potentially improve a local LLM on logic-heavy tasks without retraining or fine-tuning. If you're running inference on consumer hardware (like an AMD RX 7900 XT), this technique could substantially lift a model's reasoning scores just by changing how its layers are stacked. The catch: the gains are model-specific, and you first have to find which layers actually contain a reasoning circuit, since not every block works.
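Finding a block that helps is a search problem. One straightforward approach is to try every contiguous block up to a maximum length and keep the best one. The sketch below assumes you have some scoring function, for example accuracy on a small held-out set of logic questions. It is not necessarily how llm-circuit-finder works. `find_best_block` and `score` are hypothetical names, and the toy scorer only stands in for real model evaluations.

```python
from typing import Callable, Optional, Tuple

Block = Tuple[int, int]  # inclusive (first, last) layer indices


def find_best_block(
    num_layers: int,
    max_len: int,
    score: Callable[[Optional[Block]], float],
) -> Tuple[Optional[Block], float]:
    """Exhaustively try every contiguous block of 1..max_len layers.

    score(None) evaluates the unmodified model; score((a, b)) evaluates the
    model with layers a..b duplicated. Returns the best block (None if no
    block beats the baseline) together with its score.
    """
    best_block: Optional[Block] = None
    best_score = score(None)  # baseline: no duplication
    for length in range(1, max_len + 1):
        for first in range(0, num_layers - length + 1):
            block = (first, first + length - 1)
            s = score(block)
            if s > best_score:  # strict: ties keep the earlier candidate
                best_block, best_score = block, s
    return best_block, best_score


def toy_score(block: Optional[Block]) -> float:
    # Made-up scorer for illustration only: pretend layers 20..22 form a
    # useful circuit and every other block slightly hurts accuracy.
    if block is None:
        return 0.5
    return 0.9 if block == (20, 22) else 0.45


print(find_best_block(num_layers=40, max_len=4, score=toy_score))
# ((20, 22), 0.9)
```

With 40 layers and blocks of up to 4 layers, the loop scores 154 candidate blocks plus one baseline. Each call to a real `score` means a full benchmark run, so narrowing the search or using a cheaper proxy metric matters in practice.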

Key details

•Tested on Qwen2.5-32B-Instruct: duplicating layers 12-15 raised accuracy on a logical deduction benchmark from 22% to 76%
•Involves zero training: just duplicate the chosen transformer layers in the model config and re-run inference
•Runs on consumer AMD GPUs (an RX 7900 XT and an RX 6950 XT), making it accessible outside the NVIDIA ecosystem
•The developer built the llm-circuit-finder tool to identify which layer blocks contain reasoning circuits in different models
•Builds on David Ng's RYS research, which suggests transformers contain modular functional units rather than refining information smoothly from layer to layer
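The bullets above describe the change as an edit to the model config. The post does not say which tool it uses. One widely used option for this kind of layer stacking is mergekit's `passthrough` merge method, whose `slices` entries take half-open `layer_range: [start, end]` values. The Python sketch below is an illustration under that assumption and simply emits such a config. `passthrough_config` is a hypothetical helper, `path/to/base-model` is a placeholder, and the block 12-15 is read as inclusive.

```python
def passthrough_config(model: str, num_layers: int, first: int, last: int) -> str:
    """Hypothetical helper: emit a mergekit-style passthrough YAML config
    that runs the inclusive block first..last twice.

    layer_range values are half-open [start, end), so the inclusive
    block first..last becomes [first, last + 1].
    """
    if not 0 <= first <= last < num_layers:
        raise ValueError("block must lie inside the layer stack")
    ranges = [(0, last + 1), (first, last + 1), (last + 1, num_layers)]
    lines = ["slices:"]
    for start, end in ranges:
        if start == end:
            continue  # skip an empty slice (block ends at the last layer)
        lines += [
            "  - sources:",
            f"      - model: {model}",
            f"        layer_range: [{start}, {end}]",
        ]
    lines += ["merge_method: passthrough", "dtype: bfloat16"]
    return "\n".join(lines)


cfg = passthrough_config("path/to/base-model", num_layers=32, first=12, last=15)
print(cfg)
```

For a 32-layer model with block 12-15 duplicated, this prints three slices, `[0, 16]`, `[12, 16]` and `[16, 32]`, for 36 layers in total. A merge produced this way typically stores physical copies of the repeated layers, so the output checkpoint is larger than the original.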

Sources

Show HN: Duplicate 3 layers in a 24B LLM, logical deduction .22→.76. No training (Hacker News)