Austin’s surge of new housing construction drove down rents

https://www.pew.org/en/research-and-analysis/articles/2026/03/18/austins-surge-of-new-housing-con...
284•matthest•2h ago•240 comments

Autoresearch for SAT Solvers

https://github.com/iliazintchenko/agent-sat
38•chaisan•1h ago•4 comments

Warranty Void If Regenerated

https://nearzero.software/p/warranty-void-if-regenerated
208•Stwerner•5h ago•114 comments

Nvidia greenboost: transparently extend GPU VRAM using system RAM/NVMe

https://gitlab.com/IsolatedOctopi/nvidia_greenboost
164•mmastrac•3d ago•29 comments

Show HN: Duplicate 3 layers in a 24B LLM, logical deduction .22→.76. No training

https://github.com/alainnothere/llm-circuit-finder
77•xlayn•5h ago•21 comments

OpenRocket

https://openrocket.info/
413•zeristor•3d ago•85 comments

Rob Pike’s Rules of Programming (1989)

https://www.cs.unc.edu/~stotts/COMP590-059-f24/robsrules.html
850•vismit2000•16h ago•413 comments

Wander – A tiny, decentralised tool to explore the small web

https://susam.net/wander/
213•susam•18h ago•54 comments

The math that explains why bell curves are everywhere

https://www.quantamagazine.org/the-math-that-explains-why-bell-curves-are-everywhere-20260316/
62•ibobev•2d ago•25 comments

Nvidia NemoClaw

https://github.com/NVIDIA/NemoClaw
244•hmokiguess•11h ago•190 comments

Show HN: Will my flight have Starlink?

180•bblcla•9h ago•239 comments

What’s on HTTP?

https://whatsonhttp.com/
34•elixx•4h ago•10 comments

Book: The Emerging Science of Machine Learning Benchmarks

https://mlbenchmarks.org/00-preface.html
91•jxmorris12•4d ago•4 comments

Nightingale – open-source karaoke app that works with any song on your computer

https://nightingale.cafe/
515•rzzzzru•18h ago•147 comments

An x86-64 back end for raven-uxn

https://www.mattkeeter.com/blog/2026-03-15-uxn/
11•dcre•3d ago•0 comments

CVE-2026-3888: Important Snap Flaw Enables Local Privilege Escalation to Root

https://blog.qualys.com/vulnerabilities-threat-research/2026/03/17/cve-2026-3888-important-snap-f...
101•askl•10h ago•63 comments

Show HN: I built 48 lightweight SVG backgrounds you can copy/paste

https://www.svgbackgrounds.com/set/free-svg-backgrounds-and-patterns/
174•visiwig•10h ago•30 comments

Show HN: Playing LongTurn FreeCiv with Friends

https://github.com/ndroo/freeciv.andrewmcgrath.info
51•verelo•7h ago•27 comments

2025 Turing award given for quantum information science

https://awards.acm.org/about/2025-turing
99•srvmshr•16h ago•26 comments

Czech Man's Stone in Barn's Foundations Is Rare Bronze Age Spearhead Mold

https://www.smithsonianmag.com/smart-news/a-czech-man-used-this-stone-in-his-barns-foundations-it...
5•bookofjoe•2d ago•1 comment

On a Boat

https://moq.dev/blog/on-a-boat/
128•mmcclure•5d ago•23 comments

OpenAI Has New Focus (on the IPO)

https://om.co/2026/03/17/openai-has-new-focus-on-the-ipo/
156•aamederen•15h ago•152 comments

RX – a new random-access JSON alternative

https://github.com/creationix/rx
13•creationix•2h ago•7 comments

Show HN: Hacker News archive (47M+ items, 11.6GB) as Parquet, updated every 5m

https://huggingface.co/datasets/open-index/hacker-news
309•tamnd•4d ago•133 comments

Machine Payments Protocol (MPP)

https://stripe.com/blog/machine-payments-protocol
150•bpierre•11h ago•70 comments

Despite Doubts, Federal Cyber Experts Approved Microsoft Cloud Service

https://www.propublica.org/article/microsoft-cloud-fedramp-cybersecurity-government
439•hn_acker•12h ago•201 comments

Measuring progress toward AGI: A cognitive framework

https://blog.google/innovation-and-ai/models-and-research/google-deepmind/measuring-agi-cognitive...
103•surprisetalk•14h ago•176 comments

Trevor Milton is raising funds for a new jet he claims will transform flying

https://www.wsj.com/business/trevor-milton-pardon-nikola-trump-3163e19c
95•jgalt212•13h ago•153 comments

Show HN: Tmux-IDE, OSS agent-first terminal IDE

https://tmux.thijsverreck.com
68•thijsverreck•8h ago•34 comments

Explore 19th Century Scientific Correspondence

https://epsilon.ac.uk/
22•rramadass•3d ago•3 comments

Show HN: Duplicate 3 layers in a 24B LLM, logical deduction .22→.76. No training

https://github.com/alainnothere/llm-circuit-finder
77•xlayn•5h ago
I replicated David Ng's RYS method (https://dnhkng.github.io/posts/rys/) on consumer AMD GPUs (RX 7900 XT + RX 6950 XT) and found something I didn't expect.

Transformers appear to have discrete "reasoning circuits" — contiguous blocks of 3-4 layers that act as indivisible cognitive units. Duplicate the right block and the model runs its reasoning pipeline twice. No weights change. No training. The model just thinks longer.
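To make the operation concrete, here is a minimal sketch of what "duplicate a block of layers" means, assuming a Hugging Face-style decoder whose blocks live in model.model.layers (the model id is a placeholder; the repo itself operates on GGUF files through llama.cpp, so this is illustrative only):

  import torch.nn as nn
  from transformers import AutoModelForCausalLM

  # Hypothetical sketch: make the forward pass run layers 12-14 twice.
  # The same modules are reused, so no weights change.
  model = AutoModelForCausalLM.from_pretrained("org/some-24b-model")  # placeholder id
  layers = model.model.layers                        # nn.ModuleList of decoder blocks
  new_order = list(layers[:15]) + [layers[i] for i in range(12, 15)] + list(layers[15:])
  model.model.layers = nn.ModuleList(new_order)
  model.config.num_hidden_layers = len(new_order)    # keep the config consistent
  # Note: KV-cache bookkeeping (each block's layer_idx) is glossed over here;
  # disable the cache or renumber the indices for real use.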

The results on standard benchmarks (lm-evaluation-harness, n=50):

Devstral-24B, layers 12-14 duplicated once:
- BBH Logical Deduction: 0.22 → 0.76
- GSM8K (strict): 0.48 → 0.64
- MBPP (code gen): 0.72 → 0.78
- Nothing degraded

Qwen2.5-Coder-32B, layers 7-9 duplicated once:
- Reasoning probe: 76% → 94%

The weird part: different duplication patterns create different cognitive "modes" from the same weights. Double-pass boosts math. Triple-pass boosts emotional reasoning. Interleaved doubling (13,13,14,14,15,15,16) creates a pure math specialist. Same model, same VRAM, different routing.
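Read that way, each "mode" is just a different execution order over the same layers. A toy illustration using the indices mentioned in the post (nothing here is taken from the repo):

  # Each pattern is an explicit layer order spliced into the normal 0..N-1 sequence.
  double_pass = [12, 13, 14, 12, 13, 14]        # the circuit runs twice
  triple_pass = [12, 13, 14] * 3                # the circuit runs three times
  interleaved = [13, 13, 14, 14, 15, 15, 16]    # per-layer doubling from the post

  def splice(n_layers, start, stop, pattern):
      """Replace layers start..stop (inclusive) with an arbitrary routing pattern."""
      return list(range(start)) + pattern + list(range(stop + 1, n_layers))

  # splice(40, 12, 14, double_pass) -> [0, ..., 11, 12, 13, 14, 12, 13, 14, 15, ..., 39]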

The circuit boundaries are sharp — shift by one layer and the effect disappears or inverts. Smaller models (24B) have tighter circuits (3 layers) than larger ones (Ng found 7 layers in 72B).

Tools to find circuits in any GGUF model and apply arbitrary layer routing are in the repo. The whole thing — sweep, discovery, validation — took one evening.

Happy to answer questions.

Comments

woadwarrior01•1h ago
Reminds me of Solar 10.7B, which was a very good model for its size about two years ago, and the "Depth Up-Scaling" technique behind it, although that involved continued training after repeating the layers.

https://arxiv.org/abs/2312.15166

colejhudson•1h ago
Would you be able to publish the individual benchmarks for Qwen2.5-Coder-32B? GSM8K specifically would be useful to look at.
xlayn•58m ago
I published the results for Devstral in the results folder of the GitHub repo: https://github.com/alainnothere/llm-circuit-finder/tree/main...

I'm using the following configuration: --tasks gsm8k_cot,ifeval,mbpp,bbh_cot_fewshot_logical_deduction_five_objects,mbpp. I also tried humaneval, but something in the harness is missing and it failed...

Note that I'm running 50 tests for each task, mostly because of time limitations; it takes about two hours to validate the run for the base model and the modified one.

I'll also try to publish the results of the small test harness I use when testing the multiple-layer configurations. For reference, this is phi-4-Q6_K.gguf, still running. I'm now giving more importance to the Reason factor, which comes from running a small subset of all the problems in the task config above.

Initially I tried the approach of optimizing for the highest Math/EQ, but it resulted in models that were less capable overall, with the exception of math. Math, as in the original research, is basically how good the model is at giving you the answer to a really tough question, say the cube root of some really large number... but that didn't translate into the model being better at other tasks...

  Config  | Lyr | Math   | EQ    | Reas   | Math Δ  | EQ Δ  | Reas Δ  | Comb Δ
  --------|-----|--------|-------|--------|---------|-------|---------|-------
  BASE    |   0 | 0.7405 | 94.49 | 94.12% |     --- |   --- |     --- |    ---
  (6,9)   |   3 | 0.7806 | 95.70 | 94.12% | +0.0401 | +1.21 |  +0.00% |  +1.21
  (9,12)  |   3 | 0.7247 | 95.04 | 94.12% | -0.0158 | +0.55 |  +0.00% |  +0.55
  (12,15) |   3 | 0.7258 | 94.14 | 88.24% | -0.0147 | -0.35 |  -5.88% |  -6.23
  (15,18) |   3 | 0.7493 | 95.74 | 88.24% | +0.0088 | +1.25 |  -5.88% |  -4.63
  (18,21) |   3 | 0.7204 | 93.40 | 94.12% | -0.0201 | -1.09 |  +0.00% |  -1.09
  (21,24) |   3 | 0.7107 | 92.97 | 88.24% | -0.0298 | -1.52 |  -5.88% |  -7.41
  (24,27) |   3 | 0.6487 | 95.27 | 88.24% | -0.0918 | +0.78 |  -5.88% |  -5.10
  (27,30) |   3 | 0.7180 | 94.65 | 88.24% | -0.0225 | +0.16 |  -5.88% |  -5.73
  (30,33) |   3 | 0.7139 | 94.02 | 94.12% | -0.0266 | -0.47 |  +0.00% |  -0.47
  (33,36) |   3 | 0.7104 | 94.53 | 94.12% | -0.0301 | +0.04 |  +0.00% |  +0.04
  (36,39) |   3 | 0.7017 | 94.69 | 94.12% | -0.0388 | +0.20 |  +0.00% |  +0.20
  (6,10)  |   4 | 0.8125 | 96.37 | 88.24% | +0.0720 | +1.88 |  -5.88% |  -4.01
  (9,13)  |   4 | 0.7598 | 95.08 | 94.12% | +0.0193 | +0.59 |  +0.00% |  +0.59
  (12,16) |   4 | 0.7482 | 93.71 | 88.24% | +0.0076 | -0.78 |  -5.88% |  -6.66
  (15,19) |   4 | 0.7617 | 95.16 | 82.35% | +0.0212 | +0.66 | -11.76% | -11.10
  (18,22) |   4 | 0.6902 | 92.27 | 88.24% | -0.0504 | -2.23 |  -5.88% |  -8.11
  (21,25) |   4 | 0.7288 | 94.10 | 88.24% | -0.0117 | -0.39 |  -5.88% |  -6.27
  (24,28) |   4 | 0.6823 | 94.57 | 88.24% | -0.0583 | +0.08 |  -5.88% |  -5.80
  (27,31) |   4 | 0.7224 | 94.41 | 82.35% | -0.0181 | -0.08 | -11.76% | -11.84
  (30,34) |   4 | 0.7070 | 94.73 | 94.12% | -0.0335 | +0.23 |  +0.00% |  +0.23
  (33,37) |   4 | 0.7009 | 94.38 |100.00% | -0.0396 | -0.12 |  +5.88% |  +5.77
  (36,40) |   4 | 0.7057 | 94.84 | 88.24% | -0.0348 | +0.35 |  -5.88% |  -5.53
  (6,11)  |   5 | 0.8168 | 95.62 |100.00% | +0.0762 | +1.13 |  +5.88% |  +7.02
  (9,14)  |   5 | 0.7245 | 95.23 | 88.24% | -0.0160 | +0.74 |  -5.88% |  -5.14
  (12,17) |   5 | 0.7825 | 94.88 | 88.24% | +0.0420 | +0.39 |  -5.88% |  -5.49
  (15,20) |   5 | 0.7832 | 95.86 | 88.24% | +0.0427 | +1.37 |  -5.88% |  -4.52
  (18,23) |   5 | 0.7208 | 92.42 | 88.24% | -0.0197 | -2.07 |  -5.88% |  -7.95
  (21,26) |   5 | 0.7055 | 92.89 | 88.24% | -0.0350 | -1.60 |  -5.88% |  -7.48
  (24,29) |   5 | 0.5825 | 95.04 | 94.12% | -0.1580 | +0.55 |  +0.00% |  +0.55
  (27,32) |   5 | 0.7088 | 94.18 | 88.24% | -0.0317 | -0.31 |  -5.88% |  -6.19
  (30,35) |   5 | 0.6787 | 94.69 | 88.24% | -0.0618 | +0.20 |  -5.88% |  -5.69
  (33,38) |   5 | 0.6650 | 94.96 | 88.24% | -0.0755 | +0.47 |  -5.88% |  -5.41
  (6,12)  |   6 | 0.7692 | 95.39 | 94.12% | +0.0287 | +0.90 |  +0.00% |  +0.90
  (9,15)  |   6 | 0.7405 | 94.65 | 94.12% | -0.0000 | +0.16 |  +0.00% |  +0.16
  (12,18) |   6 | 0.7582 | 94.57 | 88.24% | +0.0177 | +0.08 |  -5.88% |  -5.80
  (15,21) |   6 | 0.7828 | 93.52 | 88.24% | +0.0423 | -0.98 |  -5.88% |  -6.86
  (18,24) |   6 | 0.7308 | 92.93 | 94.12% | -0.0097 | -1.56 |  +0.00% |  -1.56
  (21,27) |   6 | 0.6791 | 92.54 | 82.35% | -0.0615 | -1.95 | -11.76% | -13.72
rao-v•1h ago
I’d love to believe this is real, but I’m pretty sure you will lose performance on a “fair” mix of tasks, even after fine tuning. I know multiple teams have explored recurrent layers (great for limited VRAM) but I don’t think it’s ever been found to be optimal.
SyzygyRhythm•1h ago
If running twice is good, then is running N times even better? I wonder if you could even loop until some kind of convergence, say hitting a fixed point (input equals output). I wonder if there's even a sort of bifurcation property where it sometimes loops A->A->A, but other times A->B->A, or more, rather like the logistic map fractal.
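A minimal sketch of that fixed-point idea (purely illustrative; block_fn is a hypothetical callable that runs the duplicated layer block once, and nothing like this is in the repo):

  def run_to_fixed_point(block_fn, hidden, max_iters=8, tol=1e-3):
      """Re-apply a candidate circuit until its output stops changing.

      hidden is assumed to be a torch.Tensor of activations entering the block.
      """
      for i in range(max_iters):
          prev = hidden
          hidden = block_fn(hidden)                         # one pass through the block
          rel_change = (hidden - prev).norm() / (prev.norm() + 1e-8)
          if rel_change < tol:                              # input ≈ output: fixed point
              break
      return hidden, i + 1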
xlayn•39m ago
I explored that, again with Devstral, but executing the same circuit 4 times led to lower scores on the tests.

I chatted with the model to see if it was still working, and it seemed coherent to me; I didn't notice anything off.

I need to automate testing like that, where you pick the local maximum and then iterate over it, picking layers to see if it's actually better, and leave the thing running overnight.

XCSme•1h ago
But if it got worse on other tests, it doesn't do much good, right?
ekianjo•1h ago
Which tests are worse?
XCSme•1h ago
Hard to tell; they only mention a few that got better, with no clear results on the others.
xlayn•36m ago
You can check the results for Devstral here. Speed limits me, but these are the results for the first 50 tests of the command:

  # Run lm-evaluation-harness
  lm_eval --model local-chat-completions \
      --model_args model=test,base_url=http://localhost:8089/v1/chat/completions,num_concurrent=1,max_retries=3,tokenized_requests=False \
      --tasks gsm8k_cot,ifeval,mbpp,bbh_cot_fewshot_logical_deduction_five_objects,mbpp \
      --apply_chat_template --limit 50 \
      --output_path ./eval_results
zhangchen•1h ago
This lines up with what pruning papers have been finding: the middle layers carry most of the reasoning weight, and you can often drop the outer ones without much loss. Cool to see the inverse also works, just stacking them for extra passes.
nowittyusername•56m ago
There's still a lot of low-hanging fruit left IMO. Good find, and rather funny to think about: someone can simply clone the various layers multiple times and, instead of spending millions of dollars retraining the model, increase performance significantly with "this one trick".
xlayn•42m ago
The other interesting point is that right now I'm copy-pasting the layers, but a patch in llama.cpp could make the same model behave better simply by following a different "flow", without needing more VRAM...

If this is validated enough, it could eventually lead to shipping some kind of "mix" architecture, with layers executed to fit some "vibe"?

Devstral was the first one I tried, optimizing for Math/EQ, but that didn't result in any better model; then I added the Reason part, and that did result in a "better" model.

I used the Devstral with the vibe.cli and it looked sharp to me; the thing didn't fail. I also used the chat to "vibe" check it and it looked OK to me.

The other thing is that I picked a particular circuit and it was "good", but I don't know if it was a local maximum. I think I ran just about 10 sets of the "fast test harness" and picked the config that gave the highest score... once I had that, I used that model and ran it against lm_eval limited to only 50 tests... again for the sake of speed, I didn't want to wait a week to discover the config was bad.
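Roughly, the two-stage search described above looks like this (a sketch of the workflow only; fast_probe and run_lm_eval_50 are hypothetical stand-ins for the "fast test harness" and the 50-sample lm-evaluation-harness run):

  def sweep(model_path, n_layers, widths=(3, 4, 5), stride=3):
      """Cheap screen over candidate circuits, then an expensive confirmation run."""
      candidates = []
      for width in widths:
          for start in range(6, n_layers - width, stride):
              end = start + width - 1
              candidates.append((fast_probe(model_path, start, end), start, end))
      _, start, end = max(candidates)              # best-scoring circuit from the screen
      return run_lm_eval_50(model_path, start, end)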

Karuma•44m ago
Wow, every single word in the original post and on that README.md is pure LLM. How sad.

In any case, this has been done at least since the very first public releases of Llama by Meta... It also works for image models. There are even a few ComfyUI nodes that let you pick layers to duplicate on the fly, so you can test as many as you want really quickly.

xlayn•13m ago
Fair point, I used Claude extensively in the project, including drafting.

And well, there is always the possibility of this test contributing in some way (maybe?), which in my book is positive.

taliesinb•38m ago
There is an obvious implication: since the initial models were trained without loops, it is exceedingly unlikely that a single stack of N consecutive layers represents only a single, repeatable circuit that can be safely looped. It is much more likely that the loopable circuits are superposed across multiple layers and have different effective depths.

That you can profitably loop some say 3-layer stack is likely a happy accident, where the performance loss from looping 3/4 of mystery circuit X that partially overlaps that stack is more than outweighed by the performance gain from looping 3/3 of mystery circuit Y that exactly aligns with that stack.

So, if you are willing to train from scratch, just build the looping in during training and let each circuit find its place, in disentangled stacks of various depths. Middle of transformer is:

(X₁)ᴹ ⊕ (Y₁∘Y₂)ᴺ ⊕ (Z₁∘Z₂∘Z₃)ᴾ ⊕ …

Notation: Xᵢ is a layer (of very small width) in a circuit of depth 1..i..D, ⊕ is parallel composition (which sums the widths up to the rest of the transformer), ∘ is serial composition (stacking), and ᴹ is looping. The values of ᴹ shouldn't matter as long as they are > 1; the point is to crank them up after training.
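A rough PyTorch reading of that notation, with toy widths, depths, and loop counts (my sketch of the proposal only; the branch layers are plain Linear+ReLU stand-ins for whatever the real circuit layers would be):

  import torch
  import torch.nn as nn

  class LoopedBranch(nn.Module):
      """A serial stack of small layers applied M times: the (X1∘...∘Xd)^M idea."""
      def __init__(self, width, depth, loops):
          super().__init__()
          self.layers = nn.ModuleList([nn.Linear(width, width) for _ in range(depth)])
          self.loops = loops                       # crank this up after training

      def forward(self, x):
          for _ in range(self.loops):
              for layer in self.layers:
                  x = torch.relu(layer(x))
          return x

  class ParallelLoopedBlock(nn.Module):
      """Parallel composition ⊕: branch widths sum to the block's total width."""
      def __init__(self, widths=(64, 64, 128), depths=(1, 2, 3), loops=(4, 4, 4)):
          super().__init__()
          self.widths = list(widths)
          self.branches = nn.ModuleList(
              [LoopedBranch(w, d, m) for w, d, m in zip(widths, depths, loops)])

      def forward(self, x):                        # x: (..., sum(widths))
          chunks = torch.split(x, self.widths, dim=-1)
          return torch.cat([b(c) for b, c in zip(self.branches, chunks)], dim=-1)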

Ablating these individual circuits will tell you whether you needed them at all, but also roughly what they were for in the first place, which would be very interesting.

taliesinb•32m ago
And I bet these would be useful in the initial and final parts of the transformer too, because syntactic parsing and unparsing of brackets, programming-language ASTs, etc. is highly recursive; no doubt current models are painfully learning "unrolled" versions of the relevant recursive circuits, unrolled to some fixed depth that must compete for layers with other circuits, since your total budget is 60 or whatever. Incredibly duplicative and by definition unable to generalize to arbitrary depth!
taliesinb•26m ago
Amusingly, you need only have circuits of prime depth, though you should probably adjust their widths using something principled, perhaps Euler's totient function.
Singlaw•5m ago
What does this do?