Show HN: How I Topped the HuggingFace Open LLM Leaderboard on Two Gaming GPUs

https://dnhkng.github.io/posts/rys/

36•dnhkng•2h ago

Comments

dnhkng•2h ago

Author here. I found that duplicating a specific block of 7 middle layers in Qwen2-72B, without modifying any weights, improved performance across all Open LLM Leaderboard benchmarks and took #1. As of 2026, the top 4 models on that leaderboard are still descendants.

The weird finding: single-layer duplication does nothing. Too few layers, nothing. Too many, it gets worse. Only circuit-sized blocks of ~7 layers work. This suggests pretraining carves out discrete functional circuits in the layer stack that only work when preserved whole.
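To make the operation concrete, here is a minimal sketch of block duplication on a toy stack. It is an illustration only, not the author's code: the 12-layer stack, the `duplicate_block` helper, and the chosen `start` and `length` are all hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def duplicate_block(layers: Sequence[T], start: int, length: int) -> List[T]:
    """Return a new layer order in which layers[start:start+length] runs twice.

    The same layer objects are reused, so no weights are copied or modified.
    """
    if length <= 0 or start < 0 or start + length > len(layers):
        raise ValueError("block must lie inside the layer stack")
    end = start + length
    # Original prefix (including the block), then the block again, then the rest.
    return list(layers[:end]) + list(layers[start:end]) + list(layers[end:])


# Toy stack: layers are represented by their index (hypothetical sizes).
stack = list(range(12))
widened = duplicate_block(stack, start=4, length=3)
print(widened)  # [0, 1, 2, 3, 4, 5, 6, 4, 5, 6, 7, 8, 9, 10, 11]
```

With real model layers in place of integers, the repeated entries would point at the same modules, which is what "without modifying any weights" implies.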

The whole thing was developed on 2x RTX 4090s in my basement. I'm now running current models (GLM-4.7, Qwen3.5, MiniMax M2.5) on a dual GH200 rig (see my other post). Code and new models coming soon.

Happy to answer questions.

rapatel0•56m ago

I think you may have cracked latent space reasoning. I've had a hunch that something like this would work, but couldn't figure out how the training would back propagate. But you've shown that you just need to duplicate existing layers.

Have you tried a simple inline loop over the duplicated layers? It would be interesting to see how it performs. It would also be interesting to compare against an MoE model, to see whether these layers act like different agreeing "experts" or whether reasoning is happening in the latent space.
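An inline loop of this kind could be sketched as below. This is a toy under stated assumptions, not anything from the post: layers are plain functions on a list of floats, and `forward_with_loop`, `make_layer`, and the block bounds are hypothetical names chosen for illustration.

```python
from typing import Callable, List, Sequence

Layer = Callable[[List[float]], List[float]]


def forward_with_loop(x: List[float], layers: Sequence[Layer],
                      start: int, end: int, repeats: int) -> List[float]:
    """Run layers in order, passing through layers[start:end] `repeats` times."""
    if not (0 <= start < end <= len(layers)) or repeats < 1:
        raise ValueError("invalid block or repeat count")
    for layer in layers[:start]:
        x = layer(x)
    for _ in range(repeats):           # the inline loop over the shared block
        for layer in layers[start:end]:
            x = layer(x)
    for layer in layers[end:]:
        x = layer(x)
    return x


def make_layer(delta: float) -> Layer:
    """Toy layer: adds a constant, so each pass through it is easy to count."""
    return lambda x: [v + delta for v in x]


layers = [make_layer(d) for d in (1.0, 10.0, 100.0, 1000.0)]
print(forward_with_loop([0.0], layers, start=1, end=3, repeats=1))  # [1111.0]
print(forward_with_loop([0.0], layers, start=1, end=3, repeats=2))  # [1221.0]
```

With `repeats=2` the loop computes the same thing as physically duplicating the block once; the loop form just avoids listing the layers twice.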

naasking•50m ago

This layer duplication strikes me as a bit of a "poor man's" version of looped language models:

https://ouro-llm.github.io/

Pretty cool though. LLM brain surgery.

dnhkng•9m ago

Agreed, but one thing to note:

I really think from the experiments that 'organs' (not sure what to term this) develop during massive pretraining. This also means that looping the entire model may not actually be efficient. Maybe a better way is [linear input section -> loop 1 -> linear section -> loop 2 -> linear section -> ... -> loop n -> linear output]?

This would give 'organs' space to develop.
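The bracketed layout above can be written down as a schedule of sections, each with its own repeat count. The sketch below is only one way to express it: the `Section` tuple format, the `expand_schedule` helper, and the 8-layer example are assumptions made for illustration.

```python
from typing import List, Tuple

# (start, end, repeats): layers start..end-1 run `repeats` times.
# A linear section has repeats == 1; a loop section has repeats > 1.
Section = Tuple[int, int, int]


def expand_schedule(sections: List[Section]) -> List[int]:
    """Flatten a list of sections into the order in which layers execute."""
    order: List[int] = []
    for start, end, repeats in sections:
        if start >= end or repeats < 1:
            raise ValueError(f"bad section: {(start, end, repeats)}")
        for _ in range(repeats):
            order.extend(range(start, end))
    return order


# Hypothetical 8-layer model: linear in, loop 1, linear, loop 2, linear out.
schedule = [(0, 2, 1), (2, 4, 2), (4, 5, 1), (5, 7, 3), (7, 8, 1)]
order = expand_schedule(schedule)
print(order)  # [0, 1, 2, 3, 2, 3, 4, 5, 6, 5, 6, 5, 6, 7]
```

Looping the entire model is the special case of a single section covering every layer with `repeats > 1`.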

jauntywundrkind•31m ago

The dual GH200 build was amazing. Awesome to see someone with such talent & flair in one area also doing great in another area. Thanks for noting that that was you. https://news.ycombinator.com/item?id=46222237

digdugdirk•23m ago

Super cool! Do you do any analysis or have any tools that help you identify these circuits? I came across this [1] recently, and wanted to try to identify specifically strong "circuits" in what seems to be a similar way to what you did.

[1] https://weightwatcher.ai/

dnhkng•13m ago

I build my own analysis tools. I'm just finishing up running the current generation of LLMs (MiniMax M2.5 and the Qwen3.5 family), and then I will put it all on GitHub.

It's less a 'tool' than an assorted set of scripts, tailored to my unusual hardware setup. But it should be easy to extend. I would have released this earlier, but I had the (stupid) idea to 'write a paper' on this. Aiming for that delayed this by a year. Blogs are the way to go (for me).

blourvim•1h ago

I am not really an ML dev, so I don't understand most of it. It does sound ridiculous that it would even work. Brilliant work and a great article; I enjoyed reading it.

This sounds similar to Kimi's mixture-of-experts architecture, if I understood it correctly (likely I have not). Can you comment on this?

dnhkng•6m ago

No worries, happy to discuss anyway :)

MoE (mixture of experts) is an architecture that forces sparsity (not all 'neurons' are active during the forward pass).

This is pretty much orthogonal to that; it works with dense and MoE models, by repeating 'vertical' sections of the transformer stack.
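For contrast, a toy version of the sparsity MoE refers to: within one layer, only the top-k experts run for a given input. Everything here is a simplified assumption for illustration (scalar inputs, gate scores passed in rather than computed by a learned router, the hypothetical `top_k_route` helper); it is not how any particular model implements routing.

```python
import math
from typing import Callable, List, Sequence, Tuple

Expert = Callable[[float], float]


def top_k_route(x: float, gate_scores: Sequence[float],
                experts: Sequence[Expert], k: int) -> Tuple[float, List[int]]:
    """Evaluate only the k highest-scoring experts and mix their outputs."""
    if len(gate_scores) != len(experts) or not 1 <= k <= len(experts):
        raise ValueError("need one score per expert and 1 <= k <= len(experts)")
    chosen = sorted(range(len(experts)),
                    key=lambda i: gate_scores[i], reverse=True)[:k]
    # Softmax over the chosen scores only; subtract the max for stability.
    peak = max(gate_scores[i] for i in chosen)
    exps = [math.exp(gate_scores[i] - peak) for i in chosen]
    total = sum(exps)
    y = sum((e / total) * experts[i](x) for e, i in zip(exps, chosen))
    return y, chosen


experts = [lambda x: x + 1, lambda x: x * 2, lambda x: -x, lambda x: x * x]
y, active = top_k_route(3.0, [0.1, 2.0, -1.0, 2.0], experts, k=2)
print(active, y)  # [1, 3] 7.5
```

Routing chooses which experts run inside a layer, while layer duplication changes how many times a run of layers is traversed, so the two can be combined.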

tgw43279w•1h ago

That was a fun read! The base64 decoding and encoding is quite interesting. A parallel: these models are surprisingly robust to heavy word mangling. Back in 2023 people used this trick to jailbreak models very often, but what was more surprising is that the models even understood it. I always thought of it this way: there must be some circuitry in the model that maps these almost unrecognizable words and sentences into their rectified versions. But what your base64 experiment also shows is that they can encode them back as well! (However, models are known to be unable to produce mangled output that looks convincingly random. I think the base64 transformation is more mechanical in this regard, and hence it's easier for them to do the reverse.)

So your layer circuit hypothesis aligns pretty well with my mental model of how these models work, based on the interpretability work I am familiar with! I also really like the way you used the heatmaps as a tool to derive layer insights, very intuitive! But it's really surprising that you can simply duplicate layers and achieve better results that generalize! This is a research-grade effort! I'm confident you could publish this at NeurIPS or ICML if you put it into a paper! I'm quite impressed! Great work!

WithinReason•44m ago

Here is a paper that made a similar observation recently:

https://www.alphaxiv.org/abs/2512.19941

tgw43279w•36m ago

Very cool, thanks for sharing! Recovering 96% using just two blocks on IMN-1k, wow!

dnhkng•17m ago

Thanks for the link!

I think that these models have to learn to use their parameters efficiently, and the best way to do that is to 'evolve' (yes, a bad word for it) structures over pretraining time. Unfortunately, they don't have a way to access these structures 'from the inside'. I hope this new approach lets us boost performance in a more experimentally rigorous way.

WithinReason•13m ago

I think the recurrence is a consequence of using a residual connection; it seems that this keeps the representation consistent across layers.

seeknotfind•25m ago

Did you ever try multiple copies?

dnhkng•20m ago

I did, but the combinatorics are mad. I have also tried training a meta-model that predicts the outputs of the combinations.

I will make another post if the topic is popular; it's pretty geeky though, even more than my usual blog posts...
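A rough sense of why the combinatorics grow quickly: counting only contiguous blocks, and then pairs of non-overlapping blocks, already gives large numbers. The sketch below is an assumption-laden illustration (the 80-layer depth is chosen as an example, each block is duplicated exactly once, and the function names are hypothetical); it says nothing about which configurations were actually tested.

```python
from math import comb


def count_single_blocks(n: int) -> int:
    """Number of contiguous, non-empty blocks in a stack of n layers."""
    return n * (n + 1) // 2


def count_block_pairs(n: int) -> int:
    """Number of ways to pick two non-overlapping contiguous blocks.

    Closed form C(n + 2, 4); brute_force_pairs below checks it for small n.
    """
    return comb(n + 2, 4)


def brute_force_pairs(n: int) -> int:
    """Enumerate (left block, right block) pairs directly; for small n only."""
    blocks = [(s, e) for s in range(n) for e in range(s + 1, n + 1)]
    return sum(1 for (_, e1) in blocks for (s2, _) in blocks if e1 <= s2)


n_layers = 80  # assumed depth, for illustration only
print(count_single_blocks(n_layers))  # 3240
print(count_block_pairs(n_layers))    # 1749060
```

Allowing more than one extra copy per block, or more than two blocks, multiplies these counts further, and each configuration needs a full benchmark run to evaluate.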

tjwei•15m ago

Really interesting discovery, especially the part about base64. Reminds me of this: Transformer Layers as Painters https://arxiv.org/abs/2407.09298

cootsnuck•10m ago

Super cool. Love seeing these writeups of hobbyists getting their hands dirty, breaking things, and then coming out on the other side of it with something interesting.
