frontpage.

Show HN: How I Topped the HuggingFace Open LLM Leaderboard on Two Gaming GPUs

https://dnhkng.github.io/posts/rys/
11•dnhkng•1h ago

Comments

dnhkng•1h ago
Author here. I found that duplicating a specific block of 7 middle layers in Qwen2-72B, without modifying any weights, improved performance across all Open LLM Leaderboard benchmarks and took the #1 spot. As of 2026, the top 4 models on that leaderboard are still descendants of it.

The weird finding: single-layer duplication does nothing. Too few layers, nothing. Too many, it gets worse. Only circuit-sized blocks of ~7 layers work. This suggests pretraining carves out discrete functional circuits in the layer stack that only work when preserved whole.

The whole thing was developed on 2x RTX 4090s in my basement. I'm now running current models (GLM-4.7, Qwen3.5, MiniMax M2.5) on a dual GH200 rig (see my other post). Code and new models coming soon.
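For anyone wondering what "duplicating a block of layers" means mechanically: a decoder-only transformer's layers form a sequential list, and the trick is to splice a copy of a contiguous block back into that list so those layers run twice per forward pass, with the copies sharing the originals' weights. A toy stand-in sketch with a plain Python list (the real edit would operate on the checkpoint's layer stack; the indices below are illustrative, not the ones from the post):

```python
# Toy stand-in: an 80-layer stack represented as a list of layer ids.
# Duplicating the block [40, 47) re-inserts those 7 layers right after
# themselves -- no weights are modified, the copies reuse the originals.
layers = list(range(80))   # pretend these are Qwen2-72B's decoder layers
start, end = 40, 47        # illustrative 7-layer middle block

expanded = layers[:end] + layers[start:end] + layers[end:]

print(len(expanded))       # 87 layers after duplication
print(expanded[40:54])     # the block 40..46 now appears twice in a row
```

In practice this slice-and-concatenate is, to my understanding, what passthrough-style merge tools (e.g. mergekit) do at the checkpoint level, but treat that as an assumption rather than a description of the author's exact pipeline.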

Happy to answer questions.

rapatel0•2m ago
I think you may have cracked latent-space reasoning. I've had a hunch that something like this would work, but couldn't figure out how the training would backpropagate. But you've shown that you just need to duplicate existing layers.

Have you tried a simple inline loop over the duplicated layers? Would be interesting to see performance. Also, it would be interesting to compare with an MoE model, to see whether these layers are acting like different agreeing "experts" or whether there is reasoning happening in the latent space.
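To make the "inline loop" idea concrete: instead of materializing copies of the block, you would re-run the same layers k times during the forward pass (weight tying taken to its limit). A toy functional sketch, with simple residual-style functions standing in for transformer layers — the names and looping scheme here are my guess at what's being proposed, not anything from the post:

```python
# Toy sketch: layers as functions on a scalar "hidden state".
def make_layer(w):
    return lambda h: h + w          # residual-style stand-in for a layer

stack = [make_layer(w) for w in range(1, 6)]   # a 5-layer toy model

def forward(h, layers, loop_block=(1, 4), loops=2):
    """Run the stack, repeating layers[lo:hi] `loops` times inline."""
    lo, hi = loop_block
    for layer in layers[:lo]:       # layers before the looped block
        h = layer(h)
    for _ in range(loops):          # inline loop over the shared block
        for layer in layers[lo:hi]:
            h = layer(h)
    for layer in layers[hi:]:       # layers after the looped block
        h = layer(h)
    return h

print(forward(0, stack))  # 1 + (2+3+4)*2 + 5 = 24
```

With `loops=1` this is just the original stack; with `loops=2` it matches the duplicated-block topology without ever copying weights.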

blourvim•1h ago
I am not really an ML dev, so I don't understand most of it. It does sound ridiculous that it would even work. Brilliant work and a great article; I enjoyed reading it.

This sounds similar to Kimi's mixture-of-experts architecture, if I understood it correctly (likely I have not). Can you comment on this?

tgw43279w•54m ago
That was a fun read! The base64 decoding and encoding is quite interesting. A parallel: these models are surprisingly robust to heavy word mangling. Back in 2023 people used this trick to jailbreak models very often, but what was more surprising is that they even understand it. I always thought of it this way: there must be some circuitry in the model that maps these almost unrecognizable words/sentences onto their rectified versions. What your base64 result also shows is that they can encode them back as well! (However, models are known to be unable to produce mangled output that looks convincingly random. I think the base64 transformation is more mechanical in this regard, and hence easier for them to reverse.)

So your layer-circuit hypothesis aligns pretty well with my mental model of how these models work, based on the interpretability work I am familiar with. I also really like the way you used the heatmaps as a tool to derive layer insights, very intuitive! But it's really surprising that you can simply duplicate layers and achieve better results that generalize. This is research-grade effort; I'm confident you could publish this at NeurIPS or ICML if you put it into a paper. I'm quite impressed, great work!

Tools I found that make using Claude Code easier on your phone

https://zilliz.com/blog/3-easiest-ways-to-use-claude-code-on-your-mobile-phone
1•Fendy•14s ago•0 comments

Show HN: Svglib, an SVG parser and renderer for Windows

https://github.com/bibhas2/svglib
1•leopoldj•2m ago•0 comments

The ugly history of regime change

https://www.profgmedia.com/p/this-time-is-different
2•shimm723•3m ago•0 comments

What software knowledge will stay relevant?

https://www.natemeyvis.com/what-software-knowledge-will-stay-relevant/
1•speckx•4m ago•0 comments

Show HN: Base Layer – Open-source behavioral compression from any text

https://www.base-layer.ai/
1•agulaya24•4m ago•0 comments

Para-biathlete wins silver using ChatGPT as his coach

https://www.theguardian.com/sport/2026/mar/09/ukraine-winter-paralympics-chat-gpt-artificial-inte...
1•defly•5m ago•0 comments

Amazon is holding a mandatory meeting about AI breaking its systems

https://twitter.com/lukolejnik/status/2031257644724342957
2•lwhsiao•5m ago•0 comments

Show HN: Claude Tuner – Monitor your Claude usage and find the right plan

https://claudetuner.com
1•xlos21•7m ago•1 comment

CragCLI – a new calculator for the command line

https://cragcli.info
3•librasteve•7m ago•1 comment

Show HN: Jottit – Reviving the Original from 2007

https://jottit.org
1•simonbc•7m ago•0 comments

Stripe: Billing for LLM Tokens

https://docs.stripe.com/billing/token-billing
1•tosh•7m ago•0 comments

Unlocked SaaS, file source as truth?

1•abmmgb•8m ago•1 comment

Understanding OBD2 codes (past, present, future)

https://crewchief.cc/blog/understanding-obd2-codes
1•meandave•8m ago•0 comments

Ask HN: What Happened to Llama Models?

1•elpakal•8m ago•0 comments

Meta to Acquire Moltbook

https://www.bloomberg.com/news/articles/2026-03-10/meta-to-acquire-moltbook-viral-social-network-...
2•marc__1•9m ago•0 comments

Disorder Drives One of Nature's Most Complex Machines

https://www.quantamagazine.org/disorder-drives-one-of-natures-most-complex-machines-20260309/
2•Brajeshwar•12m ago•0 comments

Spacecraft's impact changed asteroid's orbit in a save-the-Earth test

https://apnews.com/article/asteroid-nasa-draft-dimorphos-9abccd32d4cb532a66249dd6145685cb
2•Brajeshwar•13m ago•0 comments

Volkswagen to cut 50k jobs as profits drop

https://www.bbc.com/news/articles/c4gqyyly9v8o
1•gehwartzen•13m ago•0 comments

Microsoft 365 confirms new premium tier, stuffed with AI and few discounts

https://www.theregister.com/2026/03/09/microsoft_adds_a_premium_tier/
1•Brajeshwar•13m ago•0 comments

Smol AI WorldCup: What Small LLMs Can Do

https://huggingface.co/blog/FINAL-Bench/smol-worldcup
3•seawolf2357•13m ago•0 comments

Debian decides not to decide on AI-generated contributions

https://lwn.net/SubscriberLink/1061544/125f911834966dd0/
11•jwilk•13m ago•1 comment

License Laundering and the Death of Clean Room (The Chardet Saga)

https://shiftmag.dev/license-laundering-and-the-death-of-clean-room-8528/
1•allixsenos•13m ago•0 comments

We are building data breach machines and nobody cares

https://idealloc.me/posts/we-are-building-data-breach-machines-and-nobody-cares/
2•idealloc_haris•16m ago•0 comments

Turing Award winner and former Oxford professor Tony Hoare passed away

https://blog.computationalcomplexity.org/2026/03/tony-hoare-1934-2026.html
32•speckx•16m ago•2 comments

Non-blocking SQLite for Node.js. Ported 100% of better-sqlite3 tests

https://www.npmjs.com/package/better-sqlite3-pool
1•dilipvamsi•17m ago•1 comment

AI Agent hacked McKinsey's chatbot and gained full read-write access in 2 hours

https://www.theregister.com/2026/03/09/mckinsey_ai_chatbot_hacked/
1•smurda•17m ago•0 comments

Forward to Hell?

https://labs.ripe.net/author/mkoch/forward-to-hell-on-misusing-transparent-dns-forwarders-for-amp...
2•jruohonen•17m ago•0 comments

Elements of AI Agents

https://academy.dair.ai/courses/elements-of-ai-agents
1•omarsar•18m ago•0 comments

Portable Secret is now open source

https://blog.alcazarsec.com/tech/posts/portable-secret-is-now-opensource
1•alcazar•19m ago•0 comments

Why $100 Oil Isn't Going to Spark a New Shale Boom – Oilprice.com

https://oilprice.com/Energy/Crude-Oil/Why-100-Oil-Isnt-Going-to-Spark-a-New-Shale-Boom.html
1•bilsbie•20m ago•0 comments