frontpage.

ARCHE3-7B – Sparse MoE with SmartRouter and Foundation Curriculum Training

1•OpenSynapseLabs•1h ago
This is my first post on HN — a bit nervous, but excited to share what I've been building.

I’ve been working on a 7B sparse Mixture-of-Experts prototype that can actually run on consumer hardware. For example, on a Colab T4 it uses around 5 GB RAM and 5 GB VRAM during training, and roughly 3.5–5 GB for inference.

A couple of things I spent a lot of time on:

Routing (SmartRouter)

I tried to tackle routing collapse in a practical way. Instead of letting all tokens dump into a few "favorite" experts, I combined a few things: a load-balancing loss, an entropy bonus to keep the distribution flat, jitter noise during training, and a learnable temperature. It works surprisingly well at keeping a good portion of experts active. I've open-sourced the router code (hive_router.py) if anyone wants to look at the math or grab it for their project.
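For readers who want the gist without opening hive_router.py: the regularisers above can be sketched roughly like this. This is an illustrative NumPy sketch under my own naming, not the actual hive_router.py API:

```python
import numpy as np

def smart_router(logits, temperature=1.0, jitter=0.0, rng=None):
    """Illustrative router forward pass (names are guesses, not the real API).
    logits: (tokens, experts) raw router scores."""
    if jitter > 0 and rng is not None:           # training-time jitter noise
        logits = logits + rng.normal(0.0, jitter, logits.shape)
    z = logits / temperature                     # learnable temperature
    z = z - z.max(axis=-1, keepdims=True)        # numerically stable softmax
    probs = np.exp(z)
    probs /= probs.sum(axis=-1, keepdims=True)
    return probs

def router_aux_losses(probs, top1):
    """Load-balancing loss (Switch-Transformer style) plus an entropy
    term that rewards a flat expert distribution."""
    n_tokens, n_experts = probs.shape
    # fraction of tokens dispatched to each expert (hard top-1 counts)
    dispatch = np.bincount(top1, minlength=n_experts) / n_tokens
    # mean router probability mass per expert
    importance = probs.mean(axis=0)
    load_balance = n_experts * np.dot(dispatch, importance)
    entropy = -(probs * np.log(probs + 1e-9)).sum(axis=-1).mean()
    return load_balance, entropy

rng = np.random.default_rng(0)
logits = rng.normal(size=(16, 8))                # 16 tokens, 8 experts
probs = smart_router(logits, temperature=0.7, jitter=0.1, rng=rng)
lb, ent = router_aux_losses(probs, probs.argmax(axis=-1))
# a combined auxiliary term might look like: aux = alpha * lb - beta * ent
```

In this sketch, minimising the load-balance term while maximising entropy is what discourages a few "favorite" experts from absorbing all traffic.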

Foundation Curriculum Training (FCT)

Before standard pretraining, I run the model through structured reasoning patterns — currently 290 of them across 14 cognitive domains. Each pattern follows a strict sequence: OBSERVE → PRIOR → UPDATE → RIPPLE → ANALOGY → ACT.
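For intuition, one such pattern might serialise something like the following. The stage names are from the post, but the tag syntax and layout here are purely my guess, since the real format isn't described:

```python
# Purely illustrative: the six FCT stages, with a made-up tag syntax.
stages = ["OBSERVE", "PRIOR", "UPDATE", "RIPPLE", "ANALOGY", "ACT"]
sample = "\n".join(f"<{s}> ..." for s in stages)
print(sample)
```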

To make this actually run on my setup, I'm doing a couple of specific tricks. First, I use a Target-Only Loss (masking out the tags and inputs and only calculating gradients on the actual reasoning payloads like UPDATE or ACT). Second, I had to write a custom SparseExpertAdamW that only instantiates optimizer states for the experts that are actually active on that step. Without this, the optimizer states for 20,480 experts would have absolutely crushed my RAM.
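A rough sketch of both tricks, again under my own naming (the real code surely differs). The label masking uses the common -100 "ignore" convention; the toy optimizer allocates moment buffers lazily, so experts that never receive a gradient cost no optimizer memory:

```python
import numpy as np

IGNORE_INDEX = -100  # conventional "compute no loss here" label value

def target_only_labels(tokens, payload_spans):
    """Target-Only Loss masking: every position defaults to IGNORE_INDEX,
    and only the reasoning payloads (e.g. the UPDATE / ACT bodies, given
    here as half-open (start, end) spans) keep real labels."""
    labels = [IGNORE_INDEX] * len(tokens)
    for start, end in payload_spans:
        labels[start:end] = tokens[start:end]
    return labels

class LazyExpertAdamW:
    """Toy stand-in for SparseExpertAdamW: first/second-moment buffers are
    created only for experts that actually receive a gradient this step."""
    def __init__(self, lr=1e-3, betas=(0.9, 0.999), eps=1e-8, wd=0.01):
        self.lr, self.eps, self.wd = lr, eps, wd
        self.b1, self.b2 = betas
        self.state = {}  # expert_id -> (step, m, v); absent = untouched

    def step(self, params, grads):
        # params/grads map expert_id -> array, for ACTIVE experts only
        for eid, g in grads.items():
            t, m, v = self.state.get(
                eid, (0, np.zeros_like(g), np.zeros_like(g)))
            t += 1
            m = self.b1 * m + (1 - self.b1) * g       # first moment
            v = self.b2 * v + (1 - self.b2) * g * g   # second moment
            m_hat = m / (1 - self.b1 ** t)            # bias correction
            v_hat = v / (1 - self.b2 ** t)
            update = m_hat / (np.sqrt(v_hat) + self.eps)
            params[eid] -= self.lr * (update + self.wd * params[eid])
            self.state[eid] = (t, m, v)

labels = target_only_labels(list(range(10)), [(3, 5), (7, 9)])
# labels == [-100, -100, -100, 3, 4, -100, -100, 7, 8, -100]

opt = LazyExpertAdamW()
params = {0: np.ones(4), 7: np.ones(4)}
opt.step(params, {7: np.full(4, 0.5)})  # only expert 7 gets moment buffers
```

The lazy allocation matters because dense AdamW keeps two float buffers per parameter tensor; over 20,480 expert tensors that state would exist whether or not an expert was ever routed to.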

So far I’ve completed 5 out of 14 domains. One cool thing: every new domain starts with a lower loss than the previous one (for example, the Systems domain went from 2.149 down to 0.941), so it seems like the cross-domain transfer is actually happening.

The architecture in short:

d_model = 2048

10 layers (5 Dense Core + 5 Fusion)

20,480 experts (8 domains × 2560)

Dynamic Top-K (2–4)

memory-mapped weights + Dopamine Learning v1
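The post doesn't say how the Dynamic Top-K range of 2–4 is driven, but one plausible rule is to route through more experts when the router distribution is uncertain. A hedged sketch of that idea (entirely my own construction, not the model's actual mechanism):

```python
import numpy as np

def dynamic_topk(probs, k_min=2, k_max=4):
    """Pick k per token from router uncertainty: peaked distribution ->
    fewer experts, near-uniform distribution -> more experts.
    probs: (n_experts,) router probabilities for a single token."""
    entropy = -(probs * np.log(probs + 1e-9)).sum()
    max_entropy = np.log(len(probs))                  # entropy of uniform
    k = k_min + int(round(float((k_max - k_min) * entropy / max_entropy)))
    top = np.argsort(probs)[::-1][:k]                 # k best experts
    weights = probs[top] / probs[top].sum()           # renormalised gates
    return top, weights

confident = np.array([0.965] + [0.005] * 7)           # peaked router
uniform = np.full(8, 1 / 8)                           # flat router
```

With these inputs the peaked token routes to 2 experts and the flat one to 4, so compute scales with how unsure the router is.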

Model is up on Hugging Face: https://huggingface.co/OpenSynapseLabs/arche3-7b

Benchmarks and graphs are on GitHub: https://github.com/OpenSynapseLabs/arche3-benchmarks

Limitations (to be honest): I haven’t run standard benchmarks yet (MMLU, GSM8K, HumanEval), only 5/14 FCT domains are done, and the dataset is still small and needs proper scaling. Plus, this is a solo project so far. I did use Gemini and Claude to speed up parts of the implementation, but the architecture and core ideas are my own.

I’d really appreciate any feedback, especially if you’re into routing in MoE models, curriculum pretraining, or scaling this further (thinking about 35B next).

My main goal is to build systems that amplify human thinking, not replace it. If that sounds like something you'd want to mess around with or contribute to, feel free to reach out at opensynapselabs@proton.me. I'm happy to share more details and the private repo.

Thanks for reading!

Embarrassingly Simple Self-Distillation Improves Code Generation

https://arxiv.org/abs/2604.01193
1•jryio•1m ago•0 comments

Britain's Free Speech Crisis and the Bill That Would Fix It

https://reclaimthenet.org/britains-free-speech-crisis-and-the-bill-that-would-fix-it
1•uyzstvqs•1m ago•0 comments

Tech Companies Are Trying to Neuter Colorado's Landmark Right-to-Repair Law

https://www.wired.com/story/tech-companies-are-trying-to-neuter-colorados-landmark-right-to-repai...
1•liz_ifixit•4m ago•0 comments

Don't want to pay for YouTube Premium? Morphe picks up where Revanced left off

https://www.androidauthority.com/morphe-youtube-revanced-3629859/
1•Markoff•4m ago•0 comments

Reddit is moving on from R/all

https://www.theverge.com/tech/906314/reddit-r-all-deprecating
1•efraim•4m ago•0 comments

Iran: Recruitment of child soldiers as young as 12 amounts to a war crime

https://www.amnesty.org/en/latest/news/2026/04/iran-recruitment-of-child-soldiers-as-young-as-12-...
1•mhb•5m ago•0 comments

Manuscript: A writing workspace where AI reads but never writes

https://app.manuscript.no/try
1•issaafk•5m ago•0 comments

Revive Prompt: What if the people you lose could still be known and remembered?

https://github.com/Jamessfks/revive-prompt
1•Jamessfks123•6m ago•0 comments

Five Survivors of Spectacular Falls

https://www.bbc.com/news/magazine-22934269
1•thunderbong•6m ago•0 comments

Genesis – Desktop AI with persistent memory, 11 agents. Buy once, own forever

https://genesis.bmbnexus.ai
1•bahabashatwah•8m ago•0 comments

Solana Drift Protocol drained of $285M via fake token and governance hijack

https://anonhaven.com/en/news/drift-protocol-hack-285-million-solana/
3•anonhaven•11m ago•0 comments

A case study in testing with 100 Claude agents in parallel

https://imbue.com/product/mngr_part_2/
3•thejash•12m ago•0 comments

GitMindPro – Scan any GitHub repo for AI-injected security risks

https://gitmindpro.com
1•DevToday•12m ago•0 comments

Raytheon generalized modular toolchains for Hidden Communication Systems

https://github.com/raytheonbbn/maude-hcs
2•uticus•14m ago•1 comments

Gstack pull request – "no seriously, accept it. it fixes everything"

https://github.com/garrytan/gstack/pull/681
2•Topfi•14m ago•0 comments

Linkhut: A Social Bookmarking Site

https://ln.ht
1•Imustaskforhelp•15m ago•0 comments

Ghostty, but with Vertical tabs, lightweight and native

https://github.com/muxy-app/muxy
1•543310•16m ago•1 comments

iNaturalist

https://www.inaturalist.org/
12•bookofjoe•18m ago•1 comments

AI Governance

https://medium.com/@paul.bernard_80815/the-ai-highground-8a438dfd18c5
1•paulbernard•19m ago•0 comments

Revision Demoscene Festival 3-6 April

https://2026.revision-party.net/
2•thinkingemote•20m ago•0 comments

Half of NASA's pool of active astronauts served in the Middle East

https://www.stripes.com/theaters/us/2026-04-01/artemis-2-military-involvement-iraq-veterans-21250...
1•ThrowawayGuy1•20m ago•0 comments

Artemis II Looking Back at Earth

https://images.nasa.gov/details/art002e000191
1•DarmokJalad1701•21m ago•0 comments

Getting Claude to QA its own work

https://www.skyvern.com/blog/getting-claude-to-qa-its-own-work/
4•suchintan•21m ago•0 comments

Are web apps really slower than native?

https://atfzl.com/are-web-apps-really-slower-than-native/
2•atfzl•24m ago•0 comments

A Visual Guide to Gemma 4

https://newsletter.maartengrootendorst.com/p/a-visual-guide-to-gemma-4
1•mariuz•24m ago•0 comments

Code-review-graph v2.1.0, 8× fewer tokens for code reviews via structural graph

https://github.com/tirth8205/code-review-graph
1•tirthkanani•26m ago•0 comments

Pharmaceuticals face 100% tariffs in US – unless firms strike a deal

https://www.bbc.com/news/articles/cx29kke01gpo
12•geox•29m ago•9 comments

Ask HN: Cool Websites to Stop Doomscrolling?

2•karakoram•29m ago•2 comments

Artemis II astronaut: 'I have two Microsoft Outlooks, and neither are working'

https://www.theregister.com/2026/04/02/artemis_astronauts_microsoft_outlook_broken/
5•Bender•32m ago•1 comments

AI-Generated Interview with One Piece Actor Published by Esquire

https://kotaku.com/one-piece-netflix-live-action-mackenyu-interview-esquire-ai-singapore-2000684648
2•pavel_lishin•32m ago•0 comments