Modular Manifolds

https://thinkingmachines.ai/blog/modular-manifolds/
66•babelfish•1h ago

Comments

jasonjmcghee•1h ago
The learning rates they demonstrate are crazy - though the standard when talking about CIFAR-10 is 94% accuracy iirc. Showing ~60% accuracy is weird.

Has DAWNBench been done with manifold Muon (with a more appropriate architecture)?

snake_doc•1h ago
Um.. the model is tiny: https://github.com/thinking-machines-lab/manifolds/blob/main...
jasonjmcghee•20m ago
Yeah, it's just the wrong architecture for the job, so I found it to be a strange example.

Here's the top model on DAWNBench - https://github.com/apple/ml-cifar-10-faster/blob/main/fast_c...

It trains for 15 epochs and, like all the others, is a 9-layer ResNet.

srean•2m ago
Usually there's more to an ML or data-science idea (one that's not a full-fledged, fleshed-out journal paper) than beating a SOTA benchmark.

In fact, beating SOTA is often the least interesting part of an interesting paper, and SOTA-blind reviewers often use it as a gatekeeping device.

Jackson__•1h ago
They say they train for ~3 epochs. Could it be that's just not long enough of a training run? I have no idea how many epochs are usually used in those models.
pooooooooooooop•19m ago
It's a 3-layer MLP, as stated in the article.
snake_doc•1h ago
Hmmm… http://www.incompleteideas.net/IncIdeas/BitterLesson.html
whimsicalism•1h ago
This is a bad example to claim the bitter lesson applies to. It's about the fundamentals of optimization techniques, not about tying the solution space to hand-crafted things.
TimorousBestie•1h ago
Reminiscing about an old HN comment arguing that differential geometry was irrelevant to machine learning with a smile on my face.

Happy to see this opinion expressed here, too. The more math skeptics there are out there, the longer I get to keep my job. :)

deviation•42m ago
The world is full of useful shapes! No reason that math shouldn't be, too :)
esafak•1h ago
> This post covers one appealing way to constrain the weight matrices of a neural network—by keeping the tensors constrained to submanifolds at each layer. This opens the door to re-thinking optimization, as we can co-design optimization algorithms with these manifold constraints. As an example, we propose a manifold version of the Muon optimizer whose weights are constrained to the Stiefel manifold: the manifold of matrices with unit condition number. We conclude the post by defining the idea of a modular manifold, which is a composable manifold that attempts to make it easier to scale up and train large networks.

Very good presentation. Projected gradient methods were popular during the convex optimization craze two decades ago. The ideas advanced here have precedent and seem sensible to me. My concern is whether it helps much. The test accuracy in figure 6b shows a marginal increase, much higher usable learning rates, and a gentler transition to the overfitting regime, suggesting the regularization is working. The higher LR did not translate to a speedup: "Manifold Muon increased the wall clock time per step compared to AdamW..."
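To make the projected-gradient idea concrete, here is a minimal pure-Python sketch: take a plain gradient step, then map the result back to the nearest matrix with orthonormal columns. This is not the article's manifold Muon algorithm. The function names (`project_to_stiefel`, `projected_gradient_step`), the Newton-Schulz iteration used for the projection, and the toy weights and gradient are all assumptions chosen for illustration, and the projection assumes the stepped matrix has full column rank.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def project_to_stiefel(w, iters=50):
    """Approximate the polar factor of w, i.e. the nearest matrix with
    orthonormal columns, via the Newton-Schulz iteration
    X <- 1.5 X - 0.5 X X^T X. Assumes w has full column rank."""
    # Scale by the Frobenius norm so every singular value is at most 1,
    # which keeps the iteration inside its convergence region.
    fro = sum(v * v for row in w for v in row) ** 0.5
    x = [[v / fro for v in row] for row in w]
    for _ in range(iters):
        xxtx = matmul(matmul(x, transpose(x)), x)
        x = [[1.5 * a - 0.5 * b for a, b in zip(rx, rc)] for rx, rc in zip(x, xxtx)]
    return x


def projected_gradient_step(w, grad, lr):
    """One projected-gradient step: descend, then project back."""
    stepped = [[a - lr * g for a, g in zip(rw, rg)] for rw, rg in zip(w, grad)]
    return project_to_stiefel(stepped)


if __name__ == "__main__":
    # Hypothetical 3x2 weight matrix with orthonormal columns and a made-up gradient.
    w = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
    grad = [[0.3, -0.2], [0.1, 0.4], [0.5, 0.2]]
    new_w = projected_gradient_step(w, grad, lr=0.1)
    # W^T W should be close to the identity after the projection.
    print(matmul(transpose(new_w), new_w))
```

The extra matrix products in the projection are one plausible reason a constrained step costs more wall-clock time than an unconstrained one, though the sketch says nothing about the article's actual overhead.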

More fundamentally, I am a bit skeptical that low test error is the right goal in LLMs because statistical learning theory does not adequately model the macro-behavior of very large models.

uoaei•1h ago
This is exactly the kind of out-of-the-box thinking that will get us past some of the limitations of the current crop of AI architectures. Bravo to the authors.
SubiculumCode•48m ago
Curious why the authors chose the blog format over a research report?
almostgotcaught•31m ago
you mean a paper? because it's not paper quality content?
pooooooooooooop•20m ago
thinkingmachines likes to flex
fmap•20m ago
Isn't this an old idea? E.g., here's a textbook on optimization algorithms for matrix manifolds https://press.princeton.edu/absil and here's a library that implements this in python for the Stiefel manifold that's the subject of this blog post: https://pymanopt.org/docs/stable/manifolds.html#module-pyman...

What is novel about the approach in the blog post? Serious question, I really can't tell after reading the post.

aanet•9m ago
Not here to comment on the _content_ of the blog post...

Just wanted to say the blog post design looks super nice. Beautifully laid out, very readable typography, clear graphics, approachable design with a welcoming UX, footnotes in the side, etc.

Anybody know how this is designed / styled? (I can see three.js being used, along with katex.js - but don't know more details)

Thanks

ddellacosta•7m ago
UX on the other hand...I hate it when sites hijack my key commands for moving backwards and forwards in my browser history. Please don't do this!
cs702•8m ago
I find the idea compelling, but it's too early to know if it will work well at scale, you know, in the real world.

TL;DR: The OP notes that we currently use all sorts of tricks of the trade, including applying normalization layers, to keep unit values in DNNs from getting too large or too small when we train them. Keeping unit values from getting too large or small prevents numerical underflow/overflow, and also helps speed up learning by keeping the magnitudes of updates small in relation to weights. The OP proposes that we should constrain weights to be in sub-manifolds with unit condition number[a] at each layer, and that we should modify/design SGD algorithms to work well within those manifolds.
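As a quick illustration of what "unit condition number" means, the sketch below computes the condition number (largest singular value divided by smallest) of a two-column matrix in pure Python. The helper name `condition_number_two_columns` and both example matrices are made up for illustration, and the closed-form eigenvalue formula only applies to the two-column case.

```python
import math


def condition_number_two_columns(w):
    """Condition number of a matrix with exactly two columns.

    The singular values of w are the square roots of the eigenvalues of
    the 2x2 Gram matrix G = W^T W, which has a closed form.
    """
    a = sum(row[0] * row[0] for row in w)  # G[0][0]
    b = sum(row[0] * row[1] for row in w)  # G[0][1] == G[1][0]
    c = sum(row[1] * row[1] for row in w)  # G[1][1]
    mean = (a + c) / 2.0
    radius = math.sqrt(((a - c) / 2.0) ** 2 + b * b)
    lam_max, lam_min = mean + radius, mean - radius
    if lam_min <= 0.0:
        return math.inf  # rank-deficient: the condition number is unbounded
    return math.sqrt(lam_max / lam_min)


if __name__ == "__main__":
    # Orthonormal columns: every direction is stretched equally.
    orthonormal = [[0.6, -0.8], [0.8, 0.6], [0.0, 0.0]]
    # One direction is stretched three times more than the other.
    stretched = [[3.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
    print(condition_number_two_columns(orthonormal))
    print(condition_number_two_columns(stretched))
```

A matrix with orthonormal columns has all singular values equal to 1, so its condition number is 1; that is the property the constraint enforces.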

--

[a] https://en.wikipedia.org/wiki/Condition_number

--

EDIT: On the other hand, yesterday I saw a paper about doing basically the opposite, letting unit values in DNNs get as big or small as they need to get... by mapping them to complex logarithms and keeping them in that domain: https://openreview.net/forum?id=SUuzb0SOGu . I also found this opposite idea oddly compelling, but again, I don't know how well it works, because it hasn't been tested in real applications.

robots0only•6m ago
So their way to differentiate themselves from frontier labs is to try writing research blog posts (not papers). It will be interesting to see how this plays out. I don't think that anyone serious about developing frontier models would be putting anything useful out there for others. We already see this with all the incumbents -- Google, OAI, Anthropic, xAI, DeepSeek and other Chinese labs.
