The Tradeoffs of SSMs and Transformers

https://goombalab.github.io/blog/2025/tradeoffs/
37•jxmorris12•6h ago

Comments

macleginn•4h ago
The part on tokenisation is not very convincing. Replacing BPE with characters or even bytes will not "remove tokenisation" -- the atoms will still be tokens, relating to different things in different cultures/writing traditions (a "Chinese byte" is a part of a Chinese character; an "English byte" is basically a letter or a number) and not relating to anything fundamentally linguistic. BPE can be thought of as another way of representing linguistic sequences with symbols of some kind; it provides less inductive bias into the use of language, but it is perhaps not categorically different from any other kind of writing.
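To make the byte point concrete, here is a minimal Python sketch. It assumes UTF-8 as the byte encoding, and the helper name `utf8_byte_tokens` is made up for illustration. It shows that a byte-level "token" covers a whole English letter but only a third of a common Chinese character.

```python
def utf8_byte_tokens(text: str) -> list[int]:
    """Return the UTF-8 bytes of `text` as a list of integer 'tokens'."""
    return list(text.encode("utf-8"))


# One ASCII letter is exactly one byte-level token.
print(utf8_byte_tokens("a"))   # [97]

# One common Chinese character (U+4E2D) spans three byte-level tokens,
# so a single byte is only a fragment of the character.
print(utf8_byte_tokens("中"))  # [228, 184, 173]
```

So a byte-level model still has to learn how to group its atoms, and the grouping it needs differs by script.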
Herring•4h ago
I'm a bit bearish on SSMs (and hybrid SSM/transformers) because the leading open weight models (DeepSeek, Qwen, Gemma, Llama) are all transformers. There's just no way none of them tried SSMs.
visarga•4h ago
Yes, until serious adoption I am reserved too, both on SSMs and diffusion based LLMs.
nextos•3h ago
Second-generation LSTMs (xLSTM) do have leading performance on zero-shot time series forecasting: https://arxiv.org/abs/2505.23719.

I think other architectures, aside from the transformer, might lead to SOTA performance, but they remain a bit unexplored.

programjames•2h ago
I mean, everyone is still using variational autoencoders for their latent flow models instead of the information bottleneck. It's because it's cheaper (in founder time) to raise 10(0)x more money than to design your own algorithms and architectures for a novel idea that might work in theory, but could be a dead end six months down the line. Just look at LiquidAI. Brilliant idea, but it took them ~5 years to do all the research and another to get their first models to market... which don't yet seem to be any better than models with a similar compute requirement. I find it pretty plausible that none of the "big" LLM companies seriously tried SSMs, because they already have more than enough money to throw at transformers, or took a quick path to get a big valuation.
mbowcut2•1h ago
I think I agree with you. My only rebuttal would be that it's this kind of thinking that's kept any leading players from trying other architectures in the first place. As far as I know, SOTA for SSMs just doesn't suggest significant enough potential upsides to warrant significant R&D. Not compared to the tried-and-true, established LLM methods. The decision might be something like: "Pay X to train a competitive LLM" vs "Pay 2X to MAYBE train a competitive SSM".

Supabase MCP can leak your entire SQL database

https://www.generalanalysis.com/blog/supabase-mcp-blog
512•rexpository•7h ago•269 comments

Bootstrapping a side project into a profitable seven-figure business

https://projectionlab.com/blog/we-reached-1m-arr-with-zero-funding
230•jonkuipers•1d ago•42 comments

Breaking Git with a carriage return and cloning RCE

https://dgl.cx/2025/07/git-clone-submodule-cve-2025-48384
262•dgl•7h ago•90 comments

Rules of good writing (2007)

https://dilbertblog.typepad.com/the_dilbert_blog/2007/06/the_day_you_bec.html
36•santiviquez•1d ago•25 comments

Smollm3: Smol, multilingual, long-context reasoner LLM

https://huggingface.co/blog/smollm3
225•kashifr•9h ago•41 comments

Radium Music Editor

http://users.notam02.no/~kjetism/radium/
146•ofalkaed•7h ago•27 comments

Xenharmlib: A music theory library that supports non-western harmonic systems

https://xenharmlib.readthedocs.io/en/latest/
24•retooth•2h ago•0 comments

OLMo – a fully open LLM outperforming GPT 4o mini

https://allenai.org/olmo
4•oldfuture•34m ago•0 comments

Bulgaria to join euro area on 1 January 2026

https://www.ecb.europa.eu//press/pr/date/2025/html/ecb.pr250708~b9676a9fa8.en.html
28•toomuchtodo•48m ago•2 comments

Brut: A New Web Framework for Ruby

https://naildrivin5.com/blog/2025/07/08/brut-a-new-web-framework-for-ruby.html
119•onnnon•7h ago•47 comments

Dynamical origin of Theia, the last giant impactor on Earth

https://arxiv.org/abs/2507.01826
63•bikenaga•7h ago•20 comments

Plants monitor the integrity of their barrier by sensing gas diffusion

https://www.nature.com/articles/s41586-025-09223-4
56•Bluestein•3d ago•24 comments

Taking over 60k spyware user accounts with SQL injection

https://ericdaigle.ca/posts/taking-over-60k-spyware-user-accounts/
160•mtlynch•5d ago•52 comments

Frame of preference: A history of Mac settings, 1984–2004

https://aresluna.org/frame-of-preference/
3•K7PJP•1h ago•0 comments

Show HN: OffChess – Offline chess puzzles app

https://offchess.com
297•avadhesh18•16h ago•117 comments

Can an email go 500 miles in 2025?

https://flak.tedunangst.com/post/can-an-email-go-500-miles-in-2025
264•zdw•4d ago•101 comments

New Horizons images enable first test of interstellar navigation

https://www.newscientist.com/article/2486823-new-horizons-images-enable-first-test-of-interstellar-navigation/
6•jnord•2d ago•0 comments

GlobalFoundries to Acquire MIPS

https://mips.com/press-releases/gf-mips/
160•mshockwave•8h ago•107 comments

Ceramic: A cross-platform and open-source 2D framework in Haxe

https://ceramic-engine.com/
55•-yukari•3d ago•4 comments

Show HN: A rain Pomodoro with brown noise, ASMR, and Middle Eastern music

https://forgetoolz.com/rain-pomodoro
50•ShadowUnknown•7h ago•27 comments

Blind to Disruption – The CEOs Who Missed the Future

https://steveblank.com/2025/07/08/blind-to-disruption-the-ceos-who-missed-the-future/
77•ArmageddonIt•11h ago•90 comments

Show HN: Jukebox – Free, Open Source Group Playlist with Fair Queueing

https://www.jukeboxhq.com/
96•skeptrune•10h ago•37 comments

SVGs that feel like GIFs

https://koaning.io/posts/svg-gifs/
394•cantdutchthis•17h ago•102 comments

New sphere-packing record stems from an unexpected source

https://www.quantamagazine.org/new-sphere-packing-record-stems-from-an-unexpected-source-20250707/
410•pseudolus•1d ago•207 comments

On The Meaning of Ritual

https://alicemaz.substack.com/p/on-the-meaning-of-ritual
61•jger15•3d ago•55 comments

Particle Lenia Deluxe Edition

https://www.craftlinks.art/Notebook/particle-lenia/
29•CraftingLinks•3d ago•5 comments

Inertial forces (indirect terms) in problems with a central body

https://astro.theoj.org/article/141682-on-inertial-forces-indirect-terms-in-problems-with-a-central-body
10•raattgift•3d ago•0 comments

Mercury: Ultra-fast language models based on diffusion

https://arxiv.org/abs/2506.17298
554•PaulHoule•1d ago•229 comments

I used o3 to profile myself from my saved Pocket links

https://noperator.dev/posts/o3-pocket-profile/
501•noperator•1d ago•191 comments