
Transformers are like great eyes, while Recurrent models are like a stomach

1•MrPan•1h ago
I’ve been training two small models, a Transformer and a recurrent model, on a classic long novel (Hongloumeng, i.e. Dream of the Red Chamber, about 2.6 MB) to see how each "learns" a story. After a few days of watching the logs, I noticed something interesting about where Transformers struggle.

The "Goldfish" Problem

The Transformer learns how to finish a sentence very quickly. Because it uses "Attention," it’s like a student with perfect short-term memory. But in this setup it is "blind" over the long run: it only sees 128 characters at a time, so it has no way to remember the beginning of the book while it's reading the end.
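To make the "blindness" concrete, here is a minimal sketch of what a fixed window means. It is not taken from the linked repository: `visible_context` and the stand-in `book` data are hypothetical, and it counts bytes rather than characters on the assumption (suggested by the log, which reports sizes in bytes) that the models are byte-level.

```python
CONTEXT_LEN = 128  # window size quoted in the post


def visible_context(data: bytes, position: int, context_len: int = CONTEXT_LEN) -> bytes:
    """Return the bytes a fixed-window model can look at when predicting data[position]."""
    start = max(0, position - context_len)
    return data[start:position]


# Stand-in for the novel (hypothetical data, not the real text).
book = bytes(range(256)) * 40

ctx = visible_context(book, position=5000)
print(len(ctx))          # 128 bytes are visible
print(5000 - len(ctx))   # 4872 earlier bytes are out of reach at this position
```

However far into the book the model is, the visible slice never grows past `CONTEXT_LEN`, so anything older has to be relearned from the weights rather than read from the input.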

The Crossover

My "Infinite Brain" model (a recurrent architecture) started out much worse. It was confused and its output was garbage. But around the fifth pass through the book, it "crossed over" and started beating the Transformer.

Because the Brain carries a small "memory state" forward forever, it eventually builds a "vibe" of the whole book that the Transformer just can't see.
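As a rough illustration of carrying a state forward, the toy below runs a one-number recurrent update over two chunks, once passing the state across the boundary and once resetting it. This is only a sketch under stated assumptions: the update rule, the weights `w_state` and `w_in`, and the 16-byte chunks are all made up for the example and say nothing about how the actual "Brain" model is built.

```python
import math


def step(state: float, byte: int, w_state: float = 0.9, w_in: float = 0.1) -> float:
    """One toy recurrent update: blend the previous state with the new input byte."""
    return math.tanh(w_state * state + w_in * (byte / 255.0))


def run(chunk: bytes, state: float = 0.0) -> float:
    """Feed a chunk through the toy model, starting from the given state."""
    for b in chunk:
        state = step(state, b)
    return state


# Two short chunks with very different content (hypothetical data).
data = bytes([200] * 16 + [10] * 16)
first, second = data[:16], data[16:]

carried = run(second, state=run(first))  # state flows across the chunk boundary
reset = run(second)                      # state restarts from zero at the boundary

print(carried == run(data))   # carrying the state is the same as one long read
print(abs(carried - reset))   # what the first chunk still contributes at the end
```

The carried run ends in a different place than the reset run purely because of what came before the boundary, which is the kind of leftover "vibe" the paragraph above describes. In this toy the trace fades quickly; how long a real recurrent model retains it depends on its architecture and training.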

What I learned:

Transformers are like great eyes: They see the immediate details perfectly.

Recurrent models are like a stomach: They digest the whole thing slowly, but they keep the "nutrients" of the story for much longer.

It’s a small toy experiment, but it reminded me that while Attention is powerful, having a persistent "soul" or memory state still matters for long-form data.

Loaded 2634700 bytes.
Each batch chunk: 82334
Total steps to read book once: 643
Step 20 | Brain: 5.563 | Trans: 4.369 SAMPLE: 黛玉|)'Q:��t�д��T*��䎰�"��-��H�
Step 40 | Brain: 5.397 | Trans: 3.469 SAMPLE: 黛玉C9j������%�����H�)���IF�
Step 60 | Brain: 5.256 | Trans: 3.181 SAMPLE: 黛玉���f�XݪsʇGa���7��K��|)[o
Step 80 | Brain: 5.107 | Trans: 3.015 � ����EU) 玉�����.��-n�������:
Step 100 | Brain: 4.925 | Trans: 2.891 SAMPLE: 黛玉���5��X3�"��䜍��r��V:ی��;(�

......

Step 5540 | Brain: 2.057 | Trans: 2.327 SAMPLE: 黛玉,只賈王夫白說.這毫
Step 5560 | Brain: 2.079 | Trans: 2.354 SAMPLE: 黛玉了一大人,然時眾不要
Step 5580 | Brain: 2.112 | Trans: 2.381 SAMPLE: 黛玉姐的個 坐叫來.不著�
Step 5600 | Brain: 2.115 | Trans: 2.394 SAMPLE: 黛玉天豥太人說䟥這撆, �
Step 5620 | Brain: 2.026 | Trans: 2.290 SAMPLE: 黛玉奶歉打.命盞嚄看梅聴
Step 5640 | Brain: 2.134 | Trans: 2.414 SAMPLE: 黛玉圉起不又尚不什了,政
Step 5660 | Brain: 2.071 | Trans: 2.331 SAMPLE: 黛玉怎我古搬親就徴上一就
Step 5680 | Brain: 2.127 | Trans: 2.361 SAMPLE: 黛玉,,越于頭姐姒實眒頓
Step 5700 | Brain: 2.175 | Trans: 2.436 SAMPLE: 黛玉的你 又罜家中還太歉�
Step 5720 | Brain: 2.127 | Trans: 2.348 SAMPLE: 黛玉半,他面是以. "釳有�
Step 5740 | Brain: 2.141 | Trans: 2.394 SAMPLE: 黛玉,是難隄日既.阯要听
Step 5760 | Brain: 2.170 | Trans: 2.429 SAMPLE: 黛玉寶便吃,有了。”凌的
Step 5780 | Brain: 2.145 | Trans: 2.378 SAMPLE: 黛玉那好,乴日不拍說賈拿
Finished Read #9
Step 5800 | Brain: 2.092 | Trans: 2.376 SAMPLE: 黛玉人來,姐气有你兄話.
Step 5820 | Brain: 2.058 | Trans: 2.321 SAMPLE: 黛玉了.政遚的�,著嬑 家
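For anyone who wants to check the numbers, here is a small sketch that reads the log figures back. The header values and the four `Step` lines are copied from the log above; the `parse` helper, the regex, and the interpretation that the book is split into equal parallel streams read 128 bytes per step are my assumptions, not something the log states.

```python
import re

TOTAL_BYTES = 2_634_700   # "Loaded 2634700 bytes."
CHUNK_BYTES = 82_334      # "Each batch chunk: 82334"
STEPS_PER_READ = 643      # "Total steps to read book once: 643"
CONTEXT_LEN = 128         # window size mentioned in the post

# Assumption: equal parallel streams, each advanced CONTEXT_LEN bytes per step.
streams = TOTAL_BYTES // CHUNK_BYTES
print(streams)                     # 32
print(CHUNK_BYTES // CONTEXT_LEN)  # 643, which matches the log header

LINE = re.compile(r"Step (\d+) \| Brain: ([\d.]+) \| Trans: ([\d.]+)")


def parse(log: str) -> list[tuple[int, float, float]]:
    """Extract (step, brain, trans) triples from log text."""
    return [(int(s), float(b), float(t)) for s, b, t in LINE.findall(log)]


# A few lines from the log above (sample text omitted).
excerpt = """
Step 20 | Brain: 5.563 | Trans: 4.369
Step 100 | Brain: 4.925 | Trans: 2.891
Step 5540 | Brain: 2.057 | Trans: 2.327
Step 5820 | Brain: 2.058 | Trans: 2.321
"""

rows = parse(excerpt)
for step_no, brain, trans in rows:
    read_no = step_no // STEPS_PER_READ + 1  # which pass through the book
    print(step_no, read_no, round(trans - brain, 3))

# First logged step where the recurrent model's number is the lower one.
ahead = [s for s, b, t in rows if b < t]
print(ahead[0])  # 5540
```

On this excerpt the first "Brain ahead" step is just the first line after the elided part of the log, so it does not locate the crossover itself. Run on the full log, the same filter would. With 643 steps per read, the fifth read covers steps 2572 to 3215, which is where the post places it.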

https://github.com/MrPan2048/GeometricTransformer
