
Transformers are like great eyes, while Recurrent models are like a stomach

1•MrPan•2w ago

I’ve been training two small models on a classic long novel (Hongloumeng, 2.6M bytes) to see how they "learn" a story. After a few days of watching the logs, I noticed something interesting about where Transformers struggle.

The "Goldfish" Problem

The Transformer is incredibly fast at learning how to finish a sentence. Because it uses attention, it's like a student with perfect short-term memory. But it is "blind" over the long run: in my setup it only sees 128 characters at a time, so it has no way to remember the beginning of the book while it's reading the end.
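To make the 128-character limit concrete, here is a minimal sketch of what a fixed-window model can "see" at any position. The function name `visible_context` and the stand-in `book` bytes are made up for illustration and are not taken from the repo; only the window size of 128 comes from the experiment.

```python
WINDOW = 128  # context length mentioned in the post; the rest is illustrative


def visible_context(data: bytes, pos: int, window: int = WINDOW) -> bytes:
    """Return the bytes a fixed-window model can see when predicting data[pos]."""
    start = max(0, pos - window)
    return data[start:pos]


# A stand-in "book": a marker at the start, filler, and a marker at the end.
book = b"OPENING " + b"x" * 1000 + b" ENDING"

ctx = visible_context(book, len(book))
print(len(ctx))           # 128
print(b"OPENING" in ctx)  # False: the start of the book is outside the window
```

However long the book gets, the slice never grows past `window` bytes, which is the "goldfish" effect in one line.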

The Crossover

My "Infinite Brain" model (a recurrent architecture) started out much worse. It was confused and the output was garbage. But around its 5th pass through the book, it "crossed over" and started beating the Transformer.

Because the Brain carries a small "memory state" forward forever, it eventually builds a "vibe" of the whole book that the Transformer just can't see.
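The idea of carrying a state forward can be sketched with a toy recurrent update. This is not the model from the repo: the update rule, the fixed weights, and the names `step` and `run` are all assumptions chosen only to show the mechanics of carrying a state across 128-byte chunks versus resetting it at every chunk boundary.

```python
import math

# Hypothetical fixed weights; a real model would learn these.
WEIGHTS = (0.5, -0.3, 0.8, 0.1)


def step(state: list, byte: int) -> list:
    """One toy recurrent update: mix the old state with the new byte."""
    x = byte / 255.0
    return [math.tanh(0.9 * s + w * x) for s, w in zip(state, WEIGHTS)]


def run(data: bytes, chunk: int = 128, carry: bool = True) -> list:
    """Read data chunk by chunk, optionally resetting the state per chunk."""
    state = [0.0] * len(WEIGHTS)
    for i in range(0, len(data), chunk):
        if not carry:
            state = [0.0] * len(WEIGHTS)  # forget everything at the boundary
        for b in data[i:i + chunk]:
            state = step(state, b)
    return state


text = bytes(range(256)) + b"the end of the book"

# Carrying the state across chunks is the same as reading in one pass.
print(run(text, chunk=128, carry=True) == run(text, chunk=len(text)))  # True

# Resetting per chunk leaves only what the final chunk contributed.
print(run(text, carry=False) == run(text[256:]))  # True
```

With `carry=True` the chunk size stops mattering, so the state is a function of everything read so far. How much useful information about early chapters actually survives in that state depends on the learned update, which this toy does not model.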

What I learned:

Transformers are like great eyes: They see the immediate details perfectly.

Recurrent models are like a stomach: They digest the whole thing slowly, but they keep the "nutrients" of the story for much longer.

It’s a small toy experiment, but it reminded me that while Attention is powerful, having a persistent "soul" or memory state still matters for long-form data.

......

https://github.com/MrPan2048/GeometricTransformer
