Reverse-engineering retrieval in decoder-only Transformers

https://github.com/tmaselko/paper-attncap
2•tmaselko•1h ago

Comments

tmaselko•1h ago
I wanted to pick attention's head dimensions based on something other than vibes, so I reverse-engineered retrieval. I adapted MQAR into TSAR, "Tuple-Structured Associative Recall". Sequence positions become tuples with complete semantic meanings, letting me completely remove positional confounds. What I found was news to me, so I tidied it up, wrote it down, and put it in a repo.

In summary: Without positional confounds, Transformers are a powerhouse at retrieval. Length generalization is effortless. Once head dimension is 2 or more, it does not limit retrieval capacity at all. Retrieval is geometry-driven and relies on three mechanisms: separation (of hidden state geometry into a dense spherical code), projection (of the code from the hidden state), and amplification (to sharpen/saturate softmax).
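To make the head-dimension claim concrete, here is a minimal toy sketch. It is not code from the repo and not the TSAR setup. It assumes the keys are already separated into an evenly spaced code on the unit circle (head dimension 2) and that the query is the target key scaled by a gain. The names `circle_code`, `retrieve`, and `gain` are hypothetical, chosen for illustration only.

```python
import math

def circle_code(n):
    """n unit vectors evenly spaced on the circle: a hand-built dense code in 2-D."""
    return [(math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n))
            for k in range(n)]

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def retrieve(keys, target, gain):
    """Attention weights when the query is the target key scaled by `gain`.

    `gain` stands in for amplification: it multiplies every logit, so the
    small cosine gap between neighbouring keys becomes a large logit gap.
    """
    qx, qy = keys[target]
    logits = [gain * (qx * kx + qy * ky) for kx, ky in keys]
    return softmax(logits)

if __name__ == "__main__":
    keys = circle_code(64)  # 64 distinct items in only 2 dimensions
    for gain in (1.0, 100.0, 10000.0):
        weights = retrieve(keys, target=5, gain=gain)
        print(f"gain={gain:>8}: weight on correct key = {weights[5]:.4f}")
```

In this toy, the correct key always has the largest logit, but the softmax only concentrates on it once the gain is large enough to stretch the gap to its nearest neighbours. That is the sense in which a 2-D head can address many items as long as amplification is available. Learned codes need not be evenly spaced.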

Some other fun implications:

- Models can represent features in dense spherical codes, not just orthogonal axes or superpositions.

- Retrieval heads appear to cripple their own gradients upon formation.

- Mainstream positional encodings aren't designed with retrieval in mind, and are antagonistic to it. Followup experiments hint that simply including a PE is catastrophic for retrieval.

- Length generalization failures should be mostly PEs warping the learned code so separations become alignments and alignments become separations.

- "Out-of-distribution" can be seen as "never accounted for in the spherical code". If it hasn't been seen it cannot be separated, and if it hasn't been separated it cannot be distinguished.

Preprint here: https://zenodo.org/records/19359748 (Still fishing for an arXiv endorsement...)

Github repo here: https://github.com/tmaselko/paper-attncap

You can replicate the headline results in five minutes on a 4090, or the whole paper in 20-30 hours if so inclined.

I'd be happy to answer any questions; I'm kinda starved for feedback on this.
