Towards understanding multiple attention sinks in LLMs

https://github.com/JeffreyWong20/Secondary-Attention-Sinks
1•thw20•2h ago

Comments

thw20•2h ago
This project reveals an interesting phenomenon: LLMs convert semantically non-informative tokens into attention sinks through middle-layer MLPs.

The converted sinks are termed secondary attention sinks because they are weaker than BOS attention sinks.

This might be related to layer specialisation in LLMs!
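
As a rough illustration (a minimal sketch of my own, not code from the linked repo): one way to see attention sinks is to measure, per layer, how much attention mass each token position receives, averaged over heads and query positions. A sink shows up as a position with disproportionate incoming mass. The model choice (gpt2) and the example prompt are illustrative assumptions.

```python
# Sketch: locate attention-sink positions by the attention mass each
# key position receives, per layer. Assumes any HuggingFace causal LM
# that supports output_attentions; "gpt2" is just a small example.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # assumption, not the model studied in the repo
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, output_attentions=True)
model.eval()

inputs = tok("The quick brown fox jumps over the lazy dog.", return_tensors="pt")

with torch.no_grad():
    out = model(**inputs)

# out.attentions: one tensor per layer, shape (batch, heads, query, key).
# Mass received by a key position = mean of its column over heads and queries.
for layer_idx, attn in enumerate(out.attentions):
    received = attn.mean(dim=1)[0].mean(dim=0)  # (seq_len,) per key position
    top = received.argmax().item()
    token = tok.decode(inputs["input_ids"][0][top].item())
    print(f"layer {layer_idx:2d}: top sink at pos {top} ({token!r}, mass {received[top]:.3f})")
```

If the paper's observation holds, a plot of these per-layer profiles would show the strongest sink at the first token throughout, with weaker secondary sinks on low-content tokens emerging only after the middle layers.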