
Native Sparse Attention

https://aclanthology.org/2025.acl-long.1126/
120•CalmStorm•14h ago
Was submitted as "DeepSeek won the best paper award at ACL 2025"

Here is the awards page: https://cspaper.org/topic/116/record-breaking-acl-2025-crown...

Comments

CalmStorm•14h ago
For the first time, it introduced native sparse attention into the full training process, achieving up to 11× inference speedup while maintaining model performance.
sabakhoj•10h ago
> Despite being sparse, NSA surpasses Full Attention baseline on average across general benchmarks, long-context tasks, and reasoning evaluation.

Isn't it very notable that the latency improvement didn't come with a performance loss? I'm not super familiar with all the technical aspects, but that seems like it should be one of the main focuses of the paper.

ethan_smith•2h ago
The performance maintenance (or even improvement) isn't surprising - sparse attention can reduce noise by focusing only on relevant tokens. Traditional full attention dilutes focus by attending to everything equally, while NSA's pruning approach mimics how humans selectively process information.
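
To make that intuition concrete, here is a minimal NumPy sketch (illustrative only; NSA's actual design combines compressed, selected, and sliding-window attention branches with hardware-aligned block sparsity). A top-k variant prunes all but the highest-scoring keys before the softmax, while full attention spreads weight over every token:

    import numpy as np

    def softmax(x):
        x = x - x.max()
        e = np.exp(x)
        return e / e.sum()

    def full_attention(q, K, V):
        # every key receives some weight, however irrelevant it is
        return softmax(K @ q / np.sqrt(len(q))) @ V

    def topk_sparse_attention(q, K, V, k):
        scores = K @ q / np.sqrt(len(q))
        keep = np.argsort(scores)[-k:]          # the k best-matching keys
        masked = np.full_like(scores, -np.inf)  # prune everything else
        masked[keep] = scores[keep]
        return softmax(masked) @ V

    rng = np.random.default_rng(0)
    q = rng.standard_normal(64)                 # one query vector
    K = rng.standard_normal((1024, 64))         # 1024 cached keys
    V = rng.standard_normal((1024, 64))
    dense = full_attention(q, K, V)
    sparse = topk_sparse_attention(q, K, V, k=32)

With 32 of 1024 keys kept, the sparse path reads about 3% of the cache per query, which is where the inference speedup comes from.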
gnabgib•10h ago
Title: Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention

The awards page for ACL seems to disagree with this editorialized title: https://2025.aclweb.org/program/awards/

fourdnet•9h ago
The ACL webpage has not been updated yet. Here are the announcement slides: https://cspaper.org/topic/116/record-breaking-acl-2025-crown...
aspenmayer•8h ago
The page that the person you're replying to linked does have this, so it may not be updated, or they were looking in the wrong place originally, or both:

> Industry Track Awards

> Best Paper

> Speed Without Sacrifice: Fine-Tuning Language Models with Medusa and Knowledge Distillation in Travel Applications

> Daniel Zagyva, Emmanouil Stergiadis, Laurens van der Maas, Aleksandra Dokic, Eran Fainman, Ilya Gusev, Moran Beladev

Per TFA, the paper we’re looking for is this one:

> Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention

> Jingyang Yuan, Huazuo Gao, Damai Dai, Junyu Luo, Liang Zhao, Zhengyan Zhang, Zhenda Xie, Y. X. Wei, Lean Wang, Zhiping Xiao, Yuqing Wang, Chong Ruan, Ming Zhang, Wenfeng Liang, Wangding Zeng

I’m not finding it by author on the page you linked but I think it’s this reference by title:

> DeepSeek × PKU × UW — Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention

I did find it on this page:

https://2025.aclweb.org/program/main_papers/

pyuser583•9h ago
I'd say award for best title is a tie between: "Dehumanizing Machines: Mitigating Anthropomorphic Behaviors in Text Generation Systems"; "Finding Needles in Images: Can Multi-modal LLMs Locate Fine Details?"; and "Steering off Course: Reliability Challenges in Steering Language Models."
israrkhan•8h ago
Well deserved
noosphr•8h ago
DeepSeek papers are a must-read for anyone who wants to understand how to make LLMs operate at hyperscale. All western labs hide their best results, or at most release summaries that are about as meaningful as the answers Cleo used to give on Stack Exchange: https://math.stackexchange.com/questions/562694/integral-int...

I have a suspicion, given how quiet all the major players got in the two weeks after DeepSeek R1 was released, that they were reading and implementing everything in the papers that came with it as fast as humanly possible.

Art9681•7h ago
None of the major players have ever been quiet. DeepSeek enjoyed about a week or two's worth of press before its spotlight was stolen by the next great model. It never held the top spot, ever, mind you. So I don't understand why you think the major players had to say anything about it, when the model was neither first, second, nor third in real-world capability, and why they would have to say anything when the DeepSeek service processes maybe an eighth of what OpenAI, Google, or Claude do in any given span of time.

I applaud their open efforts. But being "altruistic" and being best are two different things.

sothatsit•5h ago
DeepSeek's contributions to training efficiency were as important as, if not more important than, the models themselves. A lot of the worry about DeepSeek came from people questioning the moat of the big AI players, since DeepSeek was able to train a competitive model with so much less compute.

Their innovations in training efficiency were almost guaranteed to have been heavily considered by the big AI labs. For example, Dario Amodei talks about the efficiency improvements being the real important contribution of DeepSeek V3 here: https://www.darioamodei.com/post/on-deepseek-and-export-cont...

> DeepSeek's team did this via some genuine and impressive innovations, mostly focused on engineering efficiency. There were particularly innovative improvements in the management of an aspect called the "Key-Value cache", and in enabling a method called "mixture of experts" to be pushed further than it had before.
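
For readers unfamiliar with the term, here is a rough sketch of the vanilla KV cache the quote refers to (names illustrative; DeepSeek's MLA goes further by storing a compressed low-rank latent instead of full per-head keys and values):

    import numpy as np

    class KVCache:
        """Append-only store of past keys/values for one attention head."""
        def __init__(self):
            self.ks, self.vs = [], []

        def step(self, q, k, v):
            # compute this token's key/value once and cache it, instead
            # of re-deriving every past key/value at each decoding step
            self.ks.append(k)
            self.vs.append(v)
            K, V = np.stack(self.ks), np.stack(self.vs)
            s = K @ q / np.sqrt(len(q))
            w = np.exp(s - s.max())
            w /= w.sum()
            return w @ V        # attention output for the newest token

    cache = KVCache()
    rng = np.random.default_rng(0)
    for _ in range(16):                  # 16 decoding steps
        q, k, v = rng.standard_normal((3, 64))
        out = cache.step(q, k, v)

The cache avoids recomputing every past key and value at each step, and its memory footprint is exactly what MLA's compression attacks.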

benreesman•4h ago
MLA is just one example of a best-in-class technique from Hangzhou that's seen wide adoption in US prestige labs.

And the saltiness of US labs about DeepSeek is well-known. "O3, explain model distillation like I'm five."

No Sam, explain intellectual property rights to the judge in the NYT test case, asshole.

nurettin•2h ago
I remember in February DeepSeek's <think> caused a moderately sized market crash. They didn't just go silent; almost every vendor implemented their own version of thinking models while blaming DeepSeek for stealing their tech/training on their models. It was rather pathetic to watch.
ninjin•8h ago
Link to the published paper rather than the preprint (update link?):

https://aclanthology.org/2025.acl-long.1126

visarga•4h ago
I am always skeptical of RNN approaches, but this paper is just sparsifying the input; it is not compressing arbitrary-size input into a fixed memory. I am hopeful; maybe this is a big break. An 11× inference speedup with no degradation, from an algorithmic improvement alone. Is it really that good? Almost too good to be true. Adoption in the next 6 months will tell us the truth.
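
The distinction the comment draws, as a hedged sketch (shapes illustrative): a recurrent model funnels the whole history into one fixed-size state, while sparse attention keeps the full cache and merely reads a small subset of it per query, so nothing is irreversibly discarded.

    import numpy as np

    d, T = 64, 10_000
    rng = np.random.default_rng(1)
    tokens = rng.standard_normal((T, d))

    # Recurrent style: the entire history is squeezed into one O(d)
    # state, so detail is necessarily lost as T grows.
    W = rng.standard_normal((d, d)) / np.sqrt(d)
    state = np.zeros(d)
    for x in tokens:
        state = np.tanh(W @ state + x)

    # Sparse-attention style: the O(T*d) cache keeps everything; each
    # query merely reads a small subset (here 64 of 10,000 positions).
    query = rng.standard_normal(d)
    keep = np.argsort(tokens @ query)[-64:]
    readout = tokens[keep].mean(axis=0)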
tony_borlini•1h ago
DeepSeek and the Sparse Attention Revolution: How a Research Paper is Redefining AI Efficiency

https://deep.liveblog365.com/en/index-en.html?post=50

Terence Tao on the suspension of UCLA grants

https://mathstodon.xyz/@tao/114956840959338146
192•dargscisyhp•4h ago•138 comments

Cerebras Code

https://www.cerebras.ai/blog/introducing-cerebras-code
330•d3vr•12h ago•132 comments

Aerodynamic drag in small cyclist formations: shielding the protected rider [pdf]

http://www.urbanphysics.net/2025_Formation_Paper_Preprint_v1.pdf
10•PaulHoule•3d ago•2 comments

Hardening mode for the compiler

https://discourse.llvm.org/t/rfc-hardening-mode-for-the-compiler/87660
110•vitaut•8h ago•26 comments

Ladybird Browser July Update

https://ladybird.org/newsletter/2025-07-31/
192•net01•3h ago•51 comments

Coffeematic PC – A coffee maker computer that pumps hot coffee to the CPU

https://www.dougmacdowell.com/coffeematic-pc.html
201•dougdude3339•12h ago•52 comments

Why leather is best motorbike protection – whilst being dragged along concrete

https://www.youtube.com/watch?v=xwuRUcAGIEU
71•lifeisstillgood•2d ago•21 comments

JavaScript retro sound effects generator

https://github.grumdrig.com/jsfxr/
78•selvan•3d ago•17 comments

Weather Model based on ADS-B

https://obrhubr.org/adsb-weather-model
182•surprisetalk•2d ago•29 comments

At 17, Hannah Cairo solved a major math mystery

https://www.quantamagazine.org/at-17-hannah-cairo-solved-a-major-math-mystery-20250801/
344•baruchel•17h ago•148 comments

I couldn't submit a PR, so I got hired and fixed it myself

https://www.skeptrune.com/posts/doing-the-little-things/
266•skeptrune•17h ago•155 comments

Robert Wilson has died

https://www.theartnewspaper.com/2025/08/01/robert-wilson-playwright-director-artist-obituary
56•paulpauper•7h ago•13 comments

Ask HN: Who is hiring? (August 2025)

192•whoishiring•19h ago•221 comments

Ethersync: Peer-to-peer collaborative editing of local text files

https://github.com/ethersync/ethersync
127•blinry•3d ago•22 comments

Ferroelectric Helps Break Transistor Limits

https://spectrum.ieee.org/negative-capacitance-schottky-limit
8•pseudolus•3d ago•0 comments

The Rickover Corpus: A digital archive of Admiral Rickover's speeches and memos

https://rickovercorpus.org/
58•stmw•9h ago•11 comments

Microsoft is open sourcing Windows 11's UI framework

https://www.neowin.net/news/microsoft-is-taking-steps-to-open-sourcing-windows-11-user-interface-framework/
34•bundie•2h ago•29 comments

Yearly Organiser

https://neatnik.net/calendar/
40•anewhnaccount2•4d ago•13 comments

Does the Bitter Lesson Have Limits?

https://www.dbreunig.com/2025/08/01/does-the-bitter-lesson-have-limits.html
139•dbreunig•14h ago•66 comments

The First Widespread Cure for HIV Could Be in Children

https://www.wired.com/story/the-first-widespread-cure-for-hiv-could-be-in-children/
11•sohkamyung•1h ago•2 comments

Native Sparse Attention

https://aclanthology.org/2025.acl-long.1126/
120•CalmStorm•14h ago•16 comments

Anthropic revokes OpenAI's access to Claude

https://www.wired.com/story/anthropic-revokes-openais-access-to-claude/
234•minimaxir•12h ago•81 comments

Researchers map where solar energy delivers the biggest climate payoff

https://www.rutgers.edu/news/researchers-map-where-solar-energy-delivers-biggest-climate-payoff
93•rbanffy•14h ago•54 comments

Launch HN: Societies.io (YC W25) – AI simulations of your target audience

100•p-sharpe•22h ago•49 comments

Show HN: Draw a fish and watch it swim with the others

https://drawafish.com
856•hallak•4d ago•220 comments

Replacing tmux in my dev workflow

https://bower.sh/you-might-not-need-tmux
274•elashri•1d ago•306 comments

The tradeoff between human and AI context

https://softwaredoug.com/blog/2025/07/30/layers-of-ai-coding
22•softwaredoug•2d ago•0 comments

Ask HN: Who wants to be hired? (August 2025)

92•whoishiring•19h ago•203 comments

Our Farewell from Google Play

https://secuso.aifb.kit.edu/english/2809.php
283•shakna•1d ago•107 comments

Ergonomic keyboarding with the Svalboard: a half-year retrospective

https://twey.io/hci/svalboard/
101•Twey•17h ago•55 comments