frontpage.

Show HN: LocalGPT – A local-first AI assistant in Rust with persistent memory

https://github.com/localgpt-app/localgpt
93•yi_wang•3h ago•25 comments

Haskell for all: Beyond agentic coding

https://haskellforall.com/2026/02/beyond-agentic-coding
39•RebelPotato•2h ago•8 comments

SectorC: A C Compiler in 512 bytes (2023)

https://xorvoid.com/sectorc.html
241•valyala•11h ago•46 comments

Speed up responses with fast mode

https://code.claude.com/docs/en/fast-mode
154•surprisetalk•10h ago•150 comments

Software factories and the agentic moment

https://factory.strongdm.ai/
186•mellosouls•13h ago•335 comments

Brookhaven Lab's RHIC concludes 25-year run with final collisions

https://www.hpcwire.com/off-the-wire/brookhaven-labs-rhic-concludes-25-year-run-with-final-collis...
68•gnufx•9h ago•56 comments

Hoot: Scheme on WebAssembly

https://www.spritely.institute/hoot/
177•AlexeyBrin•16h ago•32 comments

Homeland Security Spying on Reddit Users

https://www.kenklippenstein.com/p/homeland-security-spies-on-reddit
10•duxup•54m ago•1 comment

Stories from 25 Years of Software Development

https://susam.net/twenty-five-years-of-computing.html
163•vinhnx•14h ago•16 comments

LLMs as the new high level language

https://federicopereiro.com/llm-high/
55•swah•4d ago•97 comments

Total Surface Area Required to Fuel the World with Solar (2009)

https://landartgenerator.org/blagi/archives/127
8•robtherobber•4d ago•2 comments

First Proof

https://arxiv.org/abs/2602.05192
129•samasblack•13h ago•76 comments

Vocal Guide – belt sing without killing yourself

https://jesperordrup.github.io/vocal-guide/
306•jesperordrup•21h ago•95 comments

Show HN: I saw this cool navigation reveal, so I made a simple HTML+CSS version

https://github.com/Momciloo/fun-with-clip-path
74•momciloo•11h ago•16 comments

Al Lowe on model trains, funny deaths and working with Disney

https://spillhistorie.no/2026/02/06/interview-with-sierra-veteran-al-lowe/
98•thelok•13h ago•22 comments

FDA intends to take action against non-FDA-approved GLP-1 drugs

https://www.fda.gov/news-events/press-announcements/fda-intends-take-action-against-non-fda-appro...
104•randycupertino•6h ago•223 comments

Vouch

https://twitter.com/mitchellh/status/2020252149117313349
43•chwtutha•1h ago•7 comments

Show HN: A luma dependent chroma compression algorithm (image compression)

https://www.bitsnbites.eu/a-spatial-domain-variable-block-size-luma-dependent-chroma-compression-...
37•mbitsnbites•3d ago•4 comments

Show HN: Axiomeer – An open marketplace for AI agents

https://github.com/ujjwalredd/Axiomeer
11•ujjwalreddyks•5d ago•2 comments

Start all of your commands with a comma (2009)

https://rhodesmill.org/brandon/2009/commands-with-comma/
571•theblazehen•3d ago•206 comments

The AI boom is causing shortages everywhere else

https://www.washingtonpost.com/technology/2026/02/07/ai-spending-economy-shortages/
292•1vuio0pswjnm7•17h ago•471 comments

Microsoft account bugs locked me out of Notepad – Are thin clients ruining PCs?

https://www.windowscentral.com/microsoft/windows-11/windows-locked-me-out-of-notepad-is-the-thin-...
133•josephcsible•9h ago•161 comments

I write games in C (yes, C) (2016)

https://jonathanwhiting.com/writing/blog/games_in_c/
184•valyala•11h ago•166 comments

Learning from context is harder than we thought

https://hy.tencent.com/research/100025?langVersion=en
229•limoce•4d ago•125 comments

OpenCiv3: Open-source, cross-platform reimagining of Civilization III

https://openciv3.org/
900•klaussilveira•1d ago•276 comments

Selection rather than prediction

https://voratiq.com/blog/selection-rather-than-prediction/
30•languid-photic•4d ago•12 comments

Where did all the starships go?

https://www.datawrapper.de/blog/science-fiction-decline
146•speckx•4d ago•228 comments

The F Word

http://muratbuffalo.blogspot.com/2026/02/friction.html
113•zdw•3d ago•56 comments

Unseen Footage of Atari Battlezone Arcade Cabinet Production

https://arcadeblogger.com/2026/02/02/unseen-footage-of-atari-battlezone-cabinet-production/
145•videotopia•4d ago•48 comments

Show HN: Look Ma, No Linux: Shell, App Installer, Vi, Cc on ESP32-S3 / BreezyBox

https://github.com/valdanylchuk/breezydemo
303•isitcontent•1d ago•39 comments

Native Sparse Attention

https://aclanthology.org/2025.acl-long.1126/
139•CalmStorm•6mo ago
Was submitted as "DeepSeek won the best paper award at ACL 2025"

Here is the awards page: https://cspaper.org/topic/116/record-breaking-acl-2025-crown...

Comments

CalmStorm•6mo ago
For the first time, it introduced native sparse attention into the full training process, achieving up to 11× inference speedup while maintaining model performance.
sabaimran•6mo ago
> Despite being sparse, NSA surpasses Full Attention baseline on average across general benchmarks, long-context tasks, and reasoning evaluation.

Isn't it very notable that the latency improvement didn't come with a performance loss? I'm not super familiar with all the technical aspects, but that seems like it should be one of the main focuses of the paper.

ethan_smith•6mo ago
The performance maintenance (or even improvement) isn't surprising - sparse attention can reduce noise by focusing only on relevant tokens. Traditional full attention dilutes focus by attending to everything equally, while NSA's pruning approach mimics how humans selectively process information.
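To make the intuition concrete, here is a minimal, hypothetical sketch in plain NumPy (my own illustration, not the paper's NSA algorithm, which combines compressed, selected, and sliding-window branches with hardware-aligned blockwise selection): attend over only the top-k highest-scoring keys instead of all of them.

    import numpy as np

    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    def full_attention(q, K, V):
        # q: (d,), K, V: (n, d) -- every key/value participates in the softmax
        scores = K @ q / np.sqrt(q.shape[0])
        return softmax(scores) @ V

    def topk_sparse_attention(q, K, V, k=64):
        # Keep only the k best-scoring keys; the rest are pruned away,
        # so the softmax and weighted sum touch far fewer tokens.
        scores = K @ q / np.sqrt(q.shape[0])
        idx = np.argpartition(scores, -k)[-k:]
        return softmax(scores[idx]) @ V[idx]

    rng = np.random.default_rng(0)
    n, d = 4096, 64
    q, K, V = rng.normal(size=d), rng.normal(size=(n, d)), rng.normal(size=(n, d))
    # Difference between the full and sparse outputs on random data
    print(np.linalg.norm(full_attention(q, K, V) - topk_sparse_attention(q, K, V)))

The point is that the cost of the softmax and weighted sum scales with k rather than with the full context length; the hard part, which the paper addresses, is selecting tokens cheaply enough on real hardware and keeping the selection trainable end to end.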
laughingcurve•6mo ago
Yes, that's what makes it so interesting and novel. You nailed it.
gnabgib•6mo ago
Title: Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention

The awards page for ACL seems to disagree with this editorialized title: https://2025.aclweb.org/program/awards/

fourdnet•6mo ago
The ACL webpage has not been updated yet. Here are the announcement slides: https://cspaper.org/topic/116/record-breaking-acl-2025-crown...
aspenmayer•6mo ago
The page that the person you're replying to linked does have this, so it may not be updated, or they were looking in the wrong place originally, or both:

> Industry Track Awards

> Best Paper

> Speed Without Sacrifice: Fine-Tuning Language Models with Medusa and Knowledge Distillation in Travel Applications

> Daniel Zagyva, Emmanouil Stergiadis, Laurens van der Maas, Aleksandra Dokic, Eran Fainman, Ilya Gusev, Moran Beladev

Per TFA, the paper we’re looking for is this one:

> Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention

> Jingyang Yuan, Huazuo Gao, Damai Dai, Junyu Luo, Liang Zhao, Zhengyan Zhang, Zhenda Xie, Y. X. Wei, Lean Wang, Zhiping Xiao, Yuqing Wang, Chong Ruan, Ming Zhang, Wenfeng Liang, Wangding Zeng

I’m not finding it by author on the page you linked but I think it’s this reference by title:

> DeepSeek × PKU × UW — Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention

I did find it on this page:

https://2025.aclweb.org/program/main_papers/

pyuser583•6mo ago
I'd say award for best title is a tie between: "Dehumanizing Machines: Mitigating Anthropomorphic Behaviors in Text Generation Systems"; "Finding Needles in Images: Can Multi-modal LLMs Locate Fine Details?"; and "Steering off Course: Reliability Challenges in Steering Language Models."
israrkhan•6mo ago
Well deserved
laughingcurve•6mo ago
Yeah, I agree. It's sad to find that so many of the comments are focused on reinventing reality and jingoism instead of scientific discussion of the merits and technicals. I'll return tomorrow and hope for better comments.
noosphr•6mo ago
DeepSeek's papers are a must-read for anyone who wants to understand how to make LLMs operate at hyper scale. All Western labs hide their best results, or at most release summaries that are about as meaningful as the answers Cleo used to give on Stack Exchange: https://math.stackexchange.com/questions/562694/integral-int...

I have a suspicion, given how quiet all the major players got in the two weeks after DeepSeek R1 was released, that they were reading and implementing everything in the papers that came with it as fast as humanly possible.

Art9681•6mo ago
None of the major players have ever been quiet. DeepSeek enjoyed about a week or two's worth of press before its spotlight was stolen by the next great model. It never held the top spot, ever, mind you. So I don't understand why you think the major players had to say anything about it, when the model was neither first, second, nor third in real-world capability, and why they would have to say anything about it when DeepSeek's service processes maybe 1/8 of what OpenAI, Google, or Claude handle in any given span of time.

I applaud their open efforts. But being "altruistic" and being best are two different things.

sothatsit•6mo ago
DeepSeek's contributions to training efficiency were as important as, if not more important than, the models themselves. A lot of the worry people had about DeepSeek was related to questioning the moat of the big AI players, since DeepSeek was able to train a competitive model with so much less compute.

Their innovations in training efficiency were almost guaranteed to have been heavily considered by the big AI labs. For example, Dario Amodei talks about the efficiency improvements being the real important contribution of DeepSeek V3 here: https://www.darioamodei.com/post/on-deepseek-and-export-cont...

> DeepSeek's team did this via some genuine and impressive innovations, mostly focused on engineering efficiency. There were particularly innovative improvements in the management of an aspect called the "Key-Value cache", and in enabling a method called "mixture of experts" to be pushed further than it had before.
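For anyone who hasn't seen the term, here is a minimal, hypothetical sketch of what the "Key-Value cache" is during autoregressive decoding (plain NumPy, my own illustration; it deliberately does not show DeepSeek's MLA, which compresses these cached tensors into a smaller latent form):

    import numpy as np

    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    class KVCache:
        """Stores each past token's key and value so they are computed only once."""
        def __init__(self, d):
            self.K = np.zeros((0, d))
            self.V = np.zeros((0, d))

        def decode_step(self, q, k, v):
            # Append the new token's key/value, then attend over the whole cache.
            self.K = np.vstack([self.K, k[None, :]])
            self.V = np.vstack([self.V, v[None, :]])
            scores = self.K @ q / np.sqrt(q.shape[0])
            return softmax(scores) @ self.V

    d = 64
    cache = KVCache(d)
    rng = np.random.default_rng(0)
    for _ in range(8):                      # one decode step per generated token
        q, k, v = rng.normal(size=(3, d))
        out = cache.decode_step(q, k, v)
    print(cache.K.shape)                    # (8, 64): one cached key per past token

The cache grows linearly with sequence length, which is why shrinking its per-token footprint matters so much for long-context serving.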

laughingcurve•6mo ago
Almost all of High Flyer's achievements have more to do with scaling the process, but when scaling is all you need, it's darn effective.
benreesman•6mo ago
MLA is just one example of a best-in-class technique from Hangzhou that's seen wide adoption in US prestige labs.

And the saltiness of US labs about DeepSeek is well-known. "O3, explain model distillation like I'm five."

No, Sam: explain intellectual property rights to the judge in the NYT test case, asshole.

laughingcurve•6mo ago
… wait, did you just seriously tell SamA that he's an asshole because of copyright issues… while praising Chinese labs who couldn't give a rat's ass and won't follow the same laws? Or pay creators? Physician, heal thyself.
benreesman•6mo ago
Sam's an asshole for a lot of reasons; a ridiculous commons grab of intellectual property, draped in threadbare rhetoric about human welfare (get those developing-nation eyeballs SCANNED, people!), is just one of them.

Watching the Chinese labs kick the shit out of better funded US enclaves of TESCREAL psychopathy in the public fucking domain is gravy.

I don't care that their internal calculus, or that of the PRC, is to Cloud Strife Limit Break a bunch of "shareholder value" in the form of a bloated NVIDIA cap feeding frenzy by bloated "public benefit corporations" with a bunch of creepy ties to Thiel et al: they're publishing papers, code, and weights. So their hoovering up of the commons has something of value going back into the commons.

So yeah, fuck Sam, and it's going to be fun watching OpenAI and Anthropic pivot ever more toward trying to outlaw competition than they already have. Amodei already sounds like Donald Rumsfeld on Taiwan hawkishness; this is not the positioning of someone who loves their product roadmap.

It turns out that a zillion ScaleAI and SurgeAI turks don't have economics any better than paying NVIDIA to run 85% net earnings for CapEx that's obsolete by the time it's racked and powered.

laughingcurve•6mo ago
... You did not speak to the key point at all and went on some massive rambling incoherent political commentary. I feel this comment is unworthy of the thread.

Native Sparse Attention matters. Your commentary is beneath this paper.

laughingcurve•6mo ago
Genuinely, many times it seems most people need to find reasons to assume the best about DeepSeek and China in order to confirm their prior bias that "America bad" and "capital is evil". The reality is grey and fuzzy, with neither side landing on the truth yet.
cma•6mo ago
How would people use DeepSeek to think "capital is evil"? It was from a private hedge fund named "High Flyer", not a state university project or something.
laughingcurve•6mo ago
Yes, exactly. How the heck? It makes no sense to me either, but you can certainly find plenty of laymen / not-in-the-know folks making those kinds of comments, often in non-technical spaces, often the worst parts of the internet where discourse is non-existent. Human psychology allows us to hold many contradictory positions all at once. Ideologies are the lenses through which we view the world, and they distort our perception.
cma•6mo ago
Usually what I see is not that, but that DeepSeek stole from American capital by training on the o1 release to achieve chain of thought; but there is a contradiction, because o1 at the time didn't show its real chain of thought to train on.
laughingcurve•6mo ago
It crashed the market because retail investors, and perhaps non-retail as well, had a great deal of overconfidence in the ability of the USA to maintain a lead thanks to the chip gap. High Flyer's innovations allowed them to scale and showed that this is not the case. This major event then likely spurred on many others. It was a mini "Sputnik moment".
nurettin•6mo ago
I remember in February DeepSeek's <think> caused a moderately sized market crash. They didn't just go silent; almost every vendor implemented their own version of thinking models while blaming DeepSeek for stealing their tech / training on their models. It was rather pathetic to watch.
laughingcurve•6mo ago
OAI and others were already on their way there or had released the models. How did you manage to convince yourself that High Flyer did it first? And that everyone else copied from them post hoc? You've created a new chain of causality that simply does not match neutral reality.
nurettin•6mo ago
Yeah I confess I rewrote history and crashed the stock market. Then ran out of juice just as I was about to kill Hitler.
laughingcurve•6mo ago
Do not try to signal intelligence by being sardonic or intentionally obtuse. Actively avoiding the point someone is making rather than confronting it head-on is beneath you.

ChatGPT o1 was made generally available in December 2024; DeepSeek R1's open weights were released in January 2025.

nurettin•6mo ago
My response wasn't intended to signal my intelligence so much as to show the overall lack of serious thought in your reply, since you seem to be ignoring the fact that there was a 15%+ overall drop in the stock market, much higher in the tech sector. But let's spell it out for you:

o1 was mimicking the newest "explain your solution step by step" prompts, which were proven to be more effective at the time.

ds-v1 came up with an actual chain of thought, imitating meandering and self-doubt, which sometimes went on for a while, creating entertaining loops and introducing a new class of halting problem, and this became the de facto standard. They also revolutionized the industry by programming cards directly via PTX.

Then all of Hugging Face implemented the paper and we got q4 versions that "thought".

Hope that jolted your memory without killing Hitler.

ninjin•6mo ago
Link to the published paper rather than the preprint (update link?):

https://aclanthology.org/2025.acl-long.1126

visarga•6mo ago
I am always skeptical of RNN approaches, but this paper is just sparsifying the input; it is not compressing arbitrary-length input into a fixed-size memory. I am hopeful that maybe this is a big break: an 11x inference speedup with no degradation, purely from an algorithmic improvement. Is it really that good? It seems almost too good to be true. Adoption over the next 6 months will tell us the truth.
tony_borlini•6mo ago
DeepSeek and the Sparse Attention Revolution: How a Research Paper is Redefining AI Efficiency

https://deep.liveblog365.com/en/index-en.html?post=50