
The lottery ticket hypothesis: why neural networks work

https://nearlyright.com/how-ai-researchers-accidentally-discovered-that-everything-they-thought-about-learning-was-wrong/
30•076ae80a-3c97-4•4h ago

Comments

derbOac•1h ago
In some sense, isn't this overfitting, but "hidden" by the typical feature sets that are observed?

Time and time again, some kind of process will identify some simple but absurd adversarial "trick stimulus" that throws off the deep network solution. These seem like blatant cases of overfitting that go unrecognized or unchallenged in typical use because the sampling space of stimuli doesn't usually include the adversarial trick stimuli.

I guess I've not really thought of the bias-variance tradeoff necessarily as being about the number of parameters, but rather about the flexibility of the model relative to the learnable information in the sample space. There are some formulations (e.g., Shtarkov-Rissanen normalized maximum likelihood) that treat overfitting in terms of the ability to reproduce data that is wildly outside a typical training set. This is related to, but not the same as, the number of parameters per se.
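
For reference, the Shtarkov-Rissanen normalized maximum likelihood distribution being referred to is usually written (in LaTeX, for discrete data) as

    p_{\mathrm{NML}}(x^n) = \frac{p\!\left(x^n \mid \hat{\theta}(x^n)\right)}{\sum_{y^n} p\!\left(y^n \mid \hat{\theta}(y^n)\right)}

where \hat{\theta}(x^n) is the maximum-likelihood fit to the observed sequence and the denominator sums the best achievable fit over every possible sequence of the same length. The log of that denominator is the model class's parametric complexity, which measures how well the class can fit arbitrary data rather than how many parameters it has.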

api•1h ago
This sounds like it's proposing that what's happening during large model training is a little bit akin to genetic algorithms: many small networks emerge and there is a selection process, some get fixed, and the rest fade and are then repurposed/drifted into other roles, repeat.
xg15•1h ago
Wouldn't this imply that most of the inference time storage and compute might be unnecessary?

If the hypothesis is true, it makes sense to scale up models as much as possible during training - but once the model is sufficiently trained for the task, wouldn't 99% of the weights be literal "dead weight" - because they represent the "failed lottery tickets", i.e. the subnetworks that did not have the right starting values to learn anything useful? So why do we keep them around and waste enormous amounts of storage and compute on them?

tough•1h ago
someone on twitter was exploring and linked to some related papers where you can for example trim experts on a MoE model if you're 100% sure they're never active for your specific task

what the bigger, wider net buys you is generalization
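
A toy sketch of that expert-trimming idea (the router, expert layout, and random "calibration" data below are all made up for illustration; real MoE inference stacks differ):

    import torch
    import torch.nn as nn

    class ToyMoE(nn.Module):
        """Toy mixture-of-experts layer with top-1 routing."""
        def __init__(self, dim=16, n_experts=8):
            super().__init__()
            self.router = nn.Linear(dim, n_experts)
            self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(n_experts))

        def forward(self, x):
            choice = self.router(x).argmax(dim=-1)  # which expert each token goes to
            out = torch.stack([self.experts[int(c)](t) for t, c in zip(x, choice)])
            return out, choice

    # Count how often each expert fires on task-specific calibration data,
    # then keep only the experts that are ever selected. A real pruner would
    # also shrink the router to match the surviving experts.
    moe = ToyMoE()
    calibration = torch.randn(1024, 16)  # stand-in for real task inputs
    _, choices = moe(calibration)
    counts = torch.bincount(choices, minlength=len(moe.experts))
    keep = [i for i, c in enumerate(counts.tolist()) if c > 0]
    print(f"keeping {len(keep)}/{len(moe.experts)} experts: {keep}")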

markeroon•48m ago
Look into pruning
paulsutter•32m ago
That’s exactly how it works, read up on pruning. You can ignore most of the weights and still get great results. One issue is that sparse matrices are vastly less efficient to multiply.

But yes you’ve got it
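
A minimal sketch of what one-shot magnitude pruning looks like in PyTorch, using torch.nn.utils.prune (the toy model and the 90% ratio are arbitrary placeholders):

    import torch
    import torch.nn as nn
    import torch.nn.utils.prune as prune

    # Toy network standing in for a trained model.
    model = nn.Sequential(nn.Linear(784, 512), nn.ReLU(), nn.Linear(512, 10))

    # Zero out the 90% of weights with the smallest magnitude in each Linear layer.
    for module in model.modules():
        if isinstance(module, nn.Linear):
            prune.l1_unstructured(module, name="weight", amount=0.9)
            prune.remove(module, "weight")  # bake the mask into the weight tensor

    # The pruned weights are stored as explicit zeros, so memory and FLOPs only
    # drop if the runtime exploits sparsity (the sparse-matmul caveat above).
    zeros = sum(int((m.weight == 0).sum()) for m in model.modules() if isinstance(m, nn.Linear))
    total = sum(m.weight.numel() for m in model.modules() if isinstance(m, nn.Linear))
    print(f"global sparsity: {zeros / total:.1%}")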

FuckButtons•8m ago
For any particular single pattern learned, 99% of the weights are dead weight. But it’s not the same 99% for each lesson learned.
highfrequency•38m ago
Enjoyed the article. To play devil’s advocate, an entirely different explanation for why huge models work: the primary insight was framing the problem as next-word prediction. This immediately creates an internet-scale dataset with trillions of labeled examples, which also has rich enough structure to make huge expressiveness useful. LLMs don’t disprove the bias-variance tradeoff; we just found a lot more data and the GPUs to learn from it.

It’s not like people didn’t try bigger models in the past, but either the data was too small or the structure too simple to show improvements with more model complexity. (Or they simply trained the biggest model they could fit on the GPUs of the time.)
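
To make the "trillions of labeled examples" point concrete, here is a toy sketch of how raw text turns into (context, next-token) training pairs; the whitespace tokenizer and window size are purely illustrative:

    def next_token_pairs(text, context_len=4):
        """Turn raw text into (context, next-token) pairs; every position yields a label."""
        tokens = text.split()  # whitespace split stands in for a real tokenizer
        return [(tokens[max(0, i - context_len):i], tokens[i]) for i in range(1, len(tokens))]

    for context, target in next_token_pairs("the cat sat on the mat"):
        print(context, "->", target)
    # Every token of every document becomes a labeled example "for free",
    # which is where the internet-scale dataset mentioned above comes from.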

pixl97•4m ago
I think a lot of it is the massive amount of compute we've gotten in the last decade. While inference may have been possible on the hardware, the training would have taken lifetimes.
belter•37m ago
This article is like a quick street rap. Lots of rhythm, not much thesis. Big on tone, light on analysis... or no actual thesis other than a feelgood factor. I want these 5 min back.
abhinuvpitale•34m ago
Interesting article. Is it concluding that different small networks are formed for different types of problems that we are trying to solve with the larger network?

How is this different from overfitting, though? (PS: Overfitting isn't that bad if you think about it, as long as the test data or inference-time queries stay within the scope of a supposedly large enough training dataset.)

deepfriedchokes•31m ago
Rather than reframing intelligence itself, wouldn’t Occam’s Razor suggest instead that this isn’t intelligence at all?
gotoeleven•13m ago
This article gives a really bad/wrong explanation of the lottery ticket hypothesis. Here's the original paper

https://arxiv.org/abs/1803.03635
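
For context, the procedure in that paper is iterative magnitude pruning with rewinding to the original initialization. A rough sketch (train_fn, the number of rounds, and the per-round pruning fraction are placeholders; the paper prunes only weights, while this masks every parameter for brevity):

    import copy
    import torch

    def find_winning_ticket(model, train_fn, rounds=5, prune_frac=0.2):
        """Sketch of iterative magnitude pruning with rewinding (Frankle & Carbin, 2019).

        train_fn(model) stands in for a full training loop and is assumed to keep
        already-masked (zeroed) parameters at zero while training the rest.
        """
        init_state = copy.deepcopy(model.state_dict())  # remember the original "ticket" init
        masks = {name: torch.ones_like(p) for name, p in model.named_parameters()}

        for _ in range(rounds):
            train_fn(model)  # 1. train the (partially masked) network to completion

            # 2. prune the fraction of surviving parameters with the smallest magnitude
            for name, param in model.named_parameters():
                scores = param.abs() * masks[name]
                surviving = scores[masks[name] > 0]
                k = int(prune_frac * surviving.numel())
                if k > 0:
                    threshold = surviving.kthvalue(k).values
                    masks[name] = (scores > threshold).float()

            # 3. rewind surviving parameters to their original initialization values
            model.load_state_dict(init_state)
            with torch.no_grad():
                for name, param in model.named_parameters():
                    param.mul_(masks[name])

        return model, masks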

ghssds•9m ago
Can someone explain how AI research can have a 300-year history?

Show HN: Xbow raised $117M to build AI hackers, I open-sourced it for free

https://github.com/usestrix/strix
22•ahmedallam2•38m ago•4 comments

Show HN: Whispering – Open-source, local-first dictation you can trust

https://github.com/epicenter-so/epicenter/tree/main/apps/whispering
162•braden-w•4h ago•37 comments

Left to Right Programming

https://graic.net/p/left-to-right-programming
118•graic•4h ago•102 comments

Show HN: I built an app to block Shorts and Reels

https://scrollguard.app/
414•adrianhacar•2d ago•160 comments

FFmpeg Assembly Language Lessons

https://github.com/FFmpeg/asm-lessons
268•flykespice•7h ago•81 comments

Counter-Strike: A billion-dollar game built in a dorm room

https://www.nytimes.com/2025/08/18/arts/counter-strike-half-life-minh-le.html
114•asnyder•6h ago•109 comments

T-Mobile claimed selling location data without consent is legal–judges disagree

https://arstechnica.com/tech-policy/2025/08/t-mobile-claimed-selling-location-data-without-consent-is-legal-judges-disagree/
126•Bender•1h ago•26 comments

A minimal tensor processing unit (TPU), inspired by Google's TPU

https://github.com/tiny-tpu-v2/tiny-tpu
15•admp•47m ago•2 comments

GenAI FOMO has spurred businesses to light nearly $40B on fire

https://www.theregister.com/2025/08/18/generative_ai_zero_return_95_percent/
53•rntn•1h ago•17 comments

Anna's Archive: An Update from the Team

https://annas-archive.org/blog/an-update-from-the-team.html
669•jerheinze•4h ago•311 comments

The Weight of a Cell

https://www.asimov.press/p/cell-weight
69•arbesman•5h ago•21 comments

Web apps in a single, portable, self-updating, vanilla HTML file

https://hyperclay.com/
564•pil0u•14h ago•199 comments

Launch HN: Reality Defender (YC W22) – API for Deepfake and GenAI Detection

https://www.realitydefender.com/platform/api
56•bpcrd•6h ago•27 comments

The Cutaway Illustrations of Fred Freeman

https://5wgraphicsblog.com/2016/10/24/the-cutaway-illustrations-of-fred-freeman/
54•Michelangelo11•2d ago•6 comments

Sikkim and the Himalayan Chess Game

https://www.historytoday.com/archive/feature/sikkim-and-himalayan-chess-game
7•pepys•3d ago•0 comments

TREAD: Token Routing for Efficient Architecture-Agnostic Diffusion Training

https://arxiv.org/abs/2501.04765
31•fzliu•3h ago•5 comments

How much do electric car batteries degrade?

https://www.sustainabilitybynumbers.com/p/electric-car-battery-degradation
74•xnx•3h ago•93 comments

Typechecker Zoo

https://sdiehl.github.io/typechecker-zoo/
116•todsacerdoti•3d ago•19 comments

Who Invented Backpropagation?

https://people.idsia.ch/~juergen/who-invented-backpropagation.html
147•nothrowaways•5h ago•79 comments

My Retro TVs

https://www.myretrotvs.com/
110•the-mitr•4h ago•19 comments

Show HN: I built a toy TPU that can do inference and training on the XOR problem

https://www.tinytpu.com
9•evxxan•1h ago•1 comment

Mindless Machines, Mindless Myths

https://lareviewofbooks.org/article/mindless-machines-mindless-myths/
21•lermontov•13h ago•0 comments

Electromechanical reshaping, an alternative to laser eye surgery

https://medicalxpress.com/news/2025-08-alternative-lasik-lasers.html
205•Gaishan•11h ago•83 comments

Countrywide natural experiment links built environment to physical activity

https://www.nature.com/articles/s41586-025-09321-3
44•Anon84•2d ago•31 comments

Finding a Successor to the FHS

https://lwn.net/SubscriberLink/1032947/67e23ce1a3f9f129/
27•firexcy•14h ago•16 comments

Show HN: We started building an AI dev tool but it turned into a Sims-style game

https://www.youtube.com/watch?v=sRPnX_f2V_c
65•maxraven•2h ago•51 comments

Macintosh Drawing Software Compared

https://blog.gingerbeardman.com/2021/04/24/macintosh-drawing-software-compared/
14•rcarmo•4h ago•1 comment

The lottery ticket hypothesis: why neural networks work

https://nearlyright.com/how-ai-researchers-accidentally-discovered-that-everything-they-thought-about-learning-was-wrong/
30•076ae80a-3c97-4•4h ago•15 comments

Image Fulgurator (2011)

https://juliusvonbismarck.com/bank/index.php/projects/image-fulgurator/2/
42•Liftyee•2d ago•3 comments

SystemD Service Hardening

https://roguesecurity.dev/blog/systemd-hardening
231•todsacerdoti•16h ago•85 comments