frontpage.

“Stop Designing Languages. Write Libraries Instead” (2016)

https://lbstanza.org/purpose_of_programming_languages.html
121•teleforce•2h ago•62 comments

A4 Paper Stories

https://susam.net/a4-paper-stories.html
85•blenderob•2h ago•37 comments

The Eric and Wendy Schmidt Observatory System

https://www.schmidtsciences.org/schmidt-observatory-system/
38•pppone•2h ago•28 comments

Show HN: KeelTest – AI-driven VS Code unit test generator with bug discovery

https://keelcode.dev/keeltest
13•bulba4aur•1h ago•4 comments

LaTeX Coffee Stains [pdf]

https://ctan.math.illinois.edu/graphics/pgf/contrib/coffeestains/coffeestains-en.pdf
4•zahrevsky•15m ago•0 comments

Formal methods only solve half my problems

https://brooker.co.za/blog/2022/06/02/formal.html
45•signa11•4d ago•14 comments

The first new compass since 1936

https://www.youtube.com/watch?v=eiDhbZ8-BZI
52•1970-01-01•5d ago•32 comments

Vector graphics on GPU

https://gasiulis.name/vector-graphics-on-gpu/
105•gsf_emergency_6•4d ago•18 comments

Stop Doom Scrolling, Start Doom Coding: Build via the terminal from your phone

https://github.com/rberg27/doom-coding
502•rbergamini27•19h ago•352 comments

Opus 4.5 is not the normal AI agent experience that I have had thus far

https://burkeholland.github.io/posts/opus-4-5-change-everything/
679•tbassetto•21h ago•961 comments

Everyone hates OneDrive, Microsoft's cloud app that steals and deletes files

https://boingboing.net/2026/01/05/everyone-hates-onedrive-microsofts-cloud-app-that-steals-then-d...
25•mikecarlton•1h ago•10 comments

Optery (YC W22) Hiring a CISO and Web Scraping Engineers (Node) (US and Latam)

https://www.optery.com/careers/
1•beyondd•3h ago

Electronic nose for indoor mold detection and identification

https://advanced.onlinelibrary.wiley.com/doi/10.1002/adsr.202500124
155•PaulHoule•14h ago•87 comments

The creator of Claude Code's Claude setup

https://twitter.com/bcherny/status/2007179832300581177
490•KothuRoti•4d ago•319 comments

Show HN: SMTP Tunnel – A SOCKS5 proxy disguised as email traffic to bypass DPI

https://github.com/x011/smtp-tunnel-proxy
99•lobito25•14h ago•33 comments

A 30B Qwen model walks into a Raspberry Pi and runs in real time

https://byteshape.com/blogs/Qwen3-30B-A3B-Instruct-2507/
291•dataminer•18h ago•101 comments

Vietnam bans unskippable ads

https://saigoneer.com/vietnam-news/28652-vienam-bans-unskippable-ads,-requires-skip-button-to-app...
1468•hoherd•22h ago•747 comments

On the slow death of scaling

https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5877662
96•sethbannon•11h ago•18 comments

I wanted a camera that doesn't exist, so I built it

https://medium.com/@cristi.baluta/i-wanted-a-camera-that-doesnt-exist-so-i-built-it-5f9864533eb7
421•cyrc•4d ago•131 comments

Show HN: Comet MCP – Give Claude Code a browser that can click

https://github.com/hanzili/comet-mcp
8•hanzili•3d ago•5 comments

Oral microbiome sequencing after taking probiotics

https://blog.booleanbiotech.com/oral-microbiome-biogaia
168•sethbannon•17h ago•71 comments

Investigating and fixing a nasty clone bug

https://kobzol.github.io/rust/2025/12/30/investigating-and-fixing-a-nasty-clone-bug.html
20•r4um•5d ago•0 comments

The ISEE Trajectories

https://www.drmindle.com/isee/
5•drmindle12358•2d ago•4 comments

We recreated Steve Jobs's 1975 Atari horoscope program

https://blog.adafruit.com/2026/01/06/we-recreated-steve-jobss-1975-atari-horoscope-program-and-yo...
86•ptorrone•14h ago•38 comments

What *is* code? (2015)

https://www.bloomberg.com/graphics/2015-paul-ford-what-is-code/
63•bblcla•5d ago•25 comments

CES 2026: Taking the Lids Off AMD's Venice and MI400 SoCs

https://chipsandcheese.com/p/ces-2026-taking-the-lids-off-amds
123•rbanffy•17h ago•70 comments

Calling All Hackers: How money works (2024)

https://phrack.org/issues/71/17
298•krrishd•18h ago•189 comments

Gnome dev gives fans of Linux's middle-click paste the middle finger

https://www.theregister.com/2026/01/07/gnome_middle_click_paste/
42•beardyw•1h ago•40 comments

Launch HN: Tamarind Bio (YC W24) – AI Inference Provider for Drug Discovery

74•denizkavi•21h ago•17 comments

Sergey Brin's Unretirement

https://www.inc.com/jessica-stillman/google-co-founder-sergey-brins-unretirement-is-a-lesson-for-...
266•iancmceachern•6d ago•334 comments

The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks (2018)

https://arxiv.org/abs/1803.03635
118•felineflock•4d ago

Comments

laughingcurve•1d ago
Article is from 2018/19, and this hypothesis remains just that afaik, with plenty of evidence going against it
yorwba•1d ago
What evidence against it do you have in mind? I think it's a result of little practical relevance without a way to identify winning tickets that doesn't require buying lots of tickets until you hit the jackpot (i.e. training a large, dense model to completion), but that doesn't make the observation itself incorrect.
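
For concreteness, here is a minimal sketch of that "buy lots of tickets" procedure, i.e. iterative magnitude pruning with rewinding along the lines of the paper: train, prune the smallest surviving weights, rewind the survivors to their initial values, and repeat. `train_fn`, the pruning fraction, and the round count are placeholders, and a real implementation would usually skip biases and keep pruned weights at zero during training.

```python
import copy
import torch

def find_winning_ticket(model, train_fn, prune_frac=0.2, rounds=5):
    """Iterative magnitude pruning with rewinding to the original initialization.

    train_fn(model, masks) is assumed to run a full training pass, re-applying
    the masks after each optimizer step so pruned weights stay at zero.
    """
    init_state = copy.deepcopy(model.state_dict())  # theta_0, kept for rewinding
    masks = {n: torch.ones_like(p) for n, p in model.named_parameters()}

    for _ in range(rounds):
        train_fn(model, masks)  # the expensive part: a full training run
        with torch.no_grad():
            for name, param in model.named_parameters():
                alive = param[masks[name].bool()].abs()
                if alive.numel() == 0:
                    continue
                # drop the smallest prune_frac of the still-alive weights
                threshold = torch.quantile(alive, prune_frac)
                masks[name] = masks[name] * (param.abs() > threshold).float()
            # rewind the surviving weights to their original initial values
            model.load_state_dict(init_state)
            for name, param in model.named_parameters():
                param.mul_(masks[name])
    return masks
```

The returned masks plus the rewound weights are the "winning ticket"; the cost in question is that you only get them after several full training runs of the dense network.
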
kingstnap•1d ago
The observation itself is also partially incorrect. Here's a video I watched a few months ago that goes further into the whole question of how you deal with subnetworks.

https://youtu.be/WW1ksk-O5c0?list=PLCq6a7gpFdPgldPSBWqd2THZh... (timestamped)

At the timestamp they discuss how the original ICLR results actually only worked on these extremely tiny models and larger ones didn't work. The adaptation you need to sort of fix it is to train densely first for a few epochs; only then can you start increasing sparsity.
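
A rough sketch of that adaptation as I understood it from the talk: stay dense for a few warm-up epochs, then ramp a global magnitude-pruning mask towards the target sparsity. The schedule, function names, and hyperparameters below are illustrative, not taken from the paper or the talk.

```python
import torch

def target_sparsity(epoch, warmup_epochs=3, total_epochs=30, final_sparsity=0.9):
    """Stay fully dense during warm-up, then ramp sparsity linearly to final_sparsity."""
    if epoch < warmup_epochs:
        return 0.0
    progress = (epoch - warmup_epochs + 1) / (total_epochs - warmup_epochs)
    return final_sparsity * min(1.0, progress)

def apply_global_magnitude_mask(model, sparsity):
    """Zero out the smallest-magnitude `sparsity` fraction of all parameters."""
    if sparsity <= 0.0:
        return
    scores = torch.cat([p.detach().abs().flatten() for p in model.parameters()])
    k = int(sparsity * scores.numel())
    if k == 0:
        return
    threshold = scores.kthvalue(k).values
    with torch.no_grad():
        for p in model.parameters():
            p.mul_((p.abs() > threshold).float())

# Inside a training loop (train_one_epoch is a placeholder):
#
# for epoch in range(30):
#     train_one_epoch(model)
#     apply_global_magnitude_mask(model, target_sparsity(epoch))
```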

paulsutter•23h ago
Watched the video - thanks

Ioannu is saying the paper's idea for training a dense network doesn't work in non-toy networks (the paper's method for selecting promising weights early doesn't improve the network)

BUT the term "lottery ticket" refers to the true observation that a small subset of weights drive functionality (see all pruning papers). It's great terminology because they truly are coincidences based on random numbers.

All that's been disproven is that paper's specific method to create a dense network based on this observation

swyx•1d ago
i interviewed Jon (lead author on this paper) and yeah he pretty much disowns it now https://www.latent.space/p/mosaic-mpt-7b
gwern•1d ago
Could you explain why you think that? I'm looking at the lottery ticket section and it seems like he doesn't disown it; the reason he gives, via Abhinav, for not pursuing it at his commercial job is just that that kind of sparsity is not hardware friendly (except with Cerebras). "It doesn't provide a speedup for normal commercial workloads on normal commercial GPUs and that's why I'm not following it up at my commercial job and don't want to talk about it" seems pretty far from "disowning the lottery ticket hypothesis [as wrong or false]".
oofbey•1d ago
I think that was pretty clear even when this paper came out - even if you could find these subnetworks they wouldn't be faster on real hardware. Never thought much of this paper, but it sure did get a lot of people excited.
gwern•1d ago
(Cerebras is real hardware.)
oofbey•1d ago
It is real in that it exists. It is not real in the sense that almost nobody has access to them. Unless you work at one of the handful of organizations with their hardware, it’s not a practical reality.
aaronblohowiak•1d ago
how long will that be the case?
IshKebab•1d ago
At least for the foreseeable future (next 50 years say).
oofbey•1d ago
They have a strange business model. Their chips are massive. So they necessarily only sell them to large customers. Also, because of the way they're built (the entire wafer is a single chip), no two chips will be the same. Normally imperfections in the manufacturing result in some parts of the wafer being rejected and others binned as fast or slow chips. If you use the whole wafer you get what you get. So it's necessarily a strange platform to work with - every device is slightly different.
sailingparrot•1d ago
It was exciting because of what it means regarding how a model learns, regardless of whether or not it's commercially applicable.
laughingcurve•1d ago
i saw how it nerdsniped an extremely capable faculty member
swyx•20h ago
he pretty much always says it offline haha but i may have mixed it up with the subsequent convo we had at neurips https://www.latent.space/p/neurips-2023-startups
laughingcurve•1d ago
cool beans, thanks for this -- I think it's easier to hear it directly from the authors. I was hesitant to start researchposting and come off like a dick.

also; note to self: If I publish and disown my papers, shawn will interview me :)

observationist•1d ago
Neural networks are effectively gauge invariant, and you have a huge space of valid isomorphisms as far as possible "valid" layer orderings go, and if your network is overparameterized, the space of "good enough" approximations gets correspondingly larger. The good enough sets are a sort of fuzzy gauge quotient approximating some "ideal" function per layer or cluster or block (depending on your optimizer and architecture).

https://arxiv.org/html/2506.13018v2 - Here's an interesting paper that can help inform how you might look at networks, especially in the context of lottery tickets, gauge quotients, permutations, and what gradient descent looks like in practice.

Kolmogorov Arnold Networks are better about exposing gauge symmetry and operating in that space, but aren't optimized for the hardware we have - mechinterp and other reasons might inspire new hardware, though. If you know what your layer function should look like, if it were ordered such that it resembled a smooth spline, you could initialize and freeze the weights of that layer, and force the rest of the network to learn within the context of your chosen ordering.

The number of "valid" configurations for a layer is large, especially if you have more neurons in the layer than you need, and the number of subsequent layer configurations is much larger than you'd think. The lottery ticket hypothesis is just circling that phenomenon without formalizing it - some surprisingly large percentage of possible configurations will approximate the function you want a network to learn. It doesn't necessarily gain you advantages in achieving the last 10%, and there could be counterproductive configurations that collapse before reaching an optimal configuration.

There are probably optimizer strategies that can exploit initializations of certain types, for different classes of activation functions, and achieve better performance for architectures - and all of those things are probably open to formalized methods based on existing number theory around gauge invariant systems and gauge quotients, with different layer configurations existing as points in gauge orbits in hyperdimensional spaces.

It'd be really cool if you could throw twice as many neurons as you need into a model, randomly initialize a bunch of times until you get a winning ticket, then distill the remainder down to your intended parameter count, and train from there as normal.

It's more complex with architectures like transformers, but you're not dealing with a combinatorial explosion with the LTH - more like a little combinatorial flash flood, and if you engineer around it, it can actually be exploited.
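
As a toy illustration of the permutation part of that symmetry (a sketch, not taken from either paper): permuting the hidden units of an MLP layer, and permuting the next layer's input weights to match, leaves the network's function unchanged, which is one reason the space of "valid" weight configurations is so large.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))
x = torch.randn(32, 8)

with torch.no_grad():
    y_before = net(x)
    # Permute the 16 hidden units: rows of the first layer, columns of the second.
    perm = torch.randperm(16)
    net[0].weight.copy_(net[0].weight[perm])
    net[0].bias.copy_(net[0].bias[perm])
    net[2].weight.copy_(net[2].weight[:, perm])
    y_after = net(x)

print(torch.allclose(y_before, y_after, atol=1e-6))  # True: same function, different weights
```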

pizza•1d ago
Yes to this. Furthermore:

- you can solve neural networks in analytic form with a hodge star approach* [0]

- if you use a picture to set your initial weights for your nn, you can see visually how close or far your choice of optimizer is actually moving the weights - eg non-dualized optimizers look like they barely change things whereas dualized Muon changes the weights much more to the point you cannot recognize the originals [1]

*unfortunately, this is exponential in memory

[0] M. Pilanci — From Complexity to Clarity: Analytical Expressions of Deep Neural Network Weights via Clifford's Geometric Algebra and Convexity https://arxiv.org/abs/2309.16512

[1] https://docs.modula.systems/examples/weight-erasure/
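
A rough sketch of the weight-erasure idea in [1], using a plain optimizer rather than dualized Muon: initialize a layer's weight matrix from a picture, train it on a throwaway task, and then check how much of the picture survives. The file name, task, and hyperparameters are made up for illustration.

```python
import numpy as np
import torch
import torch.nn as nn
from PIL import Image

# "init.png" is a placeholder; any image resized to the weight shape works.
img = Image.open("init.png").convert("L").resize((64, 64))
w0 = torch.tensor(np.asarray(img, dtype=np.float32) / 255.0 - 0.5)

layer = nn.Linear(64, 64, bias=False)
with torch.no_grad():
    layer.weight.copy_(w0)

opt = torch.optim.AdamW(layer.parameters(), lr=1e-3)
for _ in range(500):  # throwaway regression task, just to move the weights
    x = torch.randn(128, 64)
    loss = ((layer(x) - x.roll(1, dims=1)) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

w1 = layer.weight.detach()
print("mean |change|:", (w1 - w0).abs().mean().item())
print("correlation with init:", np.corrcoef(w1.flatten().numpy(), w0.flatten().numpy())[0, 1])
# Save the trained weights as an image to see how recognizable the original still is.
Image.fromarray(((w1 + 0.5).clamp(0, 1).numpy() * 255).astype(np.uint8)).save("after.png")
```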

eru•1d ago
Thanks for the explanations and the great links!
srean•1d ago
Wouldn't such local invariance tie in with flatness or shallowness of the minima?

This would tie in with the observation that flat/shallow minima are easier to find with stochastic gradient descent and such weights generalise better.

choult•1d ago
_Fewer_
eru•1d ago
https://en.wikipedia.org/wiki/Fewer_versus_less

Compare also http://fine.me.uk/Emonds/wholetext.xml

tomhow•1d ago
Indeed, the original title didn't make that mistake, so we've restored the original title as per the guidelines.
rob_c•1d ago
This is basically just a rehash of the fact that a "trained" DNN is a function which is strongly dependent on its initialization parameters (easily provable).

It would be awesome to have a way of finding them in advance, but this is also just a case of avoiding pure DNNs due to their strong reliance on initialization parameters.

Looking at transformers by comparison, you see a much, much weaker dependence of the model on the initial parameters. Does this mean the model is better or worse at learning, or just more stable?
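
A quick way to see that dependence (a sketch with an arbitrary toy task and seeds): train the same small MLP from two different initializations on identical data and compare the resulting functions; they fit the training data equally well but disagree elsewhere.

```python
import torch
import torch.nn as nn

def train_mlp(seed, x, y, steps=2000):
    torch.manual_seed(seed)  # the only thing that differs between the two runs
    net = nn.Sequential(nn.Linear(1, 64), nn.Tanh(), nn.Linear(64, 1))
    opt = torch.optim.Adam(net.parameters(), lr=1e-2)
    for _ in range(steps):
        loss = ((net(x) - y) ** 2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return net

x = torch.linspace(-3, 3, 200).unsqueeze(1)
y = torch.sin(x)
net_a, net_b = train_mlp(0, x, y), train_mlp(1, x, y)

with torch.no_grad():
    # Both nets fit the training data about equally well...
    print(((net_a(x) - y) ** 2).mean().item(), ((net_b(x) - y) ** 2).mean().item())
    # ...but away from the training range they behave quite differently.
    x_far = torch.linspace(5, 8, 50).unsqueeze(1)
    print((net_a(x_far) - net_b(x_far)).abs().mean().item())
```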

snaking0776•1d ago
This is an interesting insight I hadn't thought much about before. Reminds me a bit of some of the mechanistic interpretability work that looked at branch specialization in CNNs and found that architectures which had built-in branches tended to have those branches specialize in a way that was consistent across multiple training runs [1]. Maybe the multi-headed and branching nature of transformers adds an inductive bias that is useful for stable training over larger scales.

[1] https://distill.pub/2020/circuits/branch-specialization/

mceachen•1d ago
@dang please retitle with (2018)
sbinnee•1d ago
I referred to this paper a lot when it was hyped, back when people cared about architectural decisions of neural networks. It was also the year I started studying neural networks.

I think the idea still holds. Although interest has shifted towards test-time scaling and thinking, researchers still care about architectures, like the recently published Nemotron 3.

Can anyone give more updates on this direction of research, or point to more recent papers?
Can anyone give more updates on this direction of research, more recent papers?