
Transformer neural net learns to run Conway's Game of Life just from examples

https://sidsite.com/posts/life-transformer/
69•montebicyclelo•10mo ago

Comments

bonzini•10mo ago
Do I understand correctly that it's brute forcing a small grid rather than learning the algorithm?
montebicyclelo•10mo ago
> it's brute forcing a small grid

If by small grid you are referring to the attention matrix plot shown, then that is not a correct interpretation. The diagonal-like pattern it learns is a 3x3 convolution, which lets it compare the neighbours of a given cell.

Edit: and note that every grid it is trained on / runs inference on is randomly generated and completely unique, so it cannot just memorise examples
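To make the claim concrete: counting each cell's live neighbours is exactly a 3x3 convolution, and one Life step is that count plus a threshold. A minimal numpy sketch (my own, not the post's code), using np.roll shifts on a toroidal grid in place of an explicit convolution:

```python
import numpy as np

def life_step(grid):
    """One Game of Life step on a toroidal grid.

    The neighbour count is a 3x3 convolution with a kernel of ones
    and a zero centre, computed here as eight np.roll shifts.
    """
    neighbours = sum(
        np.roll(np.roll(grid, dy, axis=0), dx, axis=1)
        for dy in (-1, 0, 1) for dx in (-1, 0, 1)
        if (dy, dx) != (0, 0)
    )
    # Birth: dead cell with exactly 3 neighbours.
    # Survival: live cell with 2 or 3 neighbours.
    return ((neighbours == 3) | ((grid == 1) & (neighbours == 2))).astype(int)

# A glider on an 8x8 torus: after 4 steps it is the same shape,
# shifted one cell down and one cell right.
glider = np.zeros((8, 8), dtype=int)
glider[1, 2] = glider[2, 3] = glider[3, 1] = glider[3, 2] = glider[3, 3] = 1
```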

bonzini•10mo ago
My interpretation is that while it did learn the exact computation and not just a statistical approximation, it's still limited to a grid of a given size. In that sense the attention matrix is brute forced and the network did not learn a generalization. The article itself says "The largest grid size we successfully trained was 16x16".
yorwba•10mo ago
They're using learned positional embeddings for each grid cell, so there's no straightforward way to extend a model trained on a small grid to a larger grid. If you go from large to small, I think it would do better than chance, but get the periodic boundary condition wrong, because the period changes with the grid size.

Using 2D RoPE instead would in principle allow scaling up as well, and maybe even period detection if you train it across a range of grids, but would eventually hit the same issues that plague long-context scaling in LLMs.

bernb•10mo ago
Yes, in that case it has not understood (or learned) the rules of the game. If it had, it should be able to apply the rules correctly in a slightly different context.

Would it be possible to train an LLM on the rules the way we would teach them to a human?

constantcrying•10mo ago
To be honest, an unsurprising result.

But I think the paper fails to answer the most important question. It alleges that this isn't a statistical model: "it is not a statistical model that predicts the most likely next state based on all the examples it has been trained on.

We observe that it learns to use its attention mechanism to compute 3x3 convolutions — 3x3 convolutions are a common way to implement the Game of Life, since it can be used to count the neighbours of a cell, which is used to decide whether the cell lives or dies."

But it is never actually shown that this is the case. Later on it isn't even alleged; instead, the metric used is that the model gives the correct answers often enough, as a test for convergence, not that the net has converged to values which compute the correct algorithm.

So there is no guarantee that it has actually learned the game. There are still learned parameters, and the paper doesn't investigate whether these parameters have converged to something where the net is just a computation of the algorithm. The most interesting question is left unanswered.

Y_Y•10mo ago
Reminds me of this great story about a programmer-turned-businessman who tried to learn a game from examples and ended up with an almost-correct brute force solution:

https://www.borrett.id.au/computing/petals-bg.htm

montebicyclelo•10mo ago
The diagonal-looking attention matrix shown in the post is mathematically equivalent to a 3 by 3 convolution. The model learns to do that via its attention mechanism; it was not obvious beforehand that attention could express this.

(This can be shown by comparing that attention matrix to a "manually computed Neighbour Attention matrix", which is known to be equivalent to 3 by 3 conv.)
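The equivalence is easy to sketch: a hand-built neighbour-attention matrix over the flattened grid tokens (uniform weights; a real attention layer's softmax normalisation is ignored here) reproduces the 3x3 neighbour count exactly. A small illustrative example, not taken from the post:

```python
import numpy as np

def neighbour_attention_matrix(n):
    """(n*n, n*n) matrix whose row i has a 1 at each of the 8 toroidal
    neighbours of flattened cell i -- a hand-computed analogue of the
    attention pattern discussed in the post (up to normalisation)."""
    A = np.zeros((n * n, n * n))
    for r in range(n):
        for c in range(n):
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if (dr, dc) != (0, 0):
                        A[r * n + c, ((r + dr) % n) * n + (c + dc) % n] = 1.0
    return A

rng = np.random.default_rng(0)
grid = rng.integers(0, 2, size=(8, 8))

# Neighbour counts via the "attention" matmul on flattened tokens...
attn_counts = (neighbour_attention_matrix(8) @ grid.reshape(-1)).reshape(8, 8)

# ...match the counts from an explicit 3x3 convolution (roll-based).
conv_counts = sum(
    np.roll(np.roll(grid, dr, axis=0), dc, axis=1)
    for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)
)
```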

constantcrying•10mo ago
Yes, I also quoted that part from the article. It does not address the fact that the attention matrix does not represent all of the learned parameters. And even supposing that the form of the attention matrix guarantees the correct functioning of the algorithm, why was that not used as the metric to decide convergence?

"We detected that the model had converged by looking for 1024 training batches with perfect predictions, and that it could perfectly run 100 Life games for 100 steps." This would be superfluous (and even a pretty bizarre methodology) if the shape of the attention matrix were proof that the network performed the actual Game of Life algorithm.

Just to be clear, I am not saying that the NN isn't converging to performing some computation that would also be seen in other algorithms. I am saying that the paper does not investigate whether the resulting NN actually performs the Game of Life algorithm. The convolution part is certainly evidence, but I think it would have been worthwhile to look at the actual resulting net and figure out whether the trained weights together form the algorithm. This is also the only way to determine the truth of the initial claim: that this isn't just a statistical model, but an actual algorithm.

montebicyclelo•10mo ago
> the paper does not investigate whether the resulting NN actually performs the game of life algorithm

How could it not be computing the Game of Life algorithm, given that it gets 100% accuracy over multiple steps on a bunch of random game boards it's never seen before?

And then, based on the structure of the net, and by examining the attention layers and finding that they do 3 by 3 average pooling, we can see that the attention layer produces a set of tokens where each token contains the number of neighbours a cell had and its previous state. This then goes through a classifier layer, which decides its next state given that information.

Further evidence for that: it was possible to use linear probes to confirm that the tokens that had been through the attention layer contained the information about the number of neighbours and the previous state.

From all of this, it's clear that the model is running the Game of Life properly.

constantcrying•10mo ago
Do you not understand the difference between empirical evidence and mathematical proof? Surely every person talking about NN research should be aware of that distinction.

> How could it not be computing the game of life algorithm? Given that it gets 100% accuracy over multiple steps on a bunch random game boards it's never seen before.

This is such an insane statement.

montebicyclelo•10mo ago
> This is such an insane statement.

In what way? Maybe you mean something different by "computing the Game of Life algorithm".

gwern•10mo ago
It would be more convincing if they did an exhaustive enumeration and verified that the learned NN is correct for every possible 3x3 Life neighbourhood. How do I know, looking at a speckled screenshot, that it is exactly correct and there's not a little floating-point error somewhere that leaves one edge case slightly off? If the only testing is '100 Life games for 100 steps', that isn't water-tight. (Whereas if you do exhaustive enumeration, it has to be correct, because the NN is deterministic and fixed, and there's no way for it to go wrong then.)
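For a reference implementation of the rule (as opposed to the trained net), the exhaustive check is cheap: 2^9 = 512 possible 3x3 neighbourhoods. A sketch, where life_rule is a hypothetical reference function; checking the network itself would mean feeding each patch through the model instead:

```python
import itertools

def life_rule(center, neighbours):
    """Next state of a cell given its state and live-neighbour count."""
    return 1 if neighbours == 3 or (center == 1 and neighbours == 2) else 0

# Enumerate every possible 3x3 neighbourhood: 2^9 = 512 cases.
alive_next = 0
for bits in itertools.product((0, 1), repeat=9):
    center = bits[4]                 # middle of the 3x3 patch
    neighbours = sum(bits) - center  # the other 8 cells
    alive_next += life_rule(center, neighbours)

# C(8,2) + C(8,3) = 84 survivals plus C(8,3) = 56 births: 140 live outcomes.
print(alive_next)
```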
constantcrying•10mo ago
I think it would have also been very interesting to manually construct a NN, which represented the rules exactly. Maybe there is some nice mathematical way to describe them or some constraints need to be fulfilled.

Then afterwards you can check the neural network against the exact algorithm.

montebicyclelo•10mo ago
Edit: increased the validation to 10,000 Life grids for 100 steps (taking 16 minutes to check), which is hopefully somewhat more convincing. That's 1,000,000 Life steps computed without error in total, plus 32,000 steps computed without error during training.

When the attention grid is manually computed (to be equivalent to 3 by 3 conv), the model can be trained to be 100% perfect, verified by checking all 3 by 3 grid states. (And this manually computed attention matrix means that once the tokens reach the classifier layer, each token contains only the information of the relevant 3 by 3 grid, and the whole thing is deterministic as you say.)

However, when the model is computing the attention grid itself, just checking that all 3 by 3 sub-grid states crop up is not enough, because the position of the sub-grids can impact the attention matrix, and so can the state of other cells. So, as shown in the post, it does approximate 3 by 3 conv, but if it doesn't get the approximation quite right, there could be errors. I would still say that it's computing the Game of Life algorithm in an interpretable way; it's just that maybe it has struggled to create a perfect 3 by 3 convolution via attention in that particular case. (To exhaustively check this would require checking all 2^(16×16) grids.)

eapriv•10mo ago
Great, we can spend a crazy amount of computational resources and hand-holding in order to (maybe) reproduce three lines of code.
zelphirkalt•10mo ago
Exactly my thoughts. This is not useful at all. We already know how to write exact and correct code to implement that; it's not a task we should throw ANNs at.
gessha•10mo ago
Basic research has non-obvious utility and it deserves its own spotlight.

It’s similar to comparing hardware radio and software-defined radio: Yes, we already know how to build a radio with hardware but a software-defined one offers greater flexibility.

ninetyninenine•10mo ago
The significance of this is that we can fully understand this problem because it’s only 3 lines of code.

Like for learning the English language we don’t fully understand the way LLMs work. We can’t fully characterize it. So we have debates on whether the LLM actually understands English or understands what it’s talking about. We simply don’t know.

The results of this show that the transformer understands the game of life. Or whatever the transformer does with the rules of the game of life it’s safe to say that it fits a definition of understanding as mankind knows it.

Like much of machine learning where we use the abstraction of curve fitting to understand higher dimensional learning we can do the same extrapolation here.

If the transformer understands the game of life then that understanding must translate over to the LLM. The LLM understands English and understands the contents of what it is talking about.

There was a clear gradient of understanding before understanding the game of life hit saturation. The transformer lived in a state where it didn’t get everything right but it understood the game of life to a degree.

We can extrapolate that gradient to LLMs as well. LLMs are likely on that gradient, not yet at saturation. Either way, I think it’s safe to say that LLMs understand what they are talking about. It’s just that they haven’t hit saturation yet. There’s clearly things that we as humans understand better than the LLM.

But let’s extrapolate this concept to an even higher level:

Have we as humans hit saturation yet?

Philpax•10mo ago
It's a theoretical result to help determine what they're capable of, not a practical solution. Of course you can write the code yourself - but that's not the point!
lynndotpy•10mo ago
Well, you could also implement this by hand-writing weights for one convolution layer.

There are only 512 training examples needed for that, and it would be a lot more interesting if a learning algorithm were able to fit that 3x3 convolution layer from those 512 examples. IIRC (don't quote me on this), that hasn't been done.
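One classic hand-written construction (my sketch; neither the post nor this comment spells it out): a single 3x3 kernel with neighbour weight 1 and centre weight 0.5, followed by a band threshold. Live cells that should survive respond 2.5 or 3.5, and dead cells that should be born respond 3.0, so thresholding the response to [2.5, 3.5] implements the rule exactly:

```python
import numpy as np

# Hand-written weights: neighbours count with weight 1, the centre with 0.5,
# so response = (live neighbours) + 0.5 * (centre state).
KERNEL = np.array([[1.0, 1.0, 1.0],
                   [1.0, 0.5, 1.0],
                   [1.0, 1.0, 1.0]])

def conv_life_step(grid):
    """One Life step as a single hand-weighted 3x3 convolution + threshold."""
    resp = sum(
        KERNEL[dr + 1, dc + 1] * np.roll(np.roll(grid, -dr, axis=0), -dc, axis=1)
        for dr in (-1, 0, 1) for dc in (-1, 0, 1)
    )
    # Alive next step iff the response is 2.5, 3.0, or 3.5.
    return ((resp >= 2.5) & (resp <= 3.5)).astype(int)

rng = np.random.default_rng(1)
g = rng.integers(0, 2, size=(16, 16))
```

Since every possible response is a multiple of 0.5 (exactly representable in floating point), the band threshold has no numerical edge cases.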

evrimoztamur•10mo ago
I would like to point out a much more exciting modelling process, whereby neural networks extract the underlying boolean logic from simulation outputs: https://google-research.github.io/self-organising-systems/di...

I firmly believe that differentiable logic CA is the winner, in particular because it extracts the logic directly and thus leads to generalizable programs, as opposed to staying stuck in matrix-multiplication land.
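As a toy illustration of the differentiable-logic idea (heavily simplified from the linked DiffLogic CA work; all names and hyperparameters here are mine): a single soft gate is a softmax mixture over the 16 binary boolean ops, and gradient descent can sharpen the mixture onto one gate, here XOR:

```python
import numpy as np

# Truth tables of all 16 binary boolean ops: row i, column index 2*a + b.
OPS = np.array([[(i >> (2 * a + b)) & 1 for a in (0, 1) for b in (0, 1)]
                for i in range(16)], dtype=float)

X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = np.array([0.0, 1.0, 1.0, 0.0])        # target gate: XOR (table 0110 -> index 6)
outs = OPS[:, [2 * a + b for a, b in X]]  # (16, 4): each op's output per sample

w = np.zeros(16)                          # logits over the 16 candidate gates
for _ in range(20000):
    p = np.exp(w - w.max())
    p /= p.sum()                          # softmax: soft gate selection
    pred = p @ outs                       # soft output for each input pair
    g = 2 * (pred - y) @ outs.T           # dL/dp for squared-error loss
    w -= 1.0 * p * (g - p @ g)            # chain rule through the softmax
```

After training, the logits should concentrate on index 6 (the XOR truth table 0110), so rounding the soft outputs recovers XOR on all four inputs; this is the sense in which the gate choice itself is learned by gradient descent.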

max_•10mo ago
This is one of those papers that are so good I would like to keep it secret.

I shared it with a friend and he thought it wasn't that useful.

That made me happy, since I knew my secret may be safe.

_dark_matter_•10mo ago
I think it's interesting, but I don't see how it's useful. Can you describe what you think is useful about it?
Legend2440•10mo ago
If you look at the learned gates, it does not directly extract the underlying rules of Conway's game of life. It has many more gates than are necessary and they have the same complex, uninterpretable structure you see in a neural network.

The training method they're using is the same as used for quantized neural networks. Your 'neurons' being logic gates doesn't mean you're doing logic, it's still statistics.

awesomeMilou•10mo ago
To ruin this for everyone: the underlying optimization that enables these to run as computationally efficiently as they do is patented:

https://patents.google.com/patent/WO2023143707A1/en?inventor...

Nopoint2•10mo ago
I don't get the point. A simple CNN with stride=1 should be able to solve it perfectly and generalize to any size.
montebicyclelo•10mo ago
It wasn't obvious that a transformer could do this, and learn to produce conv via attention
artemisart•10mo ago
But it is, as long as the positional embeddings are sufficient, i.e. using relative positional embeddings here.
amelius•10mo ago
But can it condense it into a small program?
xchip•10mo ago
Even a simple regression will do that
montebicyclelo•10mo ago
it won't, AFAIK, without some extra hand-coded logic
Dwedit•10mo ago
RIP John Conway, who died of Covid.
wrs•10mo ago
I was hoping for an explanation of, or some insight from, the loss curve. Training makes very little progress for a long time, then suddenly converges. In my (brief) experience with NN training, I typically see more rapid progress at the beginning, then a plateau of diminishing returns, not an S-curve like this.
montebicyclelo•10mo ago
Hmm, just my intuition: training this model was very sensitive to the initial seed and training hyperparameters. It struggles to actually get to the 3x3 conv solution; but once it gets close to that things move much more quickly. This can kind of be seen in the animation of the attention matrix over time, which starts off random / spread out, but then once it starts to get more parts of the attention matrix in place it moves quicker. (Assuming all the experimentation wasn't in some bad part of the hyperparameter space.)

Also, it may just be the nature of the task. Some tasks you might have more to learn, all the time, with each training sample potentially giving information that's different from all the others. But with this, once it gets close to the solution of Life, it's quick.