Transformer neural net learns to run Conway's Game of Life just from examples

https://sidsite.com/posts/life-transformer/
37•montebicyclelo•4h ago

Comments

bonzini•2h ago
Do I understand correctly that it's brute forcing a small grid rather than learning the algorithm?
montebicyclelo•1h ago
> it's brute forcing a small grid

If by small grid you are referring to the attention matrix plot shown, then that is not a correct interpretation. The diagonal-like pattern it learns is a 3x3 convolution, which lets it compare the neighbours of a given cell.

Edit: also note that every grid it is trained on / runs inference on is randomly generated and completely unique, so it cannot just memorise examples.
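To put a number on "completely unique": a 16x16 binary grid has 2^256 possible states, so independently sampled boards essentially never repeat. A minimal sketch of that claim (NumPy; the sample size here is an arbitrary assumption):

    import numpy as np

    rng = np.random.default_rng(0)
    n_samples = 100_000  # arbitrary; any realistic training set is tiny next to 2**256
    grids = rng.integers(0, 2, size=(n_samples, 16, 16), dtype=np.uint8)

    # Key each board by its raw bytes and count distinct ones.
    unique = {g.tobytes() for g in grids}
    print(f"{len(unique)} unique grids out of {n_samples}")  # expect 100000 of 100000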

bonzini•1h ago
My interpretation is that while it did learn the exact computation and not just a statistical approximation, it's still limited to a grid of a given size. In that sense the attention matrix is brute forced and the network did not learn a generalization. The article itself says "The largest grid size we successfully trained was 16x16".
yorwba•1h ago
They're using learned positional embeddings for each grid cell, so there's no straightforward way to extend a model trained on a small grid to a larger grid. If you go from large to small, I think it would do better than chance, but get the periodic boundary condition wrong, because the period changes with the grid size.

Using 2D RoPE instead would in principle allow scaling up as well, and maybe even period detection if you train it across a range of grids, but would eventually hit the same issues that plague long-context scaling in LLMs.
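A minimal sketch of the obstacle (PyTorch; the variable names are illustrative, not from the post): a learned embedding table has exactly one row per flattened grid cell, so a model trained on 16x16 has no embedding at all for the extra positions of a 32x32 grid, whereas RoPE is computed from the position index itself.

    import torch
    import torch.nn as nn

    d_model = 64
    H = W = 16  # grid size the model was trained on

    # One learned vector per flattened cell: the table is sized to H * W = 256.
    pos_emb = nn.Embedding(H * W, d_model)

    x16 = pos_emb(torch.arange(16 * 16))  # fine: indices 0..255 all exist
    # pos_emb(torch.arange(32 * 32)) would raise an error: indices 256..1023
    # have no learned rows, and nothing principled exists to extrapolate them.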

bernb•46m ago
Yes, it does not understand (or has learned) the rules of the game then. For that being the case, it should be able to apply the rules correctly in a slightly different context.

Would it be possible to train an LLM on the rules how we would teach them to a human?

constantcrying•2h ago
To be honest, an unsurprising result.

But I think the paper fails to answer the most important question. It alleges that this isn't a statistical model: "it is not a statistical model that predicts the most likely next state based on all the examples it has been trained on.

We observe that it learns to use its attention mechanism to compute 3x3 convolutions — 3x3 convolutions are a common way to implement the Game of Life, since it can be used to count the neighbours of a cell, which is used to decide whether the cell lives or dies."

But it is never actually shown that this is the case. Later on it isn't even alleged to be true; rather, the metric they use is that the net gives the correct answer often enough, which is a test of convergence, not a demonstration that the net has converged to values which implement the correct algorithm.

So there is no guarantee that it has actually learned the game. There are still learned parameters, and the paper doesn't investigate whether these parameters have converged to something where the net is simply a computation of the algorithm. The most interesting question is left unanswered.

Y_Y•2h ago
Reminds me of this great story about a programmer-turned-businessman who tried to learn a game from examples and ended up with an almost-correct brute force solution:

https://www.borrett.id.au/computing/petals-bg.htm

montebicyclelo•1h ago
The diagonal-looking attention matrix shown in the post is mathematically equivalent to a 3x3 convolution. The model learns how to do that via its attention mechanism, and it's not obvious that it would be able to.

(This can be shown by comparing that attention matrix to a "manually computed Neighbour Attention matrix", which is known to be equivalent to a 3x3 conv.)
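A minimal sketch of that comparison (NumPy/SciPy; toroidal wrap-around assumed, matching the periodic boundaries discussed upthread): build the N x N neighbour-attention matrix for a flattened H x W grid and check that applying it to a flattened board gives the same neighbour counts as a 3x3 convolution.

    import numpy as np
    from scipy.signal import convolve2d

    H = W = 8
    N = H * W

    # "Manually computed Neighbour Attention matrix": A[i, j] = 1 iff flattened
    # cell j is one of the 8 toroidal neighbours of flattened cell i.
    A = np.zeros((N, N))
    for r in range(H):
        for c in range(W):
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if dr == dc == 0:
                        continue
                    A[r * W + c, ((r + dr) % H) * W + ((c + dc) % W)] = 1

    board = np.random.default_rng(0).integers(0, 2, size=(H, W))
    kernel = np.array([[1, 1, 1], [1, 0, 1], [1, 1, 1]])

    conv_counts = convolve2d(board, kernel, mode="same", boundary="wrap")
    attn_counts = (A @ board.ravel()).reshape(H, W)
    assert np.array_equal(attn_counts, conv_counts)  # same neighbour counts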

eapriv•2h ago
Great, we can spend a crazy amount of computational resources and hand-holding in order to (maybe) reproduce three lines of code.
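For reference, the kind of three-line implementation being alluded to (a sketch using SciPy, with periodic boundaries as discussed upthread):

    import numpy as np
    from scipy.signal import convolve2d

    def life_step(board):
        # Count the 8 neighbours of every cell, wrapping at the edges.
        n = convolve2d(board, np.array([[1, 1, 1], [1, 0, 1], [1, 1, 1]]),
                       mode="same", boundary="wrap")
        # Alive next step iff 3 neighbours, or alive now with 2 neighbours.
        return ((n == 3) | ((board == 1) & (n == 2))).astype(board.dtype)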
zelphirkalt•1h ago
Exactly my thoughts. This is not useful at all. We already know how to write exact and correct code for that. This is not a task we should be throwing ANNs at.
gessha•37m ago
Basic research has non-obvious utility and it deserves its own spotlight.

It’s similar to comparing hardware radio and software-defined radio: yes, we already know how to build a radio in hardware, but a software-defined one offers greater flexibility.

ninetyninenine•1h ago
The significance of this is that we can fully understand this problem because it’s only 3 lines of code.

For the English language, by contrast, we don’t fully understand the way LLMs work. We can’t fully characterize it. So we have debates on whether the LLM actually understands English, or understands what it’s talking about. We simply don’t know.

The results here show that the transformer understands the game of life. Or, whatever the transformer does with the rules of the game of life, it’s safe to say that it fits a definition of understanding as mankind knows it.

As in much of machine learning, where we use the abstraction of curve fitting to understand higher-dimensional learning, we can do the same extrapolation here.

If the transformer understands the game of life then that understanding must translate over to the LLM. The LLM understands English and understands the contents of what it is talking about.

There was a clear gradient of understanding before its grasp of the game of life hit saturation: the transformer lived in a state where it didn’t get everything right, but it understood the game to a degree.

We can extrapolate that gradient to LLMs as well. LLMs are likely on that gradient, not yet at saturation. Either way, I think it’s safe to say that LLMs understand what they are talking about. It’s just that they haven’t hit saturation yet. There are clearly things that we as humans understand better than the LLM.

But let’s extrapolate this concept to an even higher level:

Have we as humans hit saturation yet?

Philpax•1h ago
It's a theoretical result to help determine what they're capable of, not a practical solution. Of course you can write the code yourself - but that's not the point!
lynndotpy•1h ago
Well, you could also implement this by hand-writing the weights for a single convolution layer.

There are only 512 possible 3x3 neighbourhoods (2^9), so only 512 training examples are needed for that, and it would be a lot more interesting if a learning algorithm were able to fit that 3x3 convolution layer from those 512 examples. IIRC (and don't quote me on this), that hasn't been done.
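For what it's worth, the hand-written single-layer version is a standard construction (not taken from the post): weight each neighbour by 1 and the centre by 0.5, so the conv output s = neighbours + centre/2 determines the rule, and the next state is alive exactly when 2.5 <= s <= 3.5. A sketch, verified against all 512 neighbourhoods:

    import numpy as np
    from scipy.signal import convolve2d

    # Hand-written 3x3 kernel: s = neighbour_count + 0.5 * centre packs both
    # inputs of the Life rule into a single number.
    kernel = np.array([[1.0, 1.0, 1.0],
                       [1.0, 0.5, 1.0],
                       [1.0, 1.0, 1.0]])

    def life_step_conv(board):
        s = convolve2d(board, kernel, mode="same", boundary="wrap")
        # Alive next iff s is 2.5 (survive, n=2), 3.0 (birth) or 3.5 (survive, n=3).
        return ((s >= 2.5) & (s <= 3.5)).astype(board.dtype)

    # Exhaustively check the layer on all 2**9 = 512 possible 3x3 neighbourhoods.
    for bits in range(512):
        patch = (bits >> np.arange(9)).reshape(3, 3) & 1
        centre, n = patch[1, 1], patch.sum() - patch[1, 1]
        want = n == 3 or (centre == 1 and n == 2)
        assert (2.5 <= (patch * kernel).sum() <= 3.5) == bool(want)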

evrimoztamur•1h ago
I would like to point out a much more exciting modelling process, whereby neural networks extract the underlying boolean logic from simulation outputs: https://google-research.github.io/self-organising-systems/di...

I firmly believe that differentiable logic CA is the winner, in particular because it extracts the logic directly, and thus leads to generalizable programs, as opposed to staying stuck in matrix-multiplication land.

Nopoint2•45m ago
I don't get the point. A simple CNN with stride=1 should be able to solve it perfectly and generalize to any grid size.
montebicyclelo•37m ago
It wasn't obvious that a transformer could do this and learn to produce a convolution via attention.
amelius•43m ago
But can it condense it into a small program?
xchip•38m ago
Even a simple regression will do that.
montebicyclelo•32m ago
It won't, AFAIK, without some extra hand-coded logic.
Dwedit•22m ago
RIP John Conway, who died of Covid.

Oracle VM VirtualBox – VM Escape via VGA Device

https://github.com/google/security-research/security/advisories/GHSA-qx2m-rcpc-v43v
50•serhack_•2d ago•29 comments

JavaScript's New Superpower: Explicit Resource Management

https://v8.dev/features/explicit-resource-management
151•olalonde•8h ago•95 comments

Japan's IC cards are weird and wonderful

https://aruarian.dance/blog/japan-ic-cards/
128•aecsocket•2d ago•88 comments

Popcorn: Run Elixir in WASM

https://popcorn.swmansion.com/
63•clessg•1d ago•3 comments

Wow@Home – Network of Amateur Radio Telescopes

https://phl.upr.edu/wow/outreach
142•visviva•11h ago•14 comments

Getting AI to write good SQL

https://cloud.google.com/blog/products/databases/techniques-for-improving-text-to-sql
397•richards•16h ago•234 comments

Implementing a RISC-V Hypervisor

https://seiya.me/blog/riscv-hypervisor
37•ingve•5h ago•0 comments

XTool – Cross-platform Xcode replacement

https://github.com/xtool-org/xtool
143•TheWiggles•11h ago•36 comments

Catalog of Novel Operating Systems

https://github.com/prathyvsh/os-catalog
58•prathyvsh•6h ago•15 comments

Laser-Induced Graphene from Commercial Inks and Dyes

https://advanced.onlinelibrary.wiley.com/doi/10.1002/advs.202412167
3•PaulHoule•2d ago•0 comments

A kernel developer plays with Home Assistant

https://lwn.net/SubscriberLink/1017720/7155ecb9602e9ef2/
126•pabs3•10h ago•49 comments

Open Problems in Computational Geometry

https://topp.openproblem.net/
20•nill0•3h ago•0 comments

Push Ifs Up and Fors Down

https://matklad.github.io/2023/11/15/push-ifs-up-and-fors-down.html
16•goranmoomin•4h ago•2 comments

Thoughts on thinking

https://dcurt.is/thinking
525•bradgessler•18h ago•337 comments

A Research Preview of Codex

https://openai.com/index/introducing-codex/
448•meetpateltech•22h ago•384 comments

You do not need NixOS on the desktop

https://aruarian.dance/blog/you-do-not-need-nixos/
27•transpute•4h ago•6 comments

New high-quality hash measures 71GB/s on M4

https://github.com/Nicoshev/rapidhash
90•nicoshev11•3d ago•35 comments

MIT asks arXiv to withdraw preprint of paper on AI and scientific discovery

https://economics.mit.edu/news/assuring-accurate-research-record
334•carabiner•22h ago•174 comments

The Japanese method of creating forests comes to Mexico

https://english.elpais.com/climate/2025-05-17/miyawaki-in-nezahualcoyotl-the-japanese-method-of-creating-forests-comes-to-mexico.html
5•geox•41m ago•0 comments

Rustls Server-Side Performance

https://www.memorysafety.org/blog/rustls-server-perf/
129•jaas•4d ago•39 comments

I'm Peter Roberts, immigration attorney, who does work for YC and startups. AMA

227•proberts•22h ago•392 comments

Why Moderna Merged Its Tech and HR Departments

https://www.wsj.com/articles/why-moderna-merged-its-tech-and-hr-departments-95318c2a
20•andy99•3d ago•22 comments

IM-2's Imperfect Landing Due to Altimeter Interference

https://spacepolicyonline.com/news/im-2s-imperfect-landing-due-to-altimeter-interference-south-pole-lighting-conditions/
5•verzali•2d ago•1 comment

MCP: An in-depth introduction

https://www.speakeasy.com/mcp/mcp-tutorial
105•ritzaco•4d ago•42 comments

How can traditional British TV survive the US streaming giants

https://www.bbc.co.uk/news/articles/cx2enydkew3o
40•asplake•3d ago•118 comments

Show HN: Merliot – plugging physical devices into LLMs

https://github.com/merliot/hub
57•sfeldma•12h ago•12 comments

Show HN: Fahmatrix – A Lightweight, Pandas-Like DataFrame Library for Java

https://github.com/moustafa-nasr/fahmatrix
32•mousomashakel•8h ago•4 comments

ClojureScript 1.12.42

https://clojurescript.org/news/2025-05-16-release
167•Borkdude•17h ago•32 comments

Show HN: Visual flow-based programming for Erlang, inspired by Node-RED

https://github.com/gorenje/erlang-red
232•Towaway69•22h ago•95 comments