The M or B game breaks down when you play with someone who knows obscure people you've never heard of. Either you can't recognize their references, or your sense of "semantic distance" differs from theirs. The solution is to match knowledge levels: experts play with experts, generalists with generalists.
The same applies to decoding ancient texts: if ancient civilizations focused on completely different concepts than we do today, our modern semantic models won't help us understand their writing.
It just assumes that your answers are going to be reasonably bread-like or reasonably Mussolini-like, and doesn't think laterally at all.
It just kept asking me about varieties of baked goods.
edit: It did much better after I added some extra explanation -- that it could be anything, that it may be very unlike either choice, and that it shouldn't try to narrow down too quickly.
If you used word2vec directly, it would be exactly the right thing to play this game with. Those embeddings exist inside an LLM, but the LLM is trained to respond like text found online, not to play this game.
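A minimal sketch of what "playing the game with embeddings directly" would look like: compare a guess's vector against each anchor by cosine similarity and report which anchor it's closer to. The 3-d vectors below are hypothetical stand-ins; a real version would look these words up in a trained word2vec model instead.

```python
import math

# Hypothetical toy embeddings standing in for real word2vec vectors.
EMB = {
    "bread":     [0.9, 0.1, 0.0],
    "mussolini": [0.0, 0.2, 0.9],
    "baguette":  [0.8, 0.2, 0.1],
    "stalin":    [0.1, 0.1, 0.8],
}

def cosine(u, v):
    # Standard cosine similarity between two vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(a * a for a in v))
    return dot / (nu * nv)

def closer_anchor(word, a="bread", b="mussolini"):
    # Return whichever anchor the word's embedding is nearer to.
    return a if cosine(EMB[word], EMB[a]) >= cosine(EMB[word], EMB[b]) else b

print(closer_anchor("baguette"))  # with these toy vectors: bread
print(closer_anchor("stalin"))    # with these toy vectors: mussolini
```

With real embeddings you'd also get a graded signal ("warmer/colder") from the similarity values themselves, which is exactly what the game needs.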
1. Not all models are equally efficient. We already have many methods to perform universal search (e.g., Levin's, Hutter's, and Schmidhuber's versions), but they are painfully slow despite being optimal in a narrow sense that doesn't extrapolate well to real-world performance.
2. Solomonoff induction is only optimal for infinite data (i.e., it can be used to create a predictor that asymptotically dominates any other algorithmic predictor). As far as I can tell, the problem remains totally unsolved for finite data, due to the additive constant that results from the question: which universal model of computation should be applied to finite data? You can easily construct a Turing machine that is universal and perfectly reproduces the training data, yet nevertheless dramatically fails to generalize. No one has made a strong case for any specific natural prior over universal Turing machines (and if you try to define some measure to quantify the "size" of a Turing machine you realize this method starts to fail once the number of transition tables becomes large enough to start exhibiting redundancy).
> One explanation for why this game works is that there is only one way in which things are related
There is not; relatedness is a completely non-transitive relationship.
On another point: suppose you keep the same vocabulary but permute the meanings of the words. The neural network will still learn relationships, completely different ones, and its representation may converge toward a better compression for that set of words, but I'm dubious that this new compression scheme will resemble the previous one (?)
I would say that given an optimal encoding of the relationships, we can achieve extreme compression, but not all encodings lead to the same compression in the end.
If I add 'bla' between every word in a text, that is easy to compress. But if I instead add an increasing sequence of words between each word, the meaning is still there, yet the compression will not be the same, as the network will try to generate the words in between.
(thinking out loud)
tyronehed•1h ago
When we arrive at AGI, you can be certain it will not contain a Transformer.
jxmorris12•1h ago
I once saw a LessWrong post claiming that the Platonic Representation Hypothesis doesn't hold when you only embed random noise, as opposed to natural images: http://lesswrong.com/posts/Su2pg7iwBM55yjQdt/exploring-the-p...
blibble•25m ago
of course it matters
if I supply the ants in my garden with instructions on how to build tanks and stealth bombers they're still not going to be able to conquer my front room