Given the same fundamentals, such as transformer architectures, multiple models trained on data about the same world are going to converge on representations as a matter of course. They're going to diverge where the underlying manner in which data gets memorized and encoded differs, as with RNNs like RWKV.
The interesting bit would be convergence of representations between human brains and transformer models, or between brains and RWKV, because the data humans collect is implicitly framed by human cognitive systems and sensors.
The words, qualia, and principles we use to think about things, communicate, and record data are going to anchor all data in a fundamental, inescapable ontological way. That anchoring constrains how higher-order extrapolations and derivations can be structured, and those structures are going to overlap with human constructs.
in-silico•13h ago
> They're going to diverge where the underlying manner in which data gets memorized and encoded differs, as with RNNs like RWKV.
In the original paper (https://arxiv.org/abs/2405.07987) the authors also compared the representations of transformer-based LLMs to convolution-based image models. They found just as much alignment between them as when both models were transformers.
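To make "alignment" concrete: the paper's headline metric is, roughly, mutual nearest-neighbor overlap: embed the same paired inputs with both models and check how often the two models agree about which samples are neighbors. Here's a minimal sketch of that idea (cosine similarity and k=10 are my assumptions, not necessarily the paper's exact settings):

```python
import numpy as np

def knn_indices(feats, k):
    """Row-normalize features, then return each sample's k nearest
    neighbors by cosine similarity (self-matches excluded)."""
    feats = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    sim = feats @ feats.T
    np.fill_diagonal(sim, -np.inf)          # never pick yourself
    return np.argsort(-sim, axis=1)[:, :k]  # top-k neighbor ids per row

def mutual_knn_alignment(feats_a, feats_b, k=10):
    """Mean overlap of the two models' k-NN sets over the same inputs:
    1.0 = identical neighborhoods, 0.0 = completely disjoint."""
    nn_a, nn_b = knn_indices(feats_a, k), knn_indices(feats_b, k)
    return float(np.mean([len(set(a) & set(b)) / k
                          for a, b in zip(nn_a, nn_b)]))

# Toy check: two different random projections of the same latent data
# should score far above chance, whatever produced the features.
rng = np.random.default_rng(0)
shared = rng.normal(size=(200, 16))
print(mutual_knn_alignment(shared @ rng.normal(size=(16, 64)),
                           shared @ rng.normal(size=(16, 32))))
```

The nice property of a metric like this is that it never compares coordinates directly, only neighborhood structure, so a high-dimensional LLM embedding and a lower-dimensional CNN embedding can be compared on equal footing.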
observationist•13h ago
Very interesting - the human bias implicit in the structure of the data we collect might be critical, but I suspect there's a great number theory paper somewhere in there that validates the Platonic Representation idea.
How would you correct for something like "the subset of information humans perceive and find interesting" versus "the set of all information available about a thing that isn't noise" and determine what impact the selection of the subset has on the structure of things learned by AI architectures? You'd need to account for optimizers, architecture, training data, and so on, but the results from those papers are pretty compelling.
cyanydeez•10h ago
There's no way the human mind converges with current tech, because there's a huge gap in wattage. The human brain runs on about 12 watts: https://www.scientificamerican.com/article/thinking-hard-cal...
Obviously you could argue something about breadth of knowledge, but there's no way the current models, as they're set up, are processing things the same way the human brain does.
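For scale, assuming roughly 700 W for a single modern datacenter GPU (an H100's rated TDP): 700 / 12 ≈ 58, so one accelerator alone draws about 60x the brain's power budget, before counting the hundreds or thousands of accelerators in a training cluster.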