A car is not just an engine; it's a drivetrain, a transmission, wheels, steering, all of which affect the end product and its usability. LLMs are no different, and focusing on alignment without even addressing all the scaffolding that intermediates the exchange between the user and the LLM in an assistant use case seems disingenuous.
I think it's the other way round: humans have effectively unbounded training data. We can count exactly how much text any given model saw during training. We know exactly how many images or video frames were used to train it, and so on. Can we count the amount of input humans receive?
I can look at my coffee mug from any angle I want, I can feel it in my hands, I can sniff it, lick it and fiddle with it as much as I want. What happens if I move it away from me? Can I turn it this way, can I lift it up? What does it feel like to drink from this cup? What does it feel like when someone else drinks from my cup? The LLM has no idea because it doesn't have access to sensory data and it can't manipulate real-life objects (yet).
Humans ship with all the priors evolution has managed to cram into them. LLMs have to rediscover all of it from scratch just by looking at an awful lot of data.
Intelligence likely doesn’t require that much data, and it may be more a question of evolutionary chance. After all, human intelligence is largely (if not exclusively) the result of natural selection from random mutations, with a generation count that’s likely smaller than the number of training iterations of LLMs. We haven’t yet found an effective way to develop a digital equivalent, and the way we are training neural networks might actually be a dead end here.
Which gives us no information on the computational complexity of running that algorithm, or on what it does exactly. Only that it's small.
LLMs don't get that algorithm, so they have to discover certain things the hard way.
I think this would depend entirely on how the sensory impairment came about, since most genetic problems are not isolated but come with a bunch of other related issues (all of which can impact intelligence).
Lose your eyesight in an accident? I would grant there is likely no difference on average.
Otherwise, the null hypothesis is that intelligence (along with a whole host of other outcomes) is likely worse, on average.
This is clearly untrue. All information a human ever receives is through sensory data. Unless your position is that the intelligence of a brain that was grown in a vat with no inputs would be equivalent to that of a normal person.
Now, does rotating a coffee mug and feeling its weight, seeing it from different angles, etc. improve intelligence? Actually, still yes, if your intelligence test happens to include questions like “is this a picture of a mug” or “which of these objects is closest in weight to a mug”.
Entirely possible - we just don’t know. The closest thing we have to a real world case study is Helen Keller and other people with significant sensory impairments, who are demonstrably unimpaired in a general cognitive sense, and in many cases more cognitively capable than the average unimpaired person.
We get a lot of high-quality data that's relatively the same. We run the same routines every day, doing more or less the same things, which makes us extremely reliable at what we do but not very worldly.
LLMs get the opposite: sparse, relatively low-quality, low-modality data that's extremely varied, so they have a much wider breadth of knowledge, but they're pretty fragile in comparison since they get relatively little experience on each topic and usually no chance to reinforce learning with RL.
That said, LLMs are still trained on significantly more data pretty much no matter how you look at it; e.g. a blind child might hear 10-15 million words by age 6, versus trillions of tokens for LLMs.
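A rough back-of-envelope makes the gap concrete; the words-per-day rate and the token count below are illustrative assumptions, not measurements:

```python
# Back-of-envelope: words heard by a child vs. tokens seen by an LLM.
# All numbers are assumptions chosen for illustration.
words_per_day = 7_000                    # assumed average words heard per day
child_words = words_per_day * 365 * 6    # ~15 million words by age 6

llm_tokens = 10 ** 13                    # assumed ~10 trillion pretraining tokens

print(f"child by age 6: ~{child_words:,} words")
print(f"LLM pretraining: ~{llm_tokens:,} tokens")
print(f"ratio: roughly {llm_tokens // child_words:,}x")
```

Even shifting these assumptions by an order of magnitude in either direction, the LLM side sees vastly more raw text.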
A camera hooked up to the baby's head is absolutely not getting all the input data the baby gets. It's not even getting most of it.
The acquired knowledge is a lot less uniform than you’re proposing and in fact is full of gaps a human would never have. And more critically, it is not able to peer into all of its vast knowledge at once, so with every prompt what you get is closer to an “instance of a human” than the “all of humanity” you might imagine LLMs to be.
(I train and dissect LLMs for a living and for fun)
But as we have seen over recent months and years, AI output is becoming increasingly indistinguishable from human output.
I would bet completely against this: models are becoming more human-like, not less, over time.
What's more likely to change (and cause a difference) is the work itself adapting to areas where models are already superhuman, such as being able to read entire novels in seconds with full attention.
But as we interact with other people using mostly language, and since the start of the internet a lot of those interactions happen in ways similar to how we interact with AI, the difference is not so obvious. We are falling back on the Turing test here, mostly because that test is more about language than about intelligence.
Before 2022 (most of history), if you had a long seemingly sensible conversation with something, you could safely assume this other party was a real thinking human mind.
it's like a duck call.
Edit: I want to add that, because this is a neural net trained to output sensible text, language isn't just the interface.
There's no separation between anything.
And I am sorry to be negative, but there is so much bad cognitive science in this article that I couldn't take the product seriously.
> LLMs can be scaled almost arbitrarily in ways biological brains cannot: more parameters, more training compute, more depth.
- Capacity of raw compute is irrelevant without mentioning the complexity of the computational task at hand. LLMs can scale - not infinitely - but they solve for O(n^2) tasks (see the sketch below). It is also amiss to think human compute = a single human's head. Language itself is both a tool and a protocol of distributed compute among humans. You borrow a lot of your symbolic preprocessing from culture! As said, this is exactly what LLMs piggyback on.
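Reading the O(n^2) point as the quadratic cost of self-attention in context length (my reading of the comment, not something it states explicitly), a minimal sketch of how that cost grows:

```python
def attention_cost(n, d=128):
    """Rough multiply-add count for one self-attention layer:
    building the n x n score matrix (QK^T) and multiplying it by V
    each take about n^2 * d operations."""
    return 2 * n * n * d

# Doubling the context length roughly quadruples the attention cost.
for n in (1_000, 2_000, 4_000):
    print(f"context {n:>5}: ~{attention_cost(n):,} ops")
```

The point being that scaling context is not free: the compute bill grows quadratically, whereas human relevance filtering keeps the effective n small.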
> We are constantly hit with a large, continuous stream of sensory input, but we cannot process or store more than a very small part of it.
- This is called relevance, and we are so frigging good at it! The fact that the machine has to deal with a lot more unprioritized data in a relatively flat O(n^2) problem formulation is a shortcoming, not a feature. The visual cortex is such an opinionated accelerator of processing all that massive data that only the relevant bits need to make it to your consciousness. And this architecture was trained for hundreds of millions of years, over trillions of experimental arms that were in parallel experimenting on everything else too.
> Humans often have to act quickly. Deliberation is slow, so many decisions rely on fast, heuristic processing. In many situations (danger, social interaction, physical movement), waiting for more evidence simply isn't an option.
- Again, a lot of this conflates conscious processing with cognition as a whole. Anyone who plays sports or music knows to respect the implicit, embodied cognition that goes into achieving complex motor tasks. We have yet to see a non-massively-fast-forwarded household robot do a mundane kitchen cleaning task and then go play table tennis with the same motor "cortex". Motor planning and articulation is a fantastically complex computation; just because it doesn't reach our consciousness or isn't instrumented exclusively through language doesn't mean it is not.
> Human thinking works in a slow, step-by-step way. We pay attention to only a few things at a time, and our memory is limited.
- Thinking, Fast and Slow by Kahneman is a fantastic way of getting into how much more complex the mechanism is.
The key point here is how good humans are at relevance despite their limited recall, because relevance matters, because it is existential. Therefore, when you are using a tool to extend your recall, it is important to see its limitations. Google search having indexed billions of pages is not a feature if it can't surface the top results well. If it gains the ability to convince me that whatever it brought up was relevant, that still doesn't mean the results are actually relevant. And this is exactly the degradation of relevance we are seeing in our culture.
I don't care if the language terminal is a human or a machine; if the human was convinced by the low-relevance crap of the machine, it's just a legitimacy laundering scheme. Therefore this is not a tech problem, it is a problem of culture: we need to be simultaneously cultivating epistemic humility, including quitting the Cartesian tyranny of worshipping explicit verbal cognition that is assumed to be locked up in a brain, and accepting that we are also embodied and social beings that depend on a lot of distributed compute to solve for agency.
This is literally their inhumanness.