That being said, this isn’t a knockout blow by any stretch. The strength of LLMs lies in the people who are excited about them. And there’s a perfect reinforcing mechanism for the excitement - the chatbots that use the models.
Admit for a second that you’re a human with biases. If you see something more frequently, you’ll think it’s more important. If you feel good when doing something, you’ll feel good about that thing. If all your friends say something, you’re likely to adopt it as your own belief.
If you have a chatbot that can talk to you more coherently than anyone you’ve ever met, and implement these two nested loops that you’ve always struggled with, you’re poised to become a fan, an enthusiast. You start to believe.
And belief is power. Just as progress in neuroscience has not been able to retire the concept of body-soul dualism, testing of LLMs will not be able to retire the idea that AI is poised to dominate everything soon.
That could have unfortunate consequences. Most people stopped looking at neural nets for years because they incorrectly assumed that Minsky and Papert's 1969 proof that perceptrons (linear neural nets) couldn't solve basic problems applied to neural nets in general. So the field basically abandoned neural nets for a couple of decades, which were more or less wasted on "symbolic" approaches to AI that accomplished little.
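For anyone who hasn't seen the Minsky/Papert result: the classic illustration is that no single linear threshold unit can compute XOR. A toy brute-force sketch of that point (the weight grid below is an arbitrary choice, so treat this as an illustration rather than a proof):

```python
# Search a coarse grid of weights for a single linear threshold unit
# ("perceptron") that computes XOR; none exists, which is the gist of
# the 1969 perceptron result.
import itertools
import numpy as np

xor = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

def perceptron(w1, w2, b, x1, x2):
    # Single linear threshold unit: fires if the weighted sum exceeds the bias.
    return int(w1 * x1 + w2 * x2 + b > 0)

found = False
for w1, w2, b in itertools.product(np.linspace(-2, 2, 41), repeat=3):
    if all(perceptron(w1, w2, b, *x) == y for x, y in xor.items()):
        found = True
        break

print("linear unit computing XOR found:", found)  # False: XOR is not linearly separable
```

A two-layer net with a hidden unit solves XOR easily, which is why reading the result as applying to neural nets in general was the mistake.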
LLMs have a real issue with polarisation. It's probably smart people saying all this stuff about knockout blows and LLM uselessness, but I find them really useful. Is there some emperor's-new-clothes thing going on here - am I just a dumbass who can't see that he's getting excited about a random noise generator?
It's like if I saw a headline about a knockout blow for cars because SomeBigName discovered it's possible to crash them.
It wouldn't change my normal behaviour, it would just make me think "huh, I should avoid anything SomeBigName is doing with cars then if they only just realised that."
But the ivory tower misses the point of how LLMs have improved the ability of regular people to interact with information and technology.
While it might not be the grail they were seeking, it's still a useful thing that will improve life and in turn be improved.
He's not always wrong, and sometimes useful as a contrarian foil, but not a source of much insight.
There is no evidence this is the case.
We could be in an era of diminishing returns, where bigger models do not yield substantial improvements in quality but instead become faster, cheaper, and more resource-efficient.
And no one cares what we may have in the future. OpenAI etc already have an issue with credibility.
The scaling laws themselves advertise diminishing returns, something like a natural log. This was never debated by AI optimists, so it's odd to suggest otherwise as if it contradicts anything the AI optimists have been saying.
The scaling laws are kind of a worst case scenario, anyway. They assume no paradigm shift in methodology. As we saw when the test-time scaling law was discovered, you can't bet on stasis here.
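To put a number on "the scaling laws themselves advertise diminishing returns", here is a minimal sketch using the approximate parameter-scaling term from the Chinchilla fit (Hoffmann et al., 2022). The constants are quoted from memory and should be treated as assumptions; any power law of this shape tells the same story:

```python
# Reducible loss falls as a power law in parameter count, so each 10x of
# scale buys a smaller absolute improvement. Constants are approximate
# values from the Chinchilla paper (assumed here, not re-derived).
E, A, alpha = 1.69, 406.4, 0.34  # irreducible loss, coefficient, exponent

def predicted_loss(n_params: float) -> float:
    """Predicted loss as a function of parameter count (data term ignored)."""
    return E + A / n_params ** alpha

for n in [1e9, 1e10, 1e11, 1e12]:
    print(f"{n:.0e} params -> predicted loss {predicted_loss(n):.3f}")
# Roughly 2.04, 1.85, 1.76, 1.72: each order of magnitude helps less.
```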
So the AI companies really took that to heart and tried to put everything into the training distribution. My stuff, your stuff, their stuff. I remember the good old days of feeding the Wikimedia dump into a Markov chain.
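For anyone who missed that era, the Markov chain approach was roughly this (a toy word-level sketch, not anyone's actual Wikimedia pipeline):

```python
# Build a word-level Markov chain from a text corpus and sample from it.
import random
from collections import defaultdict

def build_chain(text):
    """Map each word to the list of words observed to follow it."""
    chain = defaultdict(list)
    words = text.split()
    for current, nxt in zip(words, words[1:]):
        chain[current].append(nxt)
    return chain

def generate(chain, start, length=20):
    """Random-walk the chain, starting from `start`."""
    word, out = start, [start]
    for _ in range(length):
        followers = chain.get(word)
        if not followers:
            break
        word = random.choice(followers)
        out.append(word)
    return " ".join(out)

corpus = "the cat sat on the mat and the dog sat on the rug"
print(generate(build_chain(corpus), "the"))
```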
I get the criticism, but "on the ground" there's real stuff getting done that couldn't be done before. All of this boils down to an intellectual study which, while good to know, is meaningless in the long run. The only thing that matters is whether the dollars put in can be recouped to the level of hype created, and that answer is probably "maybe" in some areas but not others.
This AI doomerism is getting just as annoying as people claiming AI will replace everyone and everything.
You aren't winning anything by saying "aha! I told you they are useless!" because they demonstrably aren't.
Yes everybody is hoping that someone will come up with a better algorithm that solves these problems but until they do it's a little like complaining about the invention of the railway because it can only go on tracks while humans can go pretty much anywhere.
You said this. Neither Apple nor the author did.
The focus was specifically on LLMs' reasoning capabilities, not on whether they are entirely useless.
This is relevant because countless startups and a great deal of investment are predicated on LLMs' current capabilities being able to be improved and built on top of. If this is a technological dead end, then we could be in for another long lull in progress. And companies like OpenAI should have their valuations massively cut.
It also constrains the level of investment Apple would need to be comparable to top tier LLM companies.
There is a belief being peddled that AGI is right around the corner and we can get there by just scaling up LLMs.
Papers like this are a good takedown of that thinking.
This paper from last year hasn't aged well, given the rapid proliferation of reasoning models.
While everyone learned the bitter lesson, Apple chose to focus on small on-device models even after the explosion of ChatGPT.
"See this is why we can't build with transformers and had to use JEPA and look how much better it is!"
There's a recent talk about this by Jim Fan from Nvidia https://youtu.be/_2NijXqBESI
> Many (not all) humans screw up on versions of the Tower of Hanoi with 8 discs.
> LLMs are no substitute for good well-specified conventional algorithms.
> will continue to have their uses, especially for coding and brainstorming and writing
> But anybody who thinks LLMs are a direct route to the sort of AGI that could fundamentally transform society for the good is kidding themselves.
I agree with the assessment but disagree with the conclusion:
Being good at coding, writing, etc is precisely the sort of labor that is both “general intelligence” and will radically change society when clerical jobs are mechanized — and their ability to write (and interface with) classical algorithms to buttress their performance will only improve.
This is like when machines came for artisans.
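For reference on the Tower of Hanoi point quoted above: the "well-specified conventional algorithm" is a few lines of recursion, which is part of why raw puzzle performance is an odd yardstick for humans and LLMs alike. A minimal sketch:

```python
# Classic recursive Tower of Hanoi: move n discs from `source` to `target`
# using `spare`, appending each move to `moves`.
def hanoi(n, source, target, spare, moves):
    if n == 1:
        moves.append((source, target))
    else:
        hanoi(n - 1, source, spare, target, moves)
        moves.append((source, target))
        hanoi(n - 1, spare, target, source, moves)
    return moves

moves = hanoi(8, "A", "C", "B", [])
print(len(moves))  # 255, i.e. 2**8 - 1 moves for the 8-disc case
```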
The authors speculate that this pattern is a consequence of reasoning models actually solving these puzzles by way of pattern-matching to training data, which covers some puzzles at greater depth than others.
Great. That's one possible explanation. How might you support it?
- You could systematically examine the training data, to see if less representation of a puzzle type there reliably correlates with worse LLM performance (sketched below).
- You could test how successfully LLMs can play novel games that have no representation in the training data, given instructions.
- Ultimately, using mechanistic interpretability techniques, you could look at what's actually going on inside a reasoning model.
This paper, however, doesn't attempt any of these. People are getting way out ahead of the evidence in accepting its speculation as fact.
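For what it's worth, the first of those checks would not even be hard to start on. A sketch with entirely made-up numbers (the puzzle names follow the paper; the frequencies and solve rates below are hypothetical placeholders):

```python
# Correlate how often each puzzle type appears in the training corpus with
# how often the model solves it. All values here are invented for illustration.
from scipy.stats import spearmanr

puzzles          = ["tower_of_hanoi", "river_crossing", "blocks_world", "checker_jumping"]
corpus_frequency = [12000, 800, 3500, 150]    # hypothetical occurrence counts in training data
solve_rate       = [0.92, 0.45, 0.73, 0.51]   # hypothetical solve rates at a fixed difficulty

rho, p = spearmanr(corpus_frequency, solve_rate)
print(f"Spearman rho={rho:.2f}, p={p:.3f}")
```

A strong positive correlation would support the pattern-matching story; the point is just that the claim is testable.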
Playbook:
1) you want to "disprove" some version of AI. Doesn't really matter what.
Take a problem humans face. For example, an almost total inability to follow simple rules for a long time to make a calculation. It's almost impossible to get a human to do this.
Check if AI algorithms, which are algorithms made to imitate humans, have this same problem. Now of course, in practice, if they indeed have that problem, that is actually a success: an algorithm made to imitate humans ... imitates humans successfully, strengths and weaknesses alike! But of course, if you find it, you describe it as total proof that this algorithm is worthless.
An easy source for these problems is of course computers. Anything humans use computers for ... it's because humans suck at doing it themselves. Keeping track of history or facts. Exact calculation. Symbolic computation. Logic (i.e. exactly correct answers). More generally, math and even the positive sciences as a whole are an endless supply of such problems.
2) you want to "prove" some version of AI.
Find something humans are good at. Point out that AIs do this too. Humans are social animals, so how about influencing other humans? From convincing your boss, or on a larger scale using a social network to win an election, right up to actual seduction. Use what humans use to do it, of course (i.e. be inaccurate, lie, ...).
Point out what a great success this is. How magical it is that machines can now do this.
3) you want to make a boatload of money
Take something humans are good at but hate, have an AI do it for money.
feketegy•8h ago
I don't think anybody who uses LLMs professionally day-to-day thinks that they can reason like human beings... If some people thought this, they fundamentally do not understand how LLMs work under the hood.
IshKebab•7h ago
The internet and smartphones were still extremely useful. There's no need to refute VC exaggeration. It's like writing articles to prove that perfume won't get you Brad Pitt. Nobody literally believes the adverts, but that doesn't mean perfume is a lie.
I'm not saying that VCs only push good ideas - e.g. flying cars & web3 aren't going to work. Just that their claims are obviously exaggerated and can be ignored, even for useful ideas.
Lerc•7h ago
I'm OK with thinking it's possible that some subset of consciousness might exist in LLMs while also being well aware of their limitations. Cognitive science has plenty of examples of mental impairments showing that there are individuals who lack some of the things that LLMs also lack. We would hardly deny that those individuals are conscious. The bar for what counts as thought is lower, but no less complex.
Before we had machines pushing at these boundaries, there were very learned people debating these issues, but it seems like now some gut instincts from people who have chatted to a bot for a bit are carrying the day.