Thesis: language isn't a great representation, basically.
I really should apply myself. Maybe I wouldn't have to work so hard, just spout nonsense and pontificate.
He has no problem pimping his credentials and shitting on other people's work and lying through his teeth to enrich himself. He's obviously intelligent enough to know better, but he's a singularly intellectually dishonest figure.
He's a one-man version of The Enquirer or Zergnet for AI, and thrives entirely on dishonest takes and divisive commentary, subsisting on pure clickbait. There is absolutely no reason to regard anything he says with any level of seriousness or credulity; he's an unprincipled jackass cashing out unearned regard by grifting and shilling, loudly.
If you must, here's an archived link; don't reward him with clicks.
He really shouldn't end up on the top ten of HN, let alone the front page. It's like an SEO hack boosting some guy proudly documenting pictures of his bowel movements.
> Yann LeCun was first, fully coming around to his own, very similar critique of LLMs by end of 2022.
> The Nobel Laureate and Google DeepMind CEO Sir Demis Hassabis sees it now, too.
He has personally moved on from LLMs and is exploring new architectures built more around world models.
Which he describes here: https://x.com/ylecun/status/1759933365241921817
Also, I think the 2022 quote refers to this paper by Yann: https://openreview.net/pdf?id=BZ5a1r-kVsf
Sutton... the patron saint of scaling...
Listen to people for their ideas, not their label.
Regardless, Marcus is a bit late to comment on the bitter lesson. That is so 6 months ago lol
> The bitter lesson is based on the historical observations that 1) AI researchers have often tried to build knowledge into their agents, 2) this always helps in the short term, and is personally satisfying to the researcher, but 3) in the long run it plateaus and even inhibits further progress, and 4) breakthrough progress eventually arrives by an opposing approach based on scaling computation by search and learning. The eventual success is tinged with bitterness, and often incompletely digested, because it is success over a favored, human-centric approach.
You could easily see most LLM work as a dead end because it is about 'building knowledge into your agents' (e.g. by paying data labelers billions of dollars total to supplement your scrapes), and not about 'search' (still a major open problem for LLMs - o1-style serial reasoning traces are obviously inadequate) or 'learning' (LLMs depend so heavily on the knowledge already encoded in their training data).
It's... it's over. The west has fallen.
What he has done is continually move the goalposts to stay somewhat relevant in the blogosphere and, presumably, the academic world.
"We want AI agents that can discover like we can, not which contain what we have discovered. Building in our discoveries only makes it harder to see how the discovering process can be done." - I am curious if people would read this as an advocacy or criticism of LLMs?
But zooming out, LLMs are universal approximators, so it's trivially true that they can approximate any function that describes AGI. It's also true that logic (from logos or "word") is about reasoning constrained by language and conversations. So an LLM is the right sort of device you'd expect to achieve general intelligence.
There are arguably non-linguistic forms of intelligence, such as visual intelligence. But those also can operate on written symbols (e.g. the stream of bits from an image file).
The other relevant question is why Gary Marcus always seems so angry. It's draining to read one of his posts.
I very naively assume the "easy" path will be similar: a very different system that's bolted on/references the foundation models, to enable the realtime/novel reasoning (outside the fixed latent space) bit that isn't possible now.
His stance on LLMs can be modeled by a simple finite state machine:
State 1) LLM performance stalls for a couple of months: "See, I told you, LLMs are a dead end and won't work!"
State 2) A new LLM release makes rapid and impressive improvements: "AI is moving too fast! This is dangerous and we need the government to limit the labs to slow them down!"
Repeat
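For fun, here is that machine in code. A toy Python sketch of the two states described above; the names and the transition rule are my own illustration, nothing more:

    from enum import Enum, auto

    class Mood(Enum):
        DEAD_END = auto()       # LLM progress has stalled
        TOO_DANGEROUS = auto()  # a new release just impressed everyone

    def step(state: Mood, impressive_new_release: bool) -> Mood:
        # The next state depends only on the latest headline.
        return Mood.TOO_DANGEROUS if impressive_new_release else Mood.DEAD_END

    TAKES = {
        Mood.DEAD_END: "See, I told you, LLMs are a dead end and won't work!",
        Mood.TOO_DANGEROUS: "AI is moving too fast! The government must slow the labs down!",
    }

    # Repeat:
    state = Mood.DEAD_END
    for impressive in [False, True, False, True]:
        state = step(state, impressive)
        print(TAKES[state])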
If you have a high conviction belief that is countervailing to the mainstream, you suffer a great deal. Even the most average conversation with a “mainstream believer” can turn into a judgment fest. Sometimes people stop talking to you mid-conversation. Investors quietly remove you from their lead lists. Candidates watch your talks and go dark on you. People with no technical expertise lecture at you.
Yet, inevitably, a fraction of such people carry forward. They don’t shut up. And they are the spoon that stirs the pot of science.
It’s totally normal and inevitable that people start to take victory laps at even the smallest indication in such situations. It doesn’t mean they’re right. It is just not something worth criticizing.
It's easy to be a faux contrarian who just always says we're in a bubble or X is overhyped. Everyone knows that; it's the nature of markets and not an insight. The only value is having some actual insight into where and how things stop and where there is some upside. Otherwise you're just a jealous loser.
Marcus claims to have reread The Bitter Lesson. And I should say, I too have reread the text, and I don't think Marcus is getting the actual original argument here. All it says is that general-purpose algorithms that scale will outperform special-purpose algorithms that use information about the problem and don't scale. That's all. Everyone claiming more is hallucinating things into this basic point. Notably, general-purpose algorithms aren't necessarily neural nets, and "X works better than Y" doesn't imply X is the best thing ever.
So there's no contradiction between The Bitter Lesson and claims that LLMs have big holes and/or won't scale up to AGI.
You could, in theory, represent that model as a linear stream of tokens, and provide it as context to an LLM directly. It would be an absurdly wasteful number of tokens, at minimum, and the attention-esque algorithm for how someone might "skim" that model given a structured query would be very different from how we skim over text, or image patches, or other things we represent in the token stream of typical multi-modal LLMs.
But could it instead be something that we provide as a tool to LLMs, and use an LLM as the reasoning system to generate structured commands that interact with it? I would wager that anyone who's read a book, drawn a map of the fantasy world within, and argued about that map's validity on the internet, would consider this a viable path.
At the end of the day, I think that the notion of a "pure LLM" is somewhat pedantic, because the very term LLM encapsulates our capability of "gluing" unstructured text to other arbitrary tools and models. Did we ever expect to tie our hands behind our back and make it so those arbitrary tools and models aren't allowed to maintain state? And if they can maintain state, then they can maintain the world model, and let the LLM apply the "bitter lesson" that compute always wins, on how to best interact with and update that state.
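To make the "world model as a tool" idea concrete, here is a minimal Python sketch. Everything in it - the WorldModel class, its methods, and the command format - is hypothetical and purely for illustration; a real system would hang this off an actual LLM tool-calling API instead of a hard-coded dict.

    from dataclasses import dataclass, field

    @dataclass
    class WorldModel:
        """Persistent structured state the LLM queries instead of re-reading raw tokens."""
        entities: dict[str, dict] = field(default_factory=dict)

        def update(self, name: str, **attrs) -> str:
            self.entities.setdefault(name, {}).update(attrs)
            return f"updated {name}"

        def query(self, name: str) -> dict:
            return self.entities.get(name, {})

    def handle_tool_call(model: WorldModel, command: dict) -> object:
        # Dispatch a structured command emitted by the LLM.
        if command["op"] == "update":
            return model.update(command["name"], **command.get("attrs", {}))
        if command["op"] == "query":
            return model.query(command["name"])
        raise ValueError(f"unknown op: {command['op']}")

    # The LLM emits commands like these instead of holding the whole map in context:
    wm = WorldModel()
    handle_tool_call(wm, {"op": "update", "name": "Rivendell",
                          "attrs": {"region": "east of the Misty Mountains"}})
    print(handle_tool_call(wm, {"op": "query", "name": "Rivendell"}))

The point of the sketch is the division of labor: the LLM does the reasoning and emits small structured commands, while the state lives outside the context window and persists between calls.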
Legend2440•1h ago
It doesn’t matter what incredible things neural networks do, in his mind they’re always a dead end that will collapse any day now.
jonplackett•1h ago
I have no idea if LLMs will be general AI, but they defo aren’t going anywhere
andy99•1h ago
(I notice in retrospect it does show that it's a link to his substack so I guess that's sufficient warning, I didn't see it)
Yizahi•31m ago
Just because something is a case of "old man yelling at clouds" doesn't mean that the underlying logic is always wrong. Sometimes markets can be irrational longer than we expect.