Everything else, yeah, he's right, and I never doubted it. I agree LLMs are unreliable, insecure, etc. But I don't deduce from that that they're gonna amount to nothing.
Have there been specific claims about when it's going to crash? I find it hard to believe he claimed it was all going to crash by early 2026. Maybe I'm wrong; I haven't read all of his posts. But neither did the author: they admit in the repo this is all LLM, nothing was verified by humans.
From my perspective, it is basically guaranteed that LLMs will increasingly be seen as essential work tools for just about anyone doing knowledge work. So they won't amount to nothing.
But it is not at all guaranteed that the frontier model companies who are currently burning billions of dollars chasing that will capture significant percentages of that value.
In other words, could be all slop. Or maybe it’s not. Maybe it’s mixed. No one knows.
So you’re asking me to do the work you should have done in the first place? If you didn’t put any effort into it, why should I waste my time checking your non-work and correcting it to your credit?
If you had actually put in the effort, then sure, I'd be amenable to helping make this the best it can be. But you didn't, so what's the point? Why should anyone spend their time fixing other people's slop?
He has no idea what coding agents are capable of or how useful they are; he doesn't pay attention to any of the contributions these models are making to math or science; he continually insists that because agents aren't ready to face customers in uncontrolled environments, they're completely useless even for employees and workers; just last year he posted an article complaining that LLMs don't use web search to find information (he had asked for information about a friend), when almost all of them do now, even in their default interfaces; he still thinks hallucinations carry real weight in fields like mathematics and programming, where it's very easy to verify exactly the kinds of things hallucinations would break; and I think he still adheres to the stochastic-parrot mindset even though next-token prediction isn't even the most relevant part of their training anymore.
Most importantly, although he seems to have made a single Substack post making this argument, it doesn't seem to have really percolated through the rest of his thinking: the cutting edge of LLMs right now, agents, are actually exactly the kind of neurosymbolic system he wants. The neural network provides the interface to the outside world and a creativity and problem-solving engine, supplying the fuzzy pattern matching and adaptability that is needed, while symbolic code-based systems ensure that guardrails are enforced, requirements are met, accurate information is provided, and so on. I think his objection might be that the problem-solving and reasoning engine at the core is still an LLM. But the thing is, you need the kind of pattern matching, flexibility, and adaptability you get from an LLM driving things for the end result to be anything different from an expert system with a slightly better natural-language interface pasted on. And I think it's pretty clear at this point that expert systems are dead. They haven't done anything remotely as interesting or useful as what we're seeing LLMs do.
I think, like another commenter says, his whole shtick is pointing out obviously true basic features of LLMs (that they hallucinate, that they don't perfectly adhere to prompt guardrails, that there's too much hype in the industry right now, and that a lot of the companies suck in a vaguely standard big-tech Silicon Valley way) and extrapolating to some broader point, which is that everyone should have listened to him and done what he said when he wrote that book back in the 90s (iirc).
I'm not really sure at that point what 'actual' AI means?
It seems like the definition of actual AI is something like perfect AI — it has to be fully observable, interpretable, reason perfectly, have perfect factual recall, continual learning, infinite context windows, perfect instruction following, and so on. I feel like at that point, maybe nothing could ever be 'actual' AI?
We typically use AI to mean some kind of algorithm or program that lets computers do intellectual work that was previously considered to be the exclusive domain of humans, especially if it involves problem solving or pattern matching or reasoning. Just look at Donald Knuth's recent posts about what Claude was able to do — seems like AI to me?
Yeah, it is imperfect AI, but it's still AI. And it's not clear to me that the imperfections LLMs have mean they can't be extremely useful and revolutionary as a form of AI. Yes, they make weird mistakes a lot, and they don't think at all like humans do. But I am of the opinion that there are a lot of forms of intelligence, and human intelligence is just one of them. Every kind of intelligence comes with its own gamut of characteristic errors, blind spots, and biases. The fact that LLMs have issues different from those of human intelligence, and different from what classical computers struggle with, doesn't disqualify them from being intelligent to me.
I also find very odd the framing where agentic harnesses are described as bolted onto LLMs in order to "make them useful", yet the agentic harness plus the LLM doesn't count as an AI system itself. It's pretty clear to me, at least, that "the AI", if you want to talk about it, is the neurosymbolic cybernetic feedback system that combines the harness and the LLM.
The LLM is only the fuzzy pattern-matching, logic, and creativity core. The harness provides the verification feedback loops, the ability to interact with and explore the outside world, the ability to bring in programming-language interpreters for more rigid symbolic logic, observability, and systems for storing and recalling memory for continual learning. A lot of these, especially the feedback loops, resolve issues that LLMs seem to inherently face, such as hallucinations.
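To make the verification-feedback idea concrete, here is a minimal sketch of such a loop. Everything here is hypothetical illustration, not any real harness's API: `llm_propose` stands in for a model call, and the "symbolic" checker is just a deterministic verifier whose error message is fed back into the next attempt.

```python
# Minimal illustration of a verification feedback loop: the model proposes
# a candidate, a symbolic checker verifies it, and failures become feedback.

def run_with_feedback(llm_propose, verify, task, max_rounds=3):
    """llm_propose(task, feedback) -> candidate; verify(candidate) -> error or None."""
    feedback = None
    for _ in range(max_rounds):
        candidate = llm_propose(task, feedback)
        error = verify(candidate)
        if error is None:
            return candidate  # verified: the checker, not the model, decides
        feedback = error      # loop the symbolic verdict back into the model
    return None

# Toy stand-ins so the sketch runs without a real model:
def fake_llm(task, feedback):
    # First attempt is deliberately wrong; it "corrects" after feedback.
    return "2 + 2" if feedback else "2 + 3"

def checker(candidate):
    return None if eval(candidate) == 4 else f"{candidate} != 4"

print(run_with_feedback(fake_llm, checker, "make four"))  # -> 2 + 2
```

The point of the sketch is only structural: hallucinations don't have to be caught by the LLM itself, because the harness's verifier sits between the model's output and the final answer.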
Moreover, LLMs are now substantially trained with writing code, using tools, interacting with the world, and existing in harnesses in mind. At this point, I would have to guess that more than half of their training is devoted to rewarding them for correctly using all of these symbolic tools and solving problems in a simulated world, rather than just predicting the next token.
I also think that LLMs, as a sort of core engine of an agentic harness, are allowing computers to do things we'd never really dreamed they could do before, that symbolic systems by themselves never really achieved, and as I said before, if you're looking for neurosymbolic AI — as Marcus says he is — then this is basically how it's going to have to look unless you want to fall down the expert system rabbit hole again.
It can still be useful, even if it's wrong a lot, or if it takes a lot of scaffolding to mold its answers into something correct. So just call it what it is: auto complete. But the problem is that if you do that, the shine comes off and you can't justify trillions of dollars' worth of sci-fi fantasies. Who does that benefit? If the bubble pops, all the boosters on here hoping for HAL 9000 are going to be out of a job and struggling, so they're working against their own interests by going along with this shell game.
Also, models will just write code for that sort of thing now.
Where does this definition come from? I certainly don't agree with it, and I am not sure who does, besides yourself and Marcus. Also it seems that you're saying AI does, in fact, mean 'perfect' AI, basically?
> A good auto complete is useful, a crappy AI is a marketing scam.
LLM agents do a lot more than auto complete now, and using them less like auto complete and more like 'AI' (via agents) has actually made them more useful and less crappy! Also, I don't think framing how modern RLVR'd LLMs operate now as auto complete even makes a whole lot of sense in the first place.
This is one of those comments whose truth value depends entirely on a constantly shifting definition of “AI”.
The ability of modern models to functionally understand, answer questions, and make recommendations about software codebases is superhuman at this point, relative to most human software developers. What is that, if not artificial intelligence?
Perhaps you’re thinking of something more like AGI, but even there the terminology is loaded and ambiguous. The models are general enough to answer questions well on a vast range of subjects, and they exhibit understanding (again, functionally speaking this is true - whether someone wants to call them stochastic parrots is beside the point.) The appellation of “intelligence” applies just as well as in the coding case, it’s artificial, and it’s general.
> a worthwhile point to make.
I disagree. Without clear, justified definitions, it’s an incoherent, poorly specified point that seems to be driven by a desire to maintain a specific conclusion regardless of the evidence.
Ok but why report PR pieces as evidence for LLMs being useful?
These are tools that can possibly provide output that is eventually correct. It is the human behind the wheel doing the actual work.
Give the tool to a lesser expert and you will get more garbage with fewer lucky shots.
For the elite, it is a balancing act where, more often than not, the cost of making the LLM do the work is less than doing it yourself. If that holds above 90% of the time, the tool is useful.
https://www-cs-faculty.stanford.edu/~knuth/papers/claude-cyc...
https://openai.com/index/accelerating-science-gpt-5/?hl=en-G...
Small aside: I'm only bringing this up because last year I worked on a game where you had to solve various moral dilemmas in a 1v1 situation (think trolley experiment, and one player says "flip the switch" and the other says "don't flip the switch"). The idea was to get an LLM to rate the arguments in a fun turn-based online game. I built it out, but I kind of gave up when I realized how absolutely awful the LLM was at actually rating arguments and their nuances. Who won legitimately felt more like a roll of the dice than a verdict from a real judge or a philosophy professor grading a paper. I put that project aside, but I might do a Show HN at some point since the game is basically done.
Adjudication[1], which is the real meat of this project, is done in a very partial way, and I genuinely see basically zero value in it. Why not crawl Reddit (or HN)? I know that also has issues, but at least it has more variety of tone.
[1] https://github.com/davegoldblatt/marcus-claims-dataset/blob/...
The essay numbers come from the Claude pipeline (2,218 claims), which used model judgment as of March 2, 2026 without a published URL evidence table. Different pipeline, different adjudication method.
Your core critique still lands, just at a different layer. The real weakness is LLM-as-judge circularity: an LLM scoring claims about LLMs. That's flagged in the methodology, and I don't have a clean answer beyond "the dataset is public, spot-check anything that looks wrong."
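Since "spot-check anything that looks wrong" is the only remedy on offer, here is a small sketch of how one might pull a random sample of verdicts for manual review. The field names (`id`, `claim`, `status`) are taken from the records quoted in this thread; the JSON Lines layout and any filename are my assumptions about the repo, not verified.

```python
import json
import random

def sample_for_review(lines, status="supported", k=5, seed=0):
    """Pick k random claim records with a given verdict for manual checking."""
    claims = [json.loads(line) for line in lines if line.strip()]
    pool = [c for c in claims if c.get("status") == status]
    random.Random(seed).shuffle(pool)  # fixed seed so the sample is reproducible
    return pool[:k]

# Toy records in the shape quoted in this thread (claims truncated):
records = [
    '{"id": "claim_0081", "claim": "Level 2 self-driving ...", "status": "supported"}',
    '{"id": "claim_0090", "claim": "Scaling has ended ...", "status": "refuted"}',
]
picked = sample_for_review(records, k=1)
print(picked[0]["id"])  # -> claim_0081
```

In practice you would read the repo's actual JSONL file instead of the inline `records` list, then check each sampled claim against primary sources by hand.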
{"id": "claim_0081", "date": "2023-02-11", "claim": "Current Level 2 self-driving operates under easy conditions and is nowhere close to handling real-world complexity.", "type": "descriptive", "target": "Level 2 self-driving", "status": "supported", "horizon": null}
Why is this supported? How is this supported? Waymo would probably disagree, etc. Here's another one: {"id": "claim_0083", "date": "2023-02-11", "claim": "Tesla's product naming ('Autopilot', 'Full Self Driving') misleads customers into thinking the cars are more capable than they are, potentially causing accidents and deaths.", "type": "causal", "target": "Tesla marketing", "status": "supported", "horizon": null}
I fully agree that TSLA engages in all kinds of deceptive marketing, but to fully support the stunning claim that it potentially causes deaths is, uh, a bit much. I mean, at least tell me who's saying this. What's the provenance? If Claude itself rated the claims, which seems to be the case unless I'm totally off base, I fail to see how we're actually doing anything at all here. Right now I'm working on a local research agent, and I'm being absolutely meticulous about storing browsed webpages, snippets, etc. into short-term (session) LLM memory or a long-term (cross-session) SQLite db.
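For what that kind of provenance tracking can look like, here is a minimal sketch of a cross-session SQLite snippet store. The schema and function names are my own illustration, not the actual agent's code: the idea is just that every stored snippet keeps its source URL and fetch time, so any later claim can be traced back.

```python
import sqlite3

def open_memory(path=":memory:"):
    """Open (or create) the long-term snippet store."""
    db = sqlite3.connect(path)
    db.execute("""CREATE TABLE IF NOT EXISTS snippets (
                    url TEXT, fetched_at TEXT, snippet TEXT)""")
    return db

def remember(db, url, fetched_at, snippet):
    """Store one browsed snippet with its provenance."""
    db.execute("INSERT INTO snippets VALUES (?, ?, ?)", (url, fetched_at, snippet))
    db.commit()

def recall(db, term):
    """Find stored snippets mentioning a term, with their source URLs."""
    cur = db.execute("SELECT url, snippet FROM snippets WHERE snippet LIKE ?",
                     (f"%{term}%",))
    return cur.fetchall()

db = open_memory()
remember(db, "https://example.com", "2025-01-01",
         "Waymo operates driverless taxis in several cities.")
print(recall(db, "Waymo"))  # one (url, snippet) row
```

The contrast with the dataset above is the point: with a table like this, every "supported" verdict would come with a URL you could click.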
[1] https://github.com/davegoldblatt/marcus-claims-dataset/blob/...
I think the overall percentage is the wrong approach here.
It’s easy to say a lot of things that are factually true or predictions that are inevitably true.
However, the more salient point with Gary Marcus is the one unforgivable thing he was wrong about and continues to double down on: that deep learning is hitting a wall.
Starting in early 2022 and going through today, there is still so much low hanging fruit with deep learning.
Today’s LLM progress is mostly being made in RL. But world models are also still so early and they’re deep learning all the way down.
It would be nice if he would just admit he was wrong.
I wrote up the full pattern here: https://davesquickhits.substack.com/p/the-most-expensive-kin...
But then, to your point, what does it matter, if they're still as useful as they are? Even at this stage, Claude Code makes Jira halfway bearable.
Of course, we have to consider the devil's advocate as well. Most CEOs don't seem to be reporting great ROI on their "AI" investments.
Someone is so thin-skinned about a single guy writing a skeptical Substack that they spent their weekend building a dual-pipeline automation tool, scraping four years of his writing, instead of just building a product that actually disproves him. I’m not saying I agree with everything the man says, but until a human actually verifies these verdicts, this is just burnt tokens.
If I were Gary Marcus, I wouldn't immediately agree even with a favorable assessment made by these LLMs, as that could contradict the very claims I'd made and mean falling into the trap of trusting them. I'd remain as skeptical as ever...
...because this is the worst of the red flags that ultimately supports Gary's argument that the LLM results may be untrustworthy:
All verdicts are LLM-scored, not human-verified.
People should check for themselves and draw their own conclusions.
> The crash hasn't come.
yet.
But then, on the other hand, he completely ignores all of the developments in the scaffolding around these systems built to resolve those problems: all of the changes and developments in how these models are trained, all of the things they've actually been able to achieve and do, and basically all of the positive use cases and things that balance out his criticisms.
Since he doesn't really talk about any of that, of course he doesn't make false claims about it, he just ignores it, implicitly creating a false picture.
And then it is this false picture that he uses to justify his grandiose claims about how everyone should have listened to him about how to do AI and these systems are inevitably going to turn out to be useless and the whole industry is going to collapse and fully disappear and society is going to be ruined and so on.
So, of course, it looks like, on the one hand, all of his specific claims about AI are perfectly correct, and on the other, all of his grander claims about what that implies about the industry have turned out to be wrong, and he spends much more time on the latter than the former.
I think it is really crucial to emphasize that even though most of the individual claims he makes are correct, he spends much more time on the prognostications that are fundamentally not correct, or at least are very speculative right now. I think that's an indication of something gone very wrong with someone's epistemic and incentive situation.
https://github.com/davegoldblatt/marcus-claims-dataset/blob/...
Many of the "supported" claims here are vague, banal, obvious, or just opinion. E.g.
"the general public hasn't quite realized what's not possible yet"
"loads of things scale, but not at all"
"To be sentient is to be aware of yourself in the world; LaMDA simply isn't."
"To date, nobody, ever, has given a convincing and thorough account of how human children (and human children alone) learn language."
"A cat holding a remote control shouldn't have a human hand."
"What I didn't see last night was vision" (about Tesla Optimus)
I don't know if this will cause a ton of capital destruction; I doubt it. It will probably destroy a bunch of the slot-machine gambling addicts who are paying 5k a month on their credit cards thinking an autocomplete API is going to provide a profitable business.
A large part of this is a scam. Just as many aspects of crypto were scams while others were not, this hype is very similar to the NFT/crypto hype of 2018-2023. Yes, some genuinely useful things were born out of those industries, but a lot were not; it's the same with AI.
As for a potential AI winter: I think there will be a "winter" just like crypto's, but even during crypto's winter some companies continued to operate and innovate while 90% disappeared. I believe the same thing will happen, and soon. Watch what happens to companies like Perplexity over the next 12-16 months lol.