https://en.m.wikipedia.org/wiki/Betteridge's_law_of_headline...
Please stop, this is how you get AI takeovers.
It's pretty easy: causal reasoning. Causal, not mere statistical correlation as LLMs do, with or without "CoT".
If you mean deterministic rather than probabilistic, even Pearl-style causal models are probabilistic.
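For a concrete illustration (a toy sketch, not anything from the thread): in a structural causal model every mechanism is still a conditional probability, and an intervention just replaces one of those mechanisms.

    import random

    # Toy structural causal model: rain -> sprinkler -> wet grass.
    # Each mechanism below is a conditional probability, so the model is causal AND probabilistic.

    def sample(do_sprinkler=None):
        rain = random.random() < 0.3
        # do(sprinkler := x) cuts the arrow from rain; plain observation keeps it.
        if do_sprinkler is None:
            sprinkler = random.random() < (0.1 if rain else 0.5)
        else:
            sprinkler = do_sprinkler
        wet = random.random() < (0.95 if (rain or sprinkler) else 0.05)
        return rain, sprinkler, wet

    # Estimate P(wet | do(sprinkler=True)) by Monte Carlo -- still a probability, not a certainty.
    n = 100_000
    print(sum(sample(do_sprinkler=True)[2] for _ in range(n)) / n)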
I think the author is circling around the idea that their idea of reasoning is to produce statements in a formal system: to have a set of axioms, a set of production rules, and to generate new strings/sentences/theorems using those rules. This approach is how math is formalized. It allows us to extrapolate - make new "theorems" or constructions that weren't in the "training set".
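As a toy sketch of that picture (Hofstadter's MIU system, chosen here only as an example of axioms plus production rules, not something from the article):

    # One axiom plus four production rules; "theorems" are whatever strings the rules can derive.
    AXIOMS = {"MI"}

    def productions(s):
        """Yield every string derivable from s by a single rule application."""
        if s.endswith("I"):                 # Rule 1: xI -> xIU
            yield s + "U"
        if s.startswith("M"):               # Rule 2: Mx -> Mxx
            yield "M" + s[1:] * 2
        for i in range(len(s) - 2):         # Rule 3: III -> U
            if s[i:i + 3] == "III":
                yield s[:i] + "U" + s[i + 3:]
        for i in range(len(s) - 1):         # Rule 4: UU -> (nothing)
            if s[i:i + 2] == "UU":
                yield s[:i] + s[i + 2:]

    def theorems(max_steps=3):
        """New strings not in the starting set, generated purely by rule application."""
        known = set(AXIOMS)
        frontier = set(AXIOMS)
        for _ in range(max_steps):
            frontier = {t for s in frontier for t in productions(s)} - known
            known |= frontier
        return known - AXIOMS

    print(sorted(theorems(), key=len))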
You need to actually have something that deduces a result from a set of principles that form a logical conclusion, or the understanding that more data is needed to make a conclusion. That is clearly different from finding a likely next token on statistics alone, despite the fact that the statistical answer can be correct.
LLMs are not doing causal reasoning because there are no facts, only tokens. For the most part you can't ask an LLM how it came to an answer, because it doesn't know.
Or even a causal-reasoning tool for an LLM agent, working the way it already does for math, where it forwards the request to Wolfram.
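Roughly this kind of routing (every name below is a hypothetical placeholder, not any real framework's API): the LLM only picks which backend handles the question; the causal or math engine does the actual work.

    # Hypothetical sketch of tool routing -- all function names are made up for illustration.

    def call_llm(prompt: str) -> str:
        raise NotImplementedError   # stand-in for whatever model API you use

    def wolfram_solve(query: str) -> str:
        raise NotImplementedError   # stand-in for a symbolic math backend

    def causal_engine(query: str) -> str:
        raise NotImplementedError   # stand-in for a do-calculus / causal-inference backend

    TOOLS = {"math": wolfram_solve, "causal": causal_engine}

    def answer(user_query: str) -> str:
        choice = call_llm(f"Pick one of {sorted(TOOLS)} (or 'none') for: {user_query}").strip()
        handler = TOOLS.get(choice, call_llm)   # fall back to the plain LLM
        return handler(user_query)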
Exponential time complexity.
You have missed the foundation: before dynamics, being. Before causal reasoning you have deep definition of concepts. Causality is "below" that.
Reasoning, thinking, knowing, feeling, understanding, etc.
Or at the very least, our rubrics and heuristics for determining if someone (thing) thinks, feels, knows, etc, no longer work. And in particular, people create tests for those things thinking that they understand what they are testing for, when _most human beings_ would also fail those tests.
I think a _lot_ of really foundational work needs to be done on clearly defining a lot of these terms and putting them on a sounder basis before we can really move forward on saying whether machines can do those things.
Animals do not have spoken language the way humans do, so their thoughts aren’t really composed of sentences. Yet, they have intelligence and can reason about their world.
How could we build an AGI that doesn’t use language to think at all? We have no fucking clue and won’t for a while because everyone is chasing the mirage created by LLMs. AI winter will come and we’ll sit around waiting for the next big innovation. Probably some universal GOAP with deeply recurrent neural nets.
We built a box that spits out natural language and tricks humans into believing it's conscious. The box itself actually isn't that interesting, but the human side of the equation is.
You have only proven the urgency of Intelligence, the need to produce it in inflationary amounts.
I would like to reassure you that we - we here - see LLMs as very much unlike us.
And why should you not exclude them? Where does this idea come from, taking random elements as models? Where do you see pedestals of free access? Is the Nobel Prize a raffle now?
I think this is the most important critique that undercuts the paper's claims. I'm less convinced by the other point. I think backtracking and/or parallel search is something future papers should definitely look at in smaller models.
The article is definitely also correct on the overreaching, broad philosophical claims that seem common when discussing AI and reasoning.
Reducing the distance of each statistical leap improves “performance” since you would avoid failure modes that are specific to the largest statistical leaps, but it doesn’t change the underlying mechanism. Reasoning models still “hallucinate” spectacularly even with “shorter” gaps.
If I ask you what's 2+2, there's a single answer I consider much more likely than others.
Sometimes, words are likely because they are grounded in ideas and facts they represent.
Put another way, LLMs are good at talking like they are thinking. That can get you pretty far, but it is not reasoning.
It's true that if it's not producing text, there is no thinking involved, but it is absolutely NOT clear that the attention block isn't holding state and modeling something as it works to produce text predictions. In fact, I can't think of a way to define it that would make that untrue... unless you mean that there isn't a system wherein something like attention is updating/computing and the model itself chooses when to make text predictions. That's by design, but what you're arguing doesn't really follow.
Now, whether what the model is thinking about inside that attention block matches up exactly or completely with the text it's producing as generated context is probably at least a little dubious, and it's unlikely to be a complete representation regardless.
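For concreteness, a stripped-down single-head attention loop (numpy toy, no claim it matches any particular model): the cached keys/values from earlier tokens are exactly the "state" that conditions every later prediction.

    import numpy as np

    d = 8
    rng = np.random.default_rng(0)
    Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

    K_cache, V_cache = [], []        # this growing cache is the "state" being held

    def step(x):                     # x: embedding of the newest token, shape (d,)
        q = x @ Wq
        K_cache.append(x @ Wk)
        V_cache.append(x @ Wv)
        K, V = np.stack(K_cache), np.stack(V_cache)
        w = np.exp(q @ K.T / np.sqrt(d))
        w /= w.sum()
        return w @ V                 # every output mixes in all previously cached state

    for _ in range(5):
        out = step(rng.normal(size=d))
    print(out.shape)                 # (8,) -- each step depends on everything cached so far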
That we implement skills, not deficiencies, is a basic concept that now needs so much visibility that it should probably be written into the guidelines.
We implement skills, not deficiencies.
This was the view of Hume (humans as bundles of experience who just collect information and make educated guesses for everything). Unfortunately, it leads to philosophical skepticism, in which you can't ground any knowledge absolutely, as it's all just justified by some knowledge you got from someone else, which also came from someone else, etc., and eventually you can't actually justify any knowledge that isn't directly a result of experience (the concept of "every effect has a cause" is a classic example).
There have been plenty of epistemological responses to this viewpoint, with Kant's view, of humans doing a mix of "gathering context" (using our senses) but also applying universal categorical reasoning to schematize and understand / reason from the objects we sense, being the most well known.
I feel like anyone talking about the epistemology of AI should spend some time reading the basics of all of the thought from the greatest thinkers on the subject in history...
People who deeply study only technology have just that frame of reference for viewing the world, so they make the mistake of assuming everything must work that way, including humans.
If they had a wider frame of reference that included, for example, Early Childhood Development, they might have enough knowledge to think outside of this box and know just how ridiculous that argument is.
Some of how they work is well understood (a lot now, actually), some of the outcomes are still surprising.
But we debate both the well-understood parts and the surprising parts with the wrong terminology, borrowed from pretty dubious corners of pop cognitive science, and not with terminology appropriate to the new and different thing! It's nothing like a brain, it's a new, different thing. Does it think or reason? Who knows, pass the blunt.
They do X performance on Y task according to Z eval; that's how you discuss ML model capability if you're pursuing understanding rather than fundraising or clicks.
(with a curious parallel about whether some paths in thought are dead-ends - the unproductive focus mentioned in the article).
With thinking or reasoning, there's not really a precise definition of what it is, but we nevertheless know that currently LLMs and machines more generally can't reproduce many of the human behaviours that we refer to as thinking.
The question of what tasks machines can currently accomplish is certainly meaningful, if not urgent, and the reason LLMs are getting so much attention now is that they're accomplishing tasks that machines previously couldn't do.
To some extent there might always remain a question about whether we call what the machine is doing "thinking" - but that's the uninteresting verbal question. To get at the meaningful questions we might need a more precise or higher resolution map of what we mean by thinking, but the crucial element is what functions a machine can perform, what tasks it can accomplish, and whether we call that "thinking" or not doesn't seem important.
Maybe that was even Dijkstra's point, but it's hard to tell without context...
It would be interesting to see if this study’s results can be reproduced in a more realistic setting.
The author has a curious idea of what "reasoning" entails.
Whether it's a mirage or not, the ability to produce a symbolically logical result that has valuable meaning seems real enough to me.
Especially since most meaning is assigned by humans onto the world... so too can we choose to assign meaning (or not) to the output of a chain of symbolic logic processing?
Edit: maybe it is not so much that an LLM calculates/evaluates the result of symbolic logic as it is that it "follows" the pattern of logic encoded into the model.
> I appreciate that research has to be done on small models, but we know that reasoning is an emergent capability! (...) Even if you grant that what they’re measuring is reasoning, I am profoundly unconvinced that their results will generalize to a 1B, 10B or 100B model.
A fundamental part of applied research is simplifying a real-world phenomenon to better understand it. Dismissing the finding that, at this parameter count and on such a simple problem, the LLM can't perform out of distribution, merely because the model isn't big enough, undermines the very value of independent research. Tomorrow another model with double the parameters may or may not show the same behavior, but that finding will be built on top of this one.
Also, how do _you_ know that reasoning is emergent, and not rationalising on top of a compressed version of the web stored in 100B parameters?
That is an unreasonable assumption. In the case of LLMs it seems wasteful to transform a point in latent space into a single token and lose information. In fact, I think in the near future it will be the norm for MLLMs to "think" and "reason" without outputting a single "word".
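A crude numpy caricature of the difference (entirely hypothetical, just to make the lost-information point): the token loop collapses the state vector into one discrete symbol each step, while a latent loop would carry the whole vector forward.

    import numpy as np

    rng = np.random.default_rng(1)
    V, d = 50, 16                                    # toy vocab size and hidden width
    W_step = rng.normal(size=(d, d)) / np.sqrt(d)    # stand-in for the model's state update
    W_out  = rng.normal(size=(d, V)) / np.sqrt(d)    # unembedding
    E      = rng.normal(size=(V, d))                 # token embeddings

    def step_with_tokens(h):
        tok = int(np.argmax(h @ W_out))   # collapse the state into one discrete symbol
        return np.tanh(E[tok] @ W_step)   # only that symbol's embedding carries forward

    def step_in_latent_space(h):
        return np.tanh(h @ W_step)        # keep the full vector; emit no word at all

    h = rng.normal(size=d)
    for _ in range(10):
        h = step_in_latent_space(h)
    print(np.round(h[:4], 3))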
> Whether AI reasoning is “real” reasoning or just a mirage can be an interesting question, but it is primarily a philosophical question. It depends on having a clear definition of what “real” reasoning is, exactly.
It is not a "philosophical" (by which the author probably meant "practically inconsequential") question. If the whole reasoning business is just rationalization of pre-computed answers, or simply a way to get extra computation in because each token provides only a fixed amount of computation to update the model's state, then it doesn't make much sense to focus on improving the quality of the chain-of-thought output from a human point of view.
That said, this author says this question of whether models "can reason" is the least interesting thing to ask. But I think the least interesting thing you can do is to go around taking every complaint about LLM performance and saying "but humans do the exact same thing!" Which is often not true, but again, doesn't matter.
stonemetal12•1h ago
How do you see that impacting the results? It is the same algorithm just on a smaller scale. I would assume a 4 layer model would not be very good, but does reasoning improve it? Is there a reason scale would impact the use of reasoning?
azrazalea_debt•1h ago
If (BIG if) we ever do see actual AGI, it is likely to work like this. It's unlikely we're going to make AGI by designing some grand Cathedral of perfect software, it is more likely we are going to find the right simple principles to scale big enough to have AGI emerge. This is similar.
NitpickLawyer•1h ago
A depth of 4 is very small. It is very much a toy model. It's ok to research this, and maybe someone will try it out on larger models, but it's totally not ok to lead with the conclusion, based on this toy model, IMO.