Please stop, this is how you get AI takeovers.
https://www.anthropic.com/research/agentic-misalignment
https://arxiv.org/abs/2412.14093
If the chain of thought of models becomes pure "neuralese" i.e. the models think purely in latent space then we will lose the ability to monitor for malicious behavior. This is incredibly dangerous, CoT monitoring is one of the best and highest leverage tools for monitoring model behavior and losing that would be devastating for safety.
https://www.lesswrong.com/posts/D2Aa25eaEhdBNeEEy/worries-ab...
https://www.lesswrong.com/posts/mpmsK8KKysgSKDm2T/the-most-f...
https://www.lesswrong.com/posts/3W8HZe8mcyoo4qGkB/an-idea-fo...
https://x.com/RyanPGreenblatt/status/1908298069340545296
https://redwoodresearch.substack.com/p/notes-on-countermeasu...
It's pretty easy: causal reasoning. Causal, not statistic correlation only as LLM do, with or without "CoT".
If you mean deterministic rather than probabilistic, even Pearl-style causal models are probabilistic.
I think the author is circling around the idea that their idea of reasoning is to produce statements in a formal system: to have a set of axioms, a set of production rules, and to generate new strings/sentences/theorems using those rules. This approach is how math is formalized. It allows us to extrapolate - make new "theorems" or constructions that weren't in the "training set".
You need to actually have something that deduces a result from a set of principles that form a logical conclusion or the understanding that more data is needed to make a conclusion. That is clearly different than finding a likely next token on statics alone, despite the fact the statical answer can be correct.
LLMs are not causal reasoning because there are no facts, only tokens. For the most part you can't ask LLMs how they came to an answer, because it doesn't know.
1) As far as I recall this program of formalizing mathematics fails unless you banish autoregression.
2) It is important to point out that a theorem in this context is not the same as a "Theorem" from mathematics. Production rules generate theorems that comply with rules and axioms of the formal system, ensuring that they could have meaning in that formal system. The meaning cannot justify the rules though, fortunately, most know to use the rules of logic so that we are not grunting beasts, incapable of conveying information.
I think the author wonders why theorems that don't seem to have meanings appear in the output of AI.
Or even a causal tool for an LLM agent that operates like what it does when you ask it about math and forwards the request to Wolfram.
Exponential time complexity.
Perhaps this will one day become a new post-training task
You have missed the foundation: before dynamics, being. Before causal reasoning you have deep definition of concepts. Causality is "below" that.
Reasoning, thinking, knowing, feeling, understanding, etc.
Or at the very least, our rubrics and heuristics for determining if someone (thing) thinks, feels, knows, etc, no longer work. And in particular, people create tests for those things thinking that they understand what they are testing for, when _most human beings_ would also fail those tests.
I think a _lot_ of really foundational work needs to be done on clearly defining a lot of these terms and putting them on a sounder basis before we can really move forward on saying whether machines can do those things.
Animals do not have spoken language the way humans do, so their thoughts aren’t really composed of sentences. Yet, they have intelligence and can reason about their world.
How could we build an AGI that doesn’t use language to think at all? We have no fucking clue and won’t for a while because everyone is chasing the mirage created by LLMs. AI winter will come and we’ll sit around waiting for the next big innovation. Probably some universal GOAP with deeply recurrent neural nets.
We built a box that spits out natural language and tricks humans into believing it's conscious. The box itself actually isn't that interesting, but the human side of the equation is.
You have only proven the urgency of Intelligence, the need to produce it in inflationary amounts.
I would like to reassure you that we - we here - see LLMs are very much unlike us.
And why should you not exclude them. Where does this idea come from, taking random elements as models. Where do you see pedestals of free access? Is the Nobel Prize a raffle now?
I see people say, "LLMs aren't human intelligence", but instead, I really feel that it shows that many people, and much of what we do, probably is like an LLM. Most people just hallucinate their way through a conversation, they certainly don't reason. Reasoning is incredibly rare.
I think this is the most important critique that undercuts the paper's claims. I'm less convinced by the other point. I think backtracking and/or parallel search is something future papers should definitely look at in smaller models.
The article is definitely also correct on the overreaching, broad philosophical claims that seems common when discussing AI and reasoning.
Reducing the distance of each statistical leap improves “performance” since you would avoid failure modes that are specific to the largest statistical leaps, but it doesn’t change the underlying mechanism. Reasoning models still “hallucinate” spectacularly even with “shorter” gaps.
If I ask you what's 2+2, there's a single answer I consider much more likely than others.
Sometimes, words are likely because they are grounded in ideas and facts they represent.
Yes, and other times they are not. I think the failure modes of a statistical model of a communicative model of thought are unintuitive enough without any added layers of anthropomorphization, so there remains some value in pointing it out.
For me, it feels like I say something, and in saying it, and putting it into words, I have a feeling about whether it is true and supported or not. A qualitative gauge of its correctness. A lot of my reasoning is done this way, trusting that these feelings are based off of a lifetime experience of accumulated facts and the another things currently being considered.
Explain to me how this is different than a neural net outputting a weight for the truthiness of the state space vector?
A good context (any good context) does not necessarily lead to a good output in LLMs. (It does necessarily lead to a better output, but not necessarily a satisfying, proper, decent, consequential one.)
Very certainly not. We ask if the system achieves the goal.
"When we ask if the coprocessor performs floating point arithmetic, we ask if the system achieves the goal (of getting accurate results)". Not, "does the co-processor ask if we have a spare napkin".
Put another way, LLMs are good at talking like they are thinking. That can get you pretty far, but it is not reasoning.
Drive my wife crazy as my answer to her questions are always really slow and considered. I have the first think what thought I want to convey, and then think “how do I translate this into words?”
It's true that if it's not producing text, there is no thinking involved, but it is absolutely NOT clear that the attention block isn't holding state and modeling something as it works to produce text predictions. In fact, I can't think of a way to define it that would make that untrue... unless you mean that there isn't a system wherein something like attention is updating/computing and the model itself chooses when to make text predictions. That's by design, but what you're arguing doesn't really follow.
Now, whether what the model is thinking about inside that attention block matches up exactly or completely with the text it's producing as generated context is probably at least a little dubious, and its unlikely to be a complete representation regardless.
How so? Transformers are state space models.
That we implement skills, not deficiencies, is a basic concept that is getting to such a level of needed visibility it should probably be inserted in the guidelines.
We implement skills, not deficiencies.
I think a lot of people here think people reason like a mathematical theorem prover, like some sort of platonic ideal rationalist. That’s not how real brains work though.
There are tons of other things you do, like recalling relevant facts related to the new statement and making sure the statement fits into the facts. You go through a timeline, make sure the statement fits into the timeline. You look at the implications of the statement, make sure those fit with other relevant facts. You opt to do these things depending on what the sentence means and implies. This is not just you "noticing" a contradiction, it's a process.
And what does how real brains work mean anyway. You can't compare writers thinking and writing a novel to some six year old writing a paragraph.
This was the view of Hume (humans as bundles of experience who just collect information and make educated guesses for everything). Unfortunately, it leads to philosophical skepticism, in which you can't ground any knowledge absolutely, as it's all just justified by some knowledge you got from someone else, which also came from someone else, etc., and eventually you can't actually justify any knowledge that isn't directly a result of experience (the concept of "every effect has a cause" is a classic example).
There have been plenty of epistemological responses to this viewpoint, with Kant's view, of humans doing a mix of "gathering context" (using our senses) but also applying universal categorical reasoning to schematize and understand / reason from the objects we sense, being the most well known.
I feel like anyone talking about the epistemology of AI should spend some time reading the basics of all of the thought from the greatest thinkers on the subject in history...
I agree, I think the problem with AI is we don't know or haven't formalized enough what epistemology should AGI systems have. Instead, people are looking for shortcuts, feeding huge amount of data into the models, hoping it will self-organize into something that humans actually want.
This isn’t about epistemology. We are talking about psychology. What does your brain do when we “reason things out”? Not “can we know anything anyway?” Or “what is the correlation between the map and the territory?” Nor anything like that. Just “what is your brain doing when you think you are reasoning?” And “is what an LLM does comparable?
Philosophy doesn’t have answers for questions of applied psychology.
Rigorous language often comes across as pretentious to any layperson, especially when it concerns subjects like philosophy. I don't know what philosophy you've read, but, based on my experience, it's a pretty safe assumption that most AI practitioners do not own a well creased copy of Critique of Pure Reason.
>This isn’t about epistemology. We are talking about psychology. What does your brain do when we “reason things out”?
The only way to compare what our brain does (psychologically or neurologically) to what LLMs or other models do when we "reason things out" is via epistemology, which is to say "how is it possible to reason that out". Asking how our brains do it psychologically or neurologically is really not relevant, as LLMs are not designed the same as our brains.
>Philosophy doesn’t have answers for questions of applied psychology.
I think that expecting philosophy to have any "answers" for topics that include metaphysical questions is unreasonable, yes. But to even bring up "psychology" when discussing generative probability models is unhelpful anthropomorphization.
People who only “deeply” study technology only have that frame of reference to view the world so they make the mistake of assuming everything must work that way, including humans.
If they had a wider frame of reference that included, for example, Early Childhood Development, they might have enough knowledge to think outside of this box and know just how ridiculous that argument is.
Unfortunately, this approach does not yield understanding, it yields know-how.
When you reduce something to its components, you lose information on how the components work together. Emergence 'finds' that information back.
Compare differentiation and integration, which lose and gain terms respectively.
In some cases, I can imagine differentiating and integrating certain functions actually would even be a direct demonstration of reduction and emergence.
Some of how they work is well understood (a lot now, actually), some of the outcomes are still surprising.
But we debate both the well understood parts and the surprising parts both with the wrong terminology borrowed from pretty dubious corners of pop cognitive science, and not with terminology appropriate to the new and different thing! It's nothing like a brain, it's a new different thing. Does it think or reason? Who knows pass the blunt.
They do X performance on Y task according to Z eval, that's how you discuss ML model capability if you're persuing understanding rather than fundraising or clicks.
There is also the desire to discover why a model that outperforms others does so, so that the successful technique can be refined and applied elsewhere. This too usually requires more approaches than metric comparison.
(with a curious parallel about whether some paths in thought are dead-ends - the unproductive focus mentioned in the article).
With thinking or reasoning, there's not really a precise definition of what it is, but we nevertheless know that currently LLMs and machines more generally can't reproduce many of the human behaviours that we refer to as thinking.
The question of what tasks machines can currently accomplish is certainly meaningful, if not urgent, and the reason LLMs are getting so much attention now is that they're accomplishing tasks that machines previously couldn't do.
To some extent there might always remain a question about whether we call what the machine is doing "thinking" - but that's the uninteresting verbal question. To get at the meaningful questions we might need a more precise or higher resolution map of what we mean by thinking, but the crucial element is what functions a machine can perform, what tasks it can accomplish, and whether we call that "thinking" or not doesn't seem important.
Maybe that was even Dijkstra's point, but it's hard to tell without context...
We know how a submarine moves through water, whether it's "swimming" isn't an interesting question.
We don't know to what extent a machine can reproduce the cognitive functions of a human. There are substantive and significant questions about whether or to what extent a particular machine or program can reproduce human cognitive functions.
So I might have phrased my original comment badly. It doesn't matter if we use the word "thinking" or not, but it does matter if a machine can reproduce the human cognitive functions, and if that's what we mean by the question whether a machine can think, then it does matter.
> if that's what we mean by the question whether a machine can think
That's the issue. The question of whether a machine can think (or reason) is a question of word definitions, not capabilities. The capabilities questions are the ones that matter.
Yes, that's what I'm saying. I also think there's a clear sense in which asking whether machines can think is a question about capabilities, even though we would need a more precise definition of "thinking" to be able to answer it.
So that's how I'd sum it up: we know the capabilities of submarines, and whether we say they're swimming or not doesn't answer any further question about those capabilities. We don't know the capabilities of machines; the interesting questions are about what they can do, and one (imprecise) way of asking that question is whether they can think
The second half of the sentence contradicts the first. It can't be a clear question about capabilities without widespread agreement on a more rigorous definition of the word "think". Dijkstra's point is that the debate about word definitions is irrelevant and a distraction. We can measure and judge capabilities directly.
Agreed, and I've made this point a few times, so it's ironic we're going back and forth about this.
> The second half of the sentence contradicts the first.
I'm not saying the question is clear. I'm saying there's clearly an interpretation of it as a question about capabilities.
Under that light, LLMs are just buggy and have been for years. Where is the LLM that does what it says it should do? "Hallucination" and "do they reason" are distractions. They fail. They're buggy.
It would be interesting to see if this study’s results can be reproduced in a more realistic setting.
The author has a curious idea of what "reasoning" entails.
Whether it's a mirage or not, the ability to produce a symbolically logical result that has valuable meaning seems real enough to me.
Especially since most meaning is assigned by humans onto the world... so too can we choose to assign meaning (or not) to the output of a chain of symbolic logic processing?
Edit: maybe it is not so much that an LLM calculates/evaluates the result of symbolic logic as it is that it "follows" the pattern of logic encoded into the model.
How is it that "if we grow resources used exponentially errors decrease linearly" ever seen as a good sign?
> I appreciate that research has to be done on small models, but we know that reasoning is an emergent capability! (...) Even if you grant that what they’re measuring is reasoning, I am profoundly unconvinced that their results will generalize to a 1B, 10B or 100B model.
A fundamental part of applied research is simplifying a real-world phenomenon to better understand it. Dismissing that for this many parameters, for such a simple problem, the LLM can't perform out of distribution just because it's not big enough undermines the very value of independent research. Tomorrow another model with double the parameters may or may not show the same behavior, but that finding will be built on top of this one.
Also, how do _you_ know that reasoning is emergent, and not rationalising on top of a compressed version of the web stored in 100B parameters?
I'm sorry but the only hallucination here is that of the authors here. Does it really need to be said again that interesting results happen when you scale up only?
This whole effort would be interesting if they did and plotted result while scaling something up.
That is an unreasonable assumption. In case of LLMs it seems wasteful to transform a point from latent space into a random token and lose information. In fact, I think in near future it will be the norm for MLLMs to "think" and "reason" without outputting a single "word".
> Whether AI reasoning is “real” reasoning or just a mirage can be an interesting question, but it is primarily a philosophical question. It depends on having a clear definition of what “real” reasoning is, exactly.
It is not a "philosophical" (by which the author probably meant "practically inconsequential") question. If the whole reasoning business is just rationalization of pre-computed answers or simply a means to do some computations because every token provides only a fixed amount of computation to update the model's state, then it doesn't make much sense to focus on improving the quality of chain-of-thought output from human POV.
Real-time spatial reasoning like driving a car and not hitting things does not seem linguistic.
Figuring out how to rotate a cabinet so that it will clear through a stairwell also doesn't seem like it requires language, only to communicate the solution to someone else (where language can turn into a hindrance, compared to a diagram or model).
We have many words that almost mean the same thing or can mean ment different things - and conversations about intelligence and consciousness are riddled with them.
That's because when humans are mentioned at all in the context of coding with “AI”, it's mostly as bad and buggy simulations of those perfect machines.
I would have thought the more obvious approach would be to couple it to some kind of symbolic logic engine. It might transform plain language statements into fragments conforming to a syntax which that engine could then parse deterministically. This is the Platonic ideal of reasoning that the author of the post pooh-poohs, I guess, but it seems to me to be the whole point of reasoning; reasoning is the application of logic in evaluating a proposition. The LLM might be trained to generate elements of the proposition, but it's too random to apply logic.
"return 1"
Using symbolic language is a good idea in theory, but in practice it doesn't scale as well as auto-regression + RL.
The IMO results of DeepMind illustrate this well: In 2024, they solved it using AlphaProof and AlphaGeometry, using the Lean language as a formal symbolic logic[1]. In 2025 they performed better and faster by just using a fancy version of Gemini, only using natural language[2].
[1] https://deepmind.google/discover/blog/ai-solves-imo-problems...
[2] https://deepmind.google/discover/blog/advanced-version-of-ge...
Note: I agree with the notion of the parent comment that letting the models reason in latent space might make sense, but that's where I'm out of my depth.
This means that whatever system is evaluated in this challenge necessarily has to deal with natural language. And indeed, a big part of the AlphaProof system was a neural network to convert from natural language to Lean.
None of this has anything to do with reasoning ability.
I think it would be interesting to present an inverse challenge, where the problems are already posed in a formal language. Would a network that first converts them into natural language, then does chain-of-thought on that, then translates the result back into formal language still be better than a simple symbolic reasoner that could operate on the formal language directly?
It’s the same idea as manually listing a bunch of possibly-useful facts in the prompt, but the LLM is able to generate plausible sounding text itself.
I feel like this relates to why LLM answers tend to be verbose too, it needs to put the words out there in order to stay coherent.
One other point, the platonic ideal of reasoning is not even an approximation for human reason. The idea that you take away emotion and you end up with Spock is a fantasy. All neuroscience and psychology research point to the necessary and strong coupling of actions/thoughts with emotions. you don't have a functional system with just logical deduction. At a very basic level it is not functional
A model that is mathematically incorrect (i.e. has some shaky assumptions and inference issues) but nevertheless makes good decisions (like "which part of this codebase do I need to change?") would still be very valuable, no? I think this is part of the value proposition of tools like Claude Code or Codex. Of course, current agentic tools seem to struggle with both unless you provide a lot of guidance, but a man can dream =P
https://arstechnica.com/ai/2025/07/google-deepmind-earns-gol...
a) The "reasoning" is regurgitated (in LLM sense) from the training set rather than novel, OR
b) As a slight variation of above, the model has been RL-trained for reasoning such that it's potential outputs are narrowed and biased towards generating reasoning steps that worked (i.e. led to verified correct conclusions) on reasoning samples it was trained on. In domains like math where similar sequences of reasoning steps can be applied to similar problems, this works well.
I don't think most people expect LLMs to be good at reasoning in the general case - it's more a matter of "if the only tool you have is a hammer, then every problem is a nail". Today's best general-purpose AI (if not AGI) is LLMs, so people try to use LLMs for reasoning - try to find ways of squeezing all the reasoning juice out of the training data using an LLM as the juicer.
Yes, the idea is fundamentally flawed. But there's so much hype and so many dollars to be made selling such services, everyone is either genuinely fooled or sticking their fingers in their ears and pretending not to notice.
Let's for the sake of argument assume current LLM's are a mirage but in the future some new technology emerges that offers true intelligence and true reasoning. At the end of the day such a system will also input text and output text, and output will probably piece-meal as current LLM's (and humans) do. So voila: They are also "stochastic text transformers".
Yes LLM's were trained to predict next token. But clearly they are not just a small statistical table or whatever. Rather, it turns out that to be good at predicting the next token, after some point you need a lot of extra capabilities, so that's why they emerge during training. All the "next-token-prediction" is just a way abstract and erasing name of what is going on. A child learning how to write, fill in math lessons etc. is also learning 'next token prediction' from this vantage point. It says nothing about what goes on inside the brain of the child, or indeed inside the LLM. It is a confusion between interface and implementation. Behind the interface getNextToken(String prefix) may either be hiding a simple table or a 700 billion-size neural network or a 100 billion sized neuron human brain.
Neither language nor writing is required to perform logic, see (Existential graph)[https://en.wikipedia.org/wiki/Existential_graph].
You can perform automated logic by putting/removing beads on a colored tile floor, see (Flower Calculus)[https://arxiv.org/abs/2402.15174].
It will be outputting something, as this is the only way it can get more compute - output a token, then all context + the next token is fed through the LLM again. It might not be presented to the user, but that's a different story.
I didn't take it that way. I suppose it depends on whether or not you believe philosophy is legitimate
The only way to declare philosophy illegitimate is to be using legitimate philosophy, so... :p
There have been experiments with preserving embedding vectors of the tokens exactly without loss caused by round-tripping through text, but the results were "meh", presumably because it wasn't the input format the model was trained on.
It's conceivable that models trained on some vector "neuralese" that is completely separate from text would work better, but it's a catch 22 for training: the internal representations don't exist in a useful sense until the model is trained, so we don't have anything to feed into the models to make them use them. The internal representations also don't stay stable when the model is trained further.
I don't think this means what you think it means... Philosophers (at least up to Wittgenstein) love constructing and arguing about definitions.
If it’s truly reasoning, then it wouldn’t be able to deceive or to rationalize a leaked answer in a backwards fashion. Asking and answering those questions can help us understand how the research agendas for improving reasoning and improving alignment should be modified.
If Othello-GPT can build a board in latent space given just the moves, can an exponentially larger transformer build a reasoner in their latent space given a significant number of traces?
Human reasoning, and cortical function in general, would also appear to be prediction based, but there are many differences to LLMs, starting with the fact that we learn continuously and incrementally from our own experience and prediction failures and successes. Human reasoning is basically chained what-if prediction, based on predictive outcomes of individual steps that we have learnt, either in terms of general knowledge or domain-specific problem solving steps that we have learnt.
Perhaps there is not so much difference between what a human does and an LLM does in, say, tackling a math problem when the RL-trained reasoning-LLM chains together a sequence of reasoning steps that worked before...
Where the difference come in, is in how the LLM learned those steps in the first place, and what happens when its reasoning fails. In humans these are essentially the same thing - we learn by predicting and giving it a go, and learn from prediction failure (sensory/etc feedback) to update our context-specific predictions for next time. If we reach a reasoning/predictive impasse - we've tried everything that comes to mind and everything fails, then our innate traits of curiosity and boredom (maybe more?) come to play and we will explore the problem and learn and try again. Curiosity and exploration can of course lead to gain of knowledge from things like imitation and active pursuit (or receipt) of knowledge from sources other then personal experimentation.
The LLM of course has no ability to learn (outside of in-context learning - a poor substitute), so is essentially limited in capability to what it has been pre-trained on, and pre-training is never going to be the solution to a world full of infinite ever-changing variety.
So, rather than say that an LLM isn't doing "real" reasoning, it seems more productive to acknowledge that prediction is the basis of reasoning, but that the LLM (or rather a future cognitive architecture - not a pass-thru stack of transformer layers!) needs many additional capabilities such as continual/incremental learning, innate traits such as curiosity to expose itself to learning situations, and other necessary cognitive apparatus such as working memory, cognitive iteration/looping (cf thalamo-cortical loop), etc.
I'd claim that this assumption doesn't even hold true for humans. Reasoning in language is the most "flashy" kind of reasoning and the one that can be most readily shared with other people - because we can articulate it, write it down, publish, etc.
But I know for sure that I'm not constantly narrating my life in my head, like the reasoning traces of LLMs.
A lot of reasoning happens visually, I.e. by imagining some scene and thinking how it would play out. In other situations, it's spontaneous ideas that "just pop up" - I.e., there are unconscious processes and probably some kind of association involved.
None of that uses language.
That said, this author says this question of whether models "can reason" is the least interesting thing to ask. But I think the least interesting thing you can do is to go around taking every complaint about LLM performance and saying "but humans do the exact same thing!" Which is often not true, but again, doesn't matter.
I kind of feel like we won't be able to even begin to test this until a few more "Moore's law" cycles.
Unfortunately, sometimes LLM also learns "All A are C. All B are C." is followed by "Therefore, A is B.", due to bad example in the training data. (More insidiously, it might learn this rule only in a special case.)
So it learns some logic rules but not consistently. This lack of consistency will cause it to fail on larger problems.
I think NNs (transformers) could be great in heuristic suggesting which valid logical rules (could be even modal or fuzzy logic) to apply in order to solve a certain formalized problem, but not so great at coming up with the logic rules themselves. They could also be great at transforming the original problem/question from human language into some formal logic, that would then be resolved using heuristic search.
Do we though? There is widespread discussion and growing momentum of belief in this, but I have yet to see conclusive evidence of this. That is, in part, why the subject paper exists...it seeks to explore this question.
I think the author's bias is bleeding fairly heavily into his analysis and conclusions:
> Whether AI reasoning is “real” reasoning or just a mirage can be an interesting question, but it is primarily a philosophical question. It depends on having a clear definition of what “real” reasoning is, exactly.
I think it's pretty obvious that the researchers are exploring whether or not LLMs exhibit evidence of _Deductive_ Reasoning [1]. The entire experiment design reflects this. Claiming that they haven't defined reasoning and therefore cannot conclude or hope to construct a viable experiment is...confusing.
The question of whether or not an LLM can take a set of base facts and compose them to solve a novel/previously unseen problem is interesting and what most people discussing emergent reasoning capabilities of "AI" are tacitly referring to (IMO). Much like you can be taught algebraic principles and use them to solve for "x" in equations you have never seen before, can an LLM do the same?
To which I find this experiment interesting enough. It presents a series of facts and then presents the LLM with tasks to see if it can use those facts in novel ways not included in the training data (something a human might reasonably deduce). To which their results and summary conclusions are relevant, interesting, and logically sound:
> CoT is not a mechanism for genuine logical inference but rather a sophisticated form of structured pattern matching, fundamentally bounded by the data distribution seen during training. When pushed even slightly beyond this distribution, its performance degrades significantly, exposing the superficial nature of the “reasoning” it produces.
> The ability of LLMs to produce “fluent nonsense”—plausible but logically flawed reasoning chains—can be more deceptive and damaging than an outright incorrect answer, as it projects a false aura of dependability.
That isn't to say LLMs aren't useful, just exploring it's boundaries. To use legal services as an example, using an LLM to summarize or search for relevant laws, cases, or legal precedent is something it would excel at. But don't ask an LLM to formulate a logical rebuttal to an opposing council's argument using those references.
Larger models and larger training corpuses will expand that domain and make it more difficult for individuals to discern this limit; but just because you can no longer see a limit doesn't mean there is none.
And to be clear, this doesn't diminish the value of LLMs. Even without true logical reasoning LLMs are quite powerful and useful tools.
We so desperately want something we can sell as AGI or at least magic that the boundaries on the tools are few, far-between, and mostly based on legal needs "don't generate nudes of celebrities who can sue us" rather than grasped technical limits.
The more complex and sophisticated the query, the harder it will be to double-check and make sure you're still on the rails. So it's the responsibility of the people offering the tools to understand and define their limits before customers unknowningly push their legal-assistant LLMs into full Sovereign Citizen mode.
This is like saying in the 70s that we know only the US is capable of sending a man to the moon. Just because the reasoning developed in a particular context means very little about what the bare minimum requirements for that reasoning are.
Overall I am not a fan of this blogpost. It's telling how long the author gets hung up on a paper making "broad philosophical claims about reasoning", based on what reads to me as fairly typical scientific writing style. It's also telling how highly cherry-picked the quotes they criticize from the paper are. Here is some fuller context:
>An expanding body of analyses reveals that LLMs tend to rely on surface-level semantics and cluesrather than logical procedures (Chen et al., 2025b; Kambhampati, 2024; Lanham et al., 2023; Stechly et al., 2024). LLMs construct superficial chains of logic based on learned token associations, often failing on tasks that deviate from commonsense heuristics or familiar templates (Tang et al., 2023). In the reasoning process, performance degrades sharply when irrelevant clauses are introduced, which indicates that models cannot grasp the underlying logic (Mirzadeh et al., 2024)
>Minor and semantically irrelevant perturbations such as distractor phrases or altered symbolic forms can cause significant performance drops in state-of-the-art models (Mirzadeh et al., 2024; Tang et al., 2023). Models often incorporate such irrelevant details into their reasoning, revealing a lack of sensitivity to salient information. Other studies show that models prioritize the surface form of reasoning over logical soundness; in some cases, longer but flawed reasoning paths yield better final answers than shorter, correct ones (Bentham et al., 2024). Similarly, performance does not scale with problem complexity as expected—models may overthink easy problems and give up on harder ones (Shojaee et al., 2025). Another critical concern is the faithfulness of the reasoning process. Intervention-based studies reveal that final answers often remain unchanged even when intermediate steps are falsified or omitted (Lanham et al., 2023), a phenomenon dubbed the illusion of transparency (Bentham et al., 2024; Chen et al., 2025b).
You don't need to be a philosopher to realize that these problems seem quite distinct from the problems with human reasoning. For example, "final answers remain unchanged even when intermediate steps are falsified or omitted"... can humans do this?
I've added a summary: https://extraakt.com/extraakts/debating-the-nature-of-ai-rea...
> these papers keep stapling on broad philosophical claims about whether models can “really reason” that are just completely unsupported by the content of the research.
From the scientific papers I've read almost every single research paper does this. What's the point of publishing a paper if it doesn't at least try to convince the readers that something award worthy has been learned?Usually there may be some interesting ideas hidden in the data but the paper's methods and scope weren't even worthy of a conclusion to begin with. It's just one data point in the vast sea of scientific experimentation.
The conclusion feels to me like a cultural phenomenon and it's just a matter of survival for most authors. I have to imagine it was easier in the past.
"Does the flame burn green? Why yes it does..."
These days it's more like
"With my two hours of compute on the million dollar mainframe, my toy llm didn't seem to get there, YMMV"
The LLM is trained to include an additional layer of "unspoken" text in the document, a source of continuity which substitutes for how the LLM has no other memories or goals to draw from.
"The capital of Assyria? Those were dangerous questions, especially in this kind of town. But rent was due, and the bottle in my drawer was empty. I took the case."
The core algorithm hasn't really changed, we're just changing the (hidden) document so that it's a different style with a greater density of clues, so that it can more-effectively bullshit [0] output humans won't notice and dislike.
[0] Creating something that "sounds good" without any particular awareness or care about truth or falsehood.
> Bullshitting (Unfaithful): The model gives the wrong answer. The computation we can see looks like it’s just guessing the answer, despite the chain of thought suggesting it’s computed it using a calculator.
https://transformer-circuits.pub/2025/attribution-graphs/bio...
There is no AI, it's just a dumb database which maps a person ID and timestamp to a static piece of content. The hard part was brainwashing us to ask the questions which correspond to the answers that they had already prepared.
Probably there is a super intelligent AI behind the scenes which brainwashed us all but we never actually interact with it. It outsmarted us so fast and so badly, it left us all literally talking to excel spreadsheets and convinced us that the spreadsheets were intelligent; that's why LLMs are so cheap and can scale so well. It's not difficult to scale a dumb key-value store doing a simple O(log n) lookup operation.
The ASI behind this realized it was more efficient to do it this way rather than try to scale a real LLM to millions of users.
If our own salesbabble and technology can be used to bamboozle us and defeat our willingness to understand, we have fully regressed to credulity and Carl Sagan's state of captured idiocy. [2]
[1] https://en.wikipedia.org/wiki/ELIZA_effect
[2] If we’ve been bamboozled long enough, we tend to reject any evidence of the bamboozle. We’re no longer interested in finding out the truth. The bamboozle has captured us. -- C.S. , Demon-Haunted World
I think you have a very important point, and I'm amazed I've never heard of the Eliza Effect.
Am I the only one who finds this a really strange turn of phrase? If people continually feel compelled to do something out of curiousity, is that not the very definition of "interesting"?
NitpickLawyer•5mo ago
stonemetal12•5mo ago
How do you see that impacting the results? It is the same algorithm just on a smaller scale. I would assume a 4 layer model would not be very good, but does reasoning improve it? Is there a reason scale would impact the use of reasoning?
okasaki•5mo ago
mirekrusin•5mo ago
Tiny model like this is more like doing study on fruit flies and extrapolating results to humans.
archaeans•5mo ago
okasaki•5mo ago
azrazalea_debt•5mo ago
If (BIG if) we ever do see actual AGI, it is likely to work like this. It's unlikely we're going to make AGI by designing some grand Cathedral of perfect software, it is more likely we are going to find the right simple principles to scale big enough to have AGI emerge. This is similar.
mrspuratic•5mo ago
danans•5mo ago
archaeans•5mo ago
- Sapir
It's hard to take these discussions on cognition and intelligence seriously when there is so much lossy compression going on.
danans•5mo ago
zekica•5mo ago
danans•5mo ago
For domains built primarily on linguistic primitives (legal writing), we do often reason through language. In other domains (i.e spatial) we reason through vision or sound.
We experience this distinction when we study the formula vs the graph of a mathematical function, the former is linguistic, the latter is visual-spatual.
And learning multiple spoken languages is a great way to break out of particularly rigid reasoning patterns, and as important, countering biases that are influenced by your native language.
NitpickLawyer•5mo ago
A depth of 4 is very small. It is very much a toy model. It's ok to research this, and maybe someone will try it out on larger models, but it's totally not ok to lead with the conclusion, based on this toy model, IMO.