The title/introduction is pure bait: it implies some "physical" connection to hallucinations in biological organisms, but the work is really about trying to single out certain parts of the model. LLMs are nothing at all like a biological system; our brains are orders of magnitude more complex than the machines we've built, which we no longer fully understand. Believing that these LLMs are some next stage in understanding intelligence is hubris.
Who cares? I wonder if any of the commenters is qualified enough to understand the research at all. I am not.
You're just arguing about semantics. It doesn't matter in any substantial way. Ultimately, we need a word to distinguish factual output from confidently asserted erroneous output. We use the word "hallucinate". If we used a different word, it wouldn't make any difference -- the observable difference remains the same. "Hallucinate" is the word that has emerged, and it is now by overwhelming consensus the correct word.
> Whenever they get something "right," it's literally by accident.
This is obviously false. A great deal of training goes into making sure they usually get things right. If an infinite number of monkeys on typewriters get something right, that's by accident. Not LLMs.
While I agree for many general aspects of LLMs, I do disagree with some of the meta-terms used when describing LLM behavior. For example, the idea that AI has "bias" is problematic because neural networks literally have a variable called "bias", so of course AI will always have "bias". Plus, a biased AI is literally the purpose behind classification algorithms.
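For concreteness, here's a minimal sketch (my own PyTorch toy example, not from the article) of what that "bias" variable literally is: the additive term b in y = Wx + b that every linear layer carries.

```python
# Illustrative only: "bias" here is a learnable additive term,
# nothing to do with bias in the social/statistical sense.
import torch
import torch.nn as nn

layer = nn.Linear(in_features=4, out_features=2, bias=True)
print(layer.bias)                          # a parameter literally named "bias"

x = torch.randn(1, 4)
manual = x @ layer.weight.T + layer.bias   # y = Wx + b, what the layer computes
print(torch.allclose(manual, layer(x)))    # True
```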
But these terms, "bias" and "hallucinations", are co-opted to spin a narrative of no longer trusting AI.
How in the world did creating an overly confident chatbot do a complete 180 on years of AI progress and sentiment?
It's called hallucination because it works by imagining you have the solution and then learning what the input needs to be to produce that solution. Treat the input or the output as the learnable parameters: learn an input that fits an output (or vice versa) instead of learning the network. You fix what the network sees as the "real world" to match "what you already knew", just like a hallucinating human does.
You can imagine how hard it is to find papers on this technique nowadays.
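For what it's worth, here is a minimal sketch of the technique as I understand it (my own toy example, not taken from any of those papers): freeze a trained network, fix the output you "already know", and gradient-descend on the input until the network sees something that produces it.

```python
# Sketch of input optimization against a frozen network (assumed setup).
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 3))
for p in net.parameters():
    p.requires_grad_(False)                    # the network stays fixed

target = torch.tensor([2])                      # "what you already knew"
x = torch.zeros(1, 8, requires_grad=True)       # the input we learn instead
opt = torch.optim.Adam([x], lr=0.1)

for _ in range(200):
    opt.zero_grad()
    loss = nn.functional.cross_entropy(net(x), target)
    loss.backward()
    opt.step()

print(net(x).softmax(-1))   # the frozen net's prediction is pulled toward class 2
```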
Hallucinations are already associated with a type of behavior, which is (roughly defined) "subjectively seeing/hearing things which aren't there". This is an input-level error, so it's not the right umbrella term for the majority of errors happening with LLMs, many of which are at the output level.
I don't know what would be a better term, but we should distinguish between different semantic errors, such as:
- confabulating, i.e., recalling distorted or misinterpreted memories;
- lying, i.e., intentionally misrepresenting an event or memory;
- bullshitting, i.e., presenting a version without regard for the truth or provenance; etc.
I'm sure someone already made a better taxonomy, and hallucination is OK for normal public discussions, but I'm not sure why the distinctions aren't made in supposedly more serious works.
And I think we already distinguish between types of errors -- LLMs effectively don't lie, AFAIK, unless you're asking them to engage in role-play or something. They mostly either hallucinate/confabulate in the sense of inventing knowledge they don't have, or they just make "mistakes", e.g. in arithmetic, or in attempting to copy large amounts of code verbatim.
And when you're interested in mistakes, you're generally interested in a specific category of mistakes, like arithmetic, or logic, or copying mistakes, and we refer to them as such -- arithmetic errors, logic errors, etc.
So I don't think hallucination is taking away from any kind of specificity. To the contrary, it is providing specificity, because we don't call arithmetic errors hallucinations. And we use the word hallucination precisely to distinguish it from these run-of-the-mill mistakes.
What is indisputable is that LLMs, even though they are 'just' word generators, are remarkably good at generating factual statements and accurate answers to problems, yet also regrettably inclined to generating apparently equally confident counterfactual statements and bogus answers. That's all that 'hallucination' means in this context.
If this work can be replicated, it may offer a way to greatly improve the signal-to-bullshit ratio of LLMs, and that will be both impressive and very useful if true.
"Whenever they get something "right," it's literally by accident." "the random word generator"
First off, the input is not random at all, which raises the question of how random the output really is.
Second, it compresses data, which has an impact on that data: probably cleaning or adjusting it, which should reduce the "randomness" even more. It compresses data from us into concepts, and a high-level concept is more robust than "random".
Thinking or reasoning models also fine-tune the response by walking the hyperspace, basically collecting and strengthening data.
We as humans do very similar things, and no one is calling us just random word predictors...
And because of this, "hallucinations -- plausible but factually incorrect outputs" is an absolutely accurate description of what an LLM does when it responds with a low-probability output.
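To make "low-probability output" concrete, here's a minimal sketch (made-up logits, purely illustrative) of how next-token sampling works: the distribution is learned and heavily skewed, and the temperature controls how often unlikely tokens get picked.

```python
# Illustrative sampling from a learned, skewed distribution -- not uniform randomness.
import torch

logits = torch.tensor([4.0, 2.5, 0.5, -1.0])     # made-up scores for 4 candidate tokens

def sample(logits, temperature=1.0):
    probs = torch.softmax(logits / temperature, dim=-1)
    token = torch.multinomial(probs, num_samples=1).item()
    return token, probs

token, probs = sample(logits, temperature=0.7)
print(probs)   # heavily skewed toward the top token; low-probability picks are rare
```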
Humans also do this often enough btw.
Please stop saying an LLM is just a random word predictor.
Obviously "hallucinate" and "lie" are metaphors. Get over it. These are still emergent structures that we have a lot to learn from by studying. But I suppose any attempt by researchers to do so should be disregarded because Person On The Internet has watched the 3blue1brown series on Neural Nets and knows better. We know the basic laws of physics, but spend lifetimes studying their emergent behaviors. This is really no different.
The "emergent structures" you are mentioning are just the outcome of randomness guided by "gradiently" descending to data landscapes. There is nothing to learn by studying these frankemonsters. All these experiments have been conducted in the past (decades past) multiple times but not at this scale.
We are still missing basic theorems, not stupid papers about which tech bro payed the highest electricity bill to "train" on extremely inefficient gaming hardware.
Biological systems are hard.
I'm extremely comfortable calling this paper complete and utter bullshit (or, I suppose if I'm being charitable, extremely poorly titled) from the title alone.
This type of research is absolutely valid.
An LLM doesn't just hallucinate.
I recently almost fell on a tram as it accelerated suddenly; my arm reached out for a stanchion that was out of my vision, so rapidly I wasn't aware of what I was doing before it had happened. All of this occurred using subconscious processes, based on a non-physical internal mental model of something I literally couldn't see at the moment it happened. Consciousness is over-rated; I believe Thomas Metzinger's work on consciousness (specifically, the illusion of consciousness) captures something really important about the nature of how our minds really work.
Table 1 is even more odd: H-neurons predict hallucination ~75% of the time, but a comparable set of random neurons predicts hallucinations ~60% of the time, which doesn't seem like a huge difference to me.
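For anyone wondering what that kind of comparison even involves, here's a minimal sketch of one plausible setup (entirely my assumption, run on synthetic data, not the paper's code): fit a probe on the activations of the candidate neurons to predict hallucinated vs. non-hallucinated outputs, then do the same with an equally sized random neuron set and compare accuracies.

```python
# Hypothetical probing setup with synthetic data; only meant to show the shape
# of the comparison, not to reproduce the paper's numbers.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
acts = rng.normal(size=(1000, 512))        # fake per-example neuron activations
labels = rng.integers(0, 2, size=1000)     # fake hallucinated / not-hallucinated labels

def probe_accuracy(neuron_idx):
    X = acts[:, neuron_idx]
    clf = LogisticRegression(max_iter=1000)
    return cross_val_score(clf, X, labels, cv=5).mean()

h_neurons = np.arange(32)                                  # placeholder "H-neuron" set
rand_neurons = rng.choice(512, size=32, replace=False)     # equally sized random set
print(probe_accuracy(h_neurons), probe_accuracy(rand_neurons))
```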
No. Human beings have experiential, embodied, temporal knowledge of the world through our senses. That is why we can, say, empirically know something, which is vastly different from semantically or logically knowing something. Yes, human beings also have probabilistic ways of understanding the world and interacting with others. We have many other forms of knowledge as well, and the LLM way of interpreting data is by no means the primary way in which we feel confident that something is true or false.
That said, I don't get up in arms about the term "hallucination", although I prefer the term confabulation per neuroscientist Anil Seth. Many clunky metaphors are now mainstream, and as long as the engineers and researchers who study these kinds of things are ok with that, that's the most important thing.
But what I think all these people who dismiss objections to the term as "arguing semantics" are missing is the fundamental point: LLMs have no intent, and they have no way of distinguishing what data is empirically true or not. This is why the framing, not just the semantics, of this piece is flawed. "Hallucinations" is a feature of LLMs that exists at the very conceptual level, not as a design flaw of current models. They have pattern recognition, which gets us very far in terms of knowing things, but people who only rely on such methods of knowing are most often referred to as conspiracy theorists.
H-Neurons: On the Existence, Impact, and Origin of Hallucination-Associated Neurons in LLMs
But regardless of the title, this is all highly dubious...