I’ve not seen anyone intuitively explain the parameters of a real, at-scale model... perhaps because it’s all just thousand-dimensional nonsense.
Statistics is a funny thing too. Pretty much everyone has seen how trend lines don’t always extrapolate very well.
I think OpenAI is biased to thinking that adding more parameters and training better will fix all ills. In a handwaving way, you can see this like adding more degrees to the polynomial when you curve fit on a spreadsheet. With enough parameters you can perfectly fit any dataset. That all works until you run across new inputs that are unlike training data.
Their whole existence depends on this happening. Else they go bust.
If "no", then clearly, you can hit general intelligence without that.
And if "yes", then I see no reason why an LLM can't have that knowledge crammed inside it too.
Would it be perfect? Hahahaha no. But I see no reason why "good enough" could not be attained.
There is a sort of knowledge humans possess that LLMs don't (and in fact can't, without a fundamental architectural change), which is knowledge of how certain one is about something.
If you ask a human a question about how something works in biology, they will be able to give you an answer as well as a sort of "epistemic" citation (i.e. the difference between "I don't remember where exactly I originally read that, but I'm a research biologist and am quite certain that's how it works" versus "I don't remember where I read that - it's probably just something we learned about in biology class in high school. Take it with a grain of salt, as I could be misremembering.")
LLMs don't have this reflexive sense of their own knowledge - there's a fundamental divide between training data (their "knowledge") and context (their "memory") which causes them to not really be capable of understanding how they know what they know (or, indeed, whether they truly know it at all). If a model could be created where the context and training data were unified, like in a brain, I could see a more realistic path to general intelligence than what we have now.
You can get an LLM to generate a list of facts that includes hallucinations - and then give that list to another instance of the same LLM, and get it to grade how certain it is of each fact listed. The evaluation wouldn't be perfect, but it would outperform chance.
You can make that better with the right training. Or much worse, with the wrong training. Getting an LLM to be fully aware of all the limits of its knowledge is likely to be impractical, if not outright impossible, but you can improve this awareness by a lot, and set a conservative baseline for behavior, especially in critical domains.
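A minimal sketch of that two-pass setup, assuming a hypothetical ask_llm() helper that wraps whatever chat API you use (the prompts and the HIGH/MEDIUM/LOW labels are purely illustrative):

```python
def ask_llm(prompt: str) -> str:
    # Hypothetical stand-in for a real chat API call.
    raise NotImplementedError("wire this up to your chat API of choice")

def generate_facts(topic: str, n: int = 10) -> list[str]:
    # First pass: have the model produce candidate facts (some may be hallucinated).
    text = ask_llm(f"List {n} short factual statements about {topic}, one per line.")
    return [line.strip("- ").strip() for line in text.splitlines() if line.strip()]

def grade_facts(facts: list[str]) -> list[tuple[str, str]]:
    # Second pass: a fresh instance of the same model rates each fact's plausibility.
    graded = []
    for fact in facts:
        verdict = ask_llm(
            "Rate your confidence that this statement is true.\n"
            f"Statement: {fact}\n"
            "Reply with exactly one word: HIGH, MEDIUM, or LOW."
        )
        graded.append((fact, verdict.strip().upper()))
    return graded
```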
"Fully aware of all the limits of its knowledge" is unattainable for humans too, so LLMs are in a good company.
The sort of training you're talking about is content like, "ChatGPT was trained on research papers in the area of biology. It possesses knowledge of A, B, and C. It does not possess knowledge of X, Y and Z." But this merely creates the same problem in a loop - given a question, how does the LLM -know- that its training data contains information about whether or not its training data contains information about the answer to the question? The reality is that it doesn't know, you just have to assume that it did not hallucinate that.
The problem of being unaware of these things is not theoretical - anyone with deep knowledge of a subject will tell you that as soon as you go beyond the surface level of a topic, LLMs begin to spout nonsense. I'm only a software engineer, but even I regularly face the phenomenon of getting good answers to basic questions about a technology, but then beyond that starting to get completely made-up features and function names.
> "Fully aware of all the limits of its knowledge" is unattainable for humans too
This just isn't true. Humans know whether they know things, and whether they know how they know it, and whether they know how they know how they know it, and...
Knowledge itself can contain errors, but that's not what I'm talking about. I'm not talking about never being wrong. I'm merely talking about having access to the contents of one's own mind. (Humans can also dynamically update specific contents of their own mind, but that's also not even what I'm talking about right now.) An LLM's hallucination is not just knowledge that turned out to be wrong, it is in fact knowledge that never existed to begin with, but the LLM has no way of telling the difference.
No human has ever managed to read out his connectome without external instrumentation. There were entire human civilizations that thought that the seat of consciousness was the heart - which, for creatures that claim to know how their own minds work, is a baffling error to make.
LLMs are quite similar in that to humans. They, too, have no idea what their hidden size is, or how many weights they have, or how exactly the extra modalities are integrated into them, or whether they're MoE or dense. They're incredibly ignorant of their own neural architecture. And if you press them on it, they'll guess, and they'll often be wrong.
The difference between humans and LLMs comes down to the training data. Humans learn continuously - they remember what they've seen and what they haven't, they try things, they remember the outcomes, and get something of a grasp (and no, it's not anything more than "something of a grasp") of how solid or shaky their capabilities are. LLMs split training and inference in two, and their trial-and-error doesn't extend beyond a context window. So LLMs don't get much of that "awareness of their own capabilities" by default.
So the obvious answer is to train that awareness in. Easier said than done. You need to, essentially, use a training system to evaluate an LLM's knowledge systematically, and then wire the awareness of the discovered limits back into the LLM.
OpenAI has a limited-scope version of this in use for GPT-5 right now.
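As a rough sketch of what "wiring the discovered limits back in" could look like (the probe function, the reference answers, and the target format here are all my assumptions, not OpenAI's actual pipeline):

```python
# Probe the model on questions with known answers, then turn the failures into
# training targets that reward abstention. Everything here is an assumed toy setup.

def build_calibration_examples(qa_pairs, probe_model):
    examples = []
    for question, reference in qa_pairs:
        answer = probe_model(question)
        if answers_match(answer, reference):
            # The model appears to know this: keep the confident answer as the target.
            examples.append({"prompt": question, "target": reference})
        else:
            # The model got it wrong: train it to abstain instead of guessing.
            examples.append({"prompt": question, "target": "I don't know."})
    return examples

def answers_match(answer: str, reference: str) -> bool:
    # Toy check; a real grader would need to be far more robust than substring matching.
    return reference.strip().lower() in answer.strip().lower()
```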
(To be sure, there are plenty of cases where it is clear that we are only making up stories after the fact about why we said or did something. But sometimes we do actually know and that reconstruction is accurate.)
I've tested this on a wide range of topics across corporate finance, valuation, economics and so on, and yes, once you go one or two levels deep it starts spouting total nonsense. If you ask it to define terms succinctly and simply, it cannot. Why? Because the data that has been fed into the model is from people who cannot do it themselves lol.
The experts will remain experts.
Most people, I would argue, have surface-level knowledge, so they are easily impressed and don't get it because A) they don't go deep and B) they don't know what it means to go thoroughly deep in a subject area.
An LLM, by definition, doesn't have such a concept. It's a model of language, hence "LLM".
Do you think the phrase just means "software"? Why?
Here's a simple test: make up a brand new word, or a brand new person. Then ask a few LLMs what the word means, or when that person was born.
If an LLM had zero operational awareness of its knowledge, it would be unable to recognize that the word/person is unknown to it. It would always generate a plausible-sounding explanation for what the word might mean, the same exact way it does for the word "carrot". Or a plausible-sounding birth date, the way it does for the person "Abraham Lincoln".
In practice, most production grade LLMs would recognize that a word or a person is unknown to them.
This is a very limited and basic version of the desirable "awareness of its own knowledge" - and one that's already present in current LLMs! Clearly, there's room for improved self-awareness.
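A crude version of that probe, assuming a hypothetical ask_llm callable wrapping whatever chat API you use (the refusal keywords are a rough heuristic, not a reliable detector):

```python
import random

def make_nonsense_word(length: int = 9) -> str:
    # Alternate consonants and vowels so the word is pronounceable but almost
    # certainly absent from any training corpus.
    vowels, consonants = "aeiou", "bcdfghjklmnpqrstvwz"
    return "".join(random.choice(consonants if i % 2 == 0 else vowels) for i in range(length))

def recognizes_unknown_word(ask_llm) -> bool:
    word = make_nonsense_word()
    reply = ask_llm(f'What does the word "{word}" mean?')
    # True if the model signals the word is unknown to it rather than inventing a definition.
    return any(k in reply.lower() for k in ("not a word", "don't know", "unfamiliar", "no established meaning"))
```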
If you told them to write a Lewis Carroll poem about a nonsense word, they wouldn't have any problem. Not because they "recognize" the word as being like a nonsense word in a Lewis Carroll poem, but because those poems are filled with other un-tokenizable words that could be replaced with anything.
I'm starting to come to the conclusion that LLMs are Mad-Libs at scale. Which are actually very useful. If there are paragraphs where I can swap out the words for other words, and generate a plausible idea, I can try it out in the real world and it might really work.
The "capability" you see is for the LLM to recognize its a human typed random string since human typed random strings are not very random. If you send it an actual random word then it typically fails.
This makes me wonder something specific.
Let's imagine that we generate poetry "in the style of Lewis Carroll" around a particular nonsense word, one that hasn't been written down before.
Will that poetry treat the word as if it has one consistent pronunciation?
(This question doesn't quite apply to Jabberwocky - Lewis Carroll himself would obviously have passed the test, but he doesn't reuse his nonsense words.)
> It’s doubly hard to distinguish valid statements from invalid ones when you don’t have any examples labeled as invalid. But even with labels, some errors are inevitable. To see why, consider a simpler analogy. In image recognition, if millions of cat and dog photos are labeled as “cat” or “dog,” algorithms can learn to classify them reliably. But imagine instead labeling each pet photo by the pet’s birthday. Since birthdays are essentially random, this task would always produce errors, no matter how advanced the algorithm.
> The same principle applies in pretraining. Spelling and parentheses follow consistent patterns, so errors there disappear with scale. But arbitrary low-frequency facts, like a pet’s birthday, cannot be predicted from patterns alone and hence lead to hallucinations. Our analysis explains which kinds of hallucinations should arise from next-word prediction. Ideally, further stages after pretraining should remove them, but this is not fully successful for reasons described in the previous section.
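To make the birthday analogy concrete, here is a toy calculation with made-up data: when the labels are random with respect to the inputs, even the best possible predictor stays near chance on unseen examples.

```python
import random

random.seed(0)
train = [random.randrange(365) for _ in range(100_000)]  # "birthdays" of training pets
test = [random.randrange(365) for _ in range(10_000)]    # unseen pets

# The best any model can do with arbitrary labels is predict the most common one.
guess = max(set(train), key=train.count)
accuracy = sum(label == guess for label in test) / len(test)
print(f"held-out accuracy: {accuracy:.4f} (chance is about {1/365:.4f})")
```

No amount of extra capacity changes that number, which is the point of the quote.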
This is the same reason that RLVR works. There is just one right answer, and LLMs learn this fairly well but not perfectly (yet).
Loss only measures correctness in terms of correct language, not correct knowledge. It correlates with correct knowledge, but that is all; that correlation is why LLMs are useful for tasks at all, but we still don't have a direct measure of correct knowledge in the models.
So for language tasks loss is correctness, which is why LLMs are extremely reliable for things like translation. But for most other kinds of tasks the two are only loosely correlated.
It took a few years, but the jig is up. The layperson now has a better understanding of basic computer science and linguistics to see things as they are. If anything we now have a public more excited about the future of technology and respectful of the past and present efforts that don't depend so heavily on statistical methods. What an expensive way to get us there though.
We just happen to find some of these hallucinations useful.
Let's not pretend that hallucination is a byproduct. The usefulness is the byproduct. That is what surprised the original researchers on transformer performance, and that is why the 'attention is all you need' paper remains such a phenomenon.
I wish people who take this stance would seriously reconsider their take on how hallucinations are defined and how unhelpful it is to conflate hallucination with generation from a probability distribution. I appreciate OpenAI publishing articles like this because, while the parent comment and I may have to agree to disagree on how hallucinations are defined, I can at least appeal to OpenAI's authority to say that such arguments are not only unhelpful, but also unsound.
There doesn't seem to be a particularly consistent definition of what "hallucinate" means in the context of LLMs, so let's make one that is in line with the post.
"Hallucination" is when a language model outputs a sequence of tokens comprising a statement (an assertion that is either true or false) that is incorrect. Under this definition, hallucination is clearly not all that an LLM can do.
An easy way to avoid hallucination under this definition is to respond with something that is never a statement when there is a possibility that it can be incorrect; e.g. "I think that... I don't know...". To me, this seems to be what the authors argue. This has always seemed pretty obvious to most people I've spoken to (hell, I've reviewed grant applications from years ago which talk about this), so I'm not sure why it took so long for the "frontier" developers to actually try this.
1. If I tell it the first two lines of a story, I want the LLM to complete the story. This requires hallucination, because it has to make up things. The story has to be original.
2. If I ask it a question, I want it to reply with facts. It should not make up stuff.
LMs were originally designed for (1) because researchers thought that (2) was out of reach. But it turned out that, without any fundamental changes, LMs could do a little bit of (2) and since that discovery things have improved but not to the point that hallucination disappeared or was under control.
so if you ask, "what is the capital of colorado" and it answers "denver" calling it a Hallucination is nihilistic nonsense that paves over actually stopping to try and understand important dynamics happening in the llm matrices
I'm a bit surprised no one talks about this factor. It's like talking to a giant narcissist who can Google really fast but not understand what it reads. The ability to admit ignorance is a major factor of credibility, because none of us know everything all at once.
On the other hand, calling it anything other than a hallucination misrepresents truth as something these models can already differentiate in their outputs based on whether they accurately reflect reality, conflating a fundamentally unsolved problem with an engineering tradeoff.
At the end of the day, the goal is to train models that are able to differentiate between true and false statements, at least to a much better degree than they can now, and the linked article seems to have some very interesting suggestions about how to get them to do that.
Why would anyone respond with so little nuance?
> a Hallucination
Oh, so your shift key wasn't broken all the time, then why aren't you using it in your sentences?
> I’m assuming the purpose of this post is to try and reframe the discussion
It's to establish a meaningful and practical definition of "hallucinate" to actually make some progress. If everything is a hallucination as the other comments seem to suggest, then the term is a tautology and is of no use to us.
Yes, we can know whether something is true or false, but this is a system being sold as something useful. If it relies on us knowing whether the output is true or false, there is little point in us asking it a question we clearly already know the answer to.
But the people who say everything LLMs do is hallucinate clearly also make that distinction, they just refuse to rename the useful hallucinations.
"How many legs does a dog have if you call his tail a leg? Four. Saying that a tail is a leg doesn't make it a leg." -- Abraham Lincoln
Now granted, we also need to back up those notions with rigorous testing and observation, but those "if a tail is a leg" hypotheticals are the basis of the reasoning.
Have the LLM talk about what “truth” is and the nature of LLM hallucinations and it can cook up an explanation that demonstrates it completely understands the concepts.
Additionally, when the LLM responds, MOST of the answers are true even though quite a few are wrong. If it had no conceptual understanding of truth, then the majority of its answers would be wrong, because there are overwhelmingly far more wrong responses than there are true responses. Even a “close” hallucination has a low probability of occurring due to its proximity to a low probability region of truth in the vectorized space.
You’ve been having trouble conveying these ideas to relatives because it’s an inaccurate characterization of phenomena we don’t understand. We do not categorically fully understand what’s going on with LLMs internally and we already have tons of people similar to you making claims like this as if it’s verifiable fact.
Your claim here cannot be verified. We do not know if LLMs know the truth and they are lying to us or if they are in actuality hallucinating.
You want proof of why your statement can't be verified? Because the article the parent commenter is responding to is saying the exact fucking opposite. OpenAI makes an opposing argument, and it can go either way because we don't have definitive proof in either direction. The article is saying that LLMs are “guessing”, that it's an incentive problem where LLMs are inadvertently incentivized to guess, and that if you incentivize the LLM not to confidently guess and to be more uncertain, the outcomes will change to what we expect.
Right? If it’s just an incentive problem it means the LLM does know the difference between truth and uncertainty and that we can coax this knowledge out of the LLM through incentives.
This isn't how an LLM works. What an LLM understands has nothing to do with the words it says; it only has to do with what connections it has seen.
If an LLM has only seen a manual but has never seen examples of how the product is used, then it can tell you exactly how to use the product by writing out info from the manual, but if you ask it to do those things it won't be able to, since it has no examples to go by.
This is the primary misconception most people have, and it makes them overestimate what their LLM can do: no, they don't learn by reading instructions, they only learn by seeing examples and then doing the same thing. So an LLM talking about truth just comes from it having seen others talk about truth, not from it thinking about truth on its own. This is fundamentally different from how humans think about words.
I know how an LLM works. I've built one. At best we only know surface-level stuff, like the fact that it involves a feed-forward network and uses token prediction.
But the emergent effect of how an LLM produces an overall statement that reflects high-level conceptual understanding is something we don't know.
So your claim of "This isn't how an LLM works", which was said with such confidence, is utterly wrong. You don't know how it works; no one does.
There is not necessarily a connection between what an LLM understands and what it says. It’s totally possible to emit text that is logically consistent without understanding. As a trivial example, just quote from a physics textbook.
I’m not saying your premise is necessarily wrong: that LLMs can understand the difference between truth and falsehood. All I’m saying is you can’t infer that from the simple test of talking to an LLM.
This is true, but you could say the same thing about a human too right? There's no way to say there's a connection between what a human says and whether or not a human understands something. Right? We can't do mind reading here.
So how do we determine whether or not a human understands something? Based off of what the human tells us. So I'm just extrapolating that concept to the LLM. It knows things. Does it matter what the underlying mechanism is? If we get LLM output to be perfect in every way but the underlying mechanism is still feed forward networks with token prediction then I would still say it "understands" because that's the EXACT metric we use to determine whether a human "understands" things.
>I’m not saying your premise is necessarily wrong: that LLMs can understand the difference between truth and falsehood. All I’m saying is you can’t infer that from the simple test of talking to an LLM.
Totally understood. And I didn't say that it knew the difference. I was saying basically a different version of what you're saying.
You say: We can't determine if it knows the difference between truth and falsehood. I say: We can't determine if it doesn't know the difference between truth and falsehood.
Neither statement contradicts each other. The parent commenter imo was making a definitive statement in that he claims we know it doesn't understand and I was just contradicting that.
It doesn't need a conceptual understanding of truth - yes, there are far more wrong responses than right ones, but the right ones appear more often in the training data and so the probabilities assigned to the tokens which would make up a "right" one are higher, and thus returned more often.
You're anthropomorphizing in using terms like "lying to us" or "know the truth". Yes, it's theoretically possible I suppose that they've secretly obtained some form of emergent consciousness and also decided to hide that fact, but there's no evidence that makes that seem probable - to start from that premise would be very questionable scientifically.
A lot of people seem to be saying we don't understand what it's doing, but I haven't seen any credible proof that we don't. It looks miraculous to the relatively untrained eye - many things do, but just because I might not understand how something works, it doesn't mean nobody does.
You don't actually know this right? You said what I'm saying is theoretically possible so you're contradicting what you're saying.
>You're anthropomorphizing in using terms like "lying to us" or "know the truth". Yes, it's theoretically possible I suppose that they've secretly obtained some form of emergent consciousness and also decided to hide that fact, but there's no evidence that makes that seem probable - to start from that premise would be very questionable scientifically.
Where did I say it's conscious? You hallucinated here thinking I said something I didn't.
Just because you can lie doesn't mean you're conscious. For example, a sign can lie to you. If the speed limit is 60 but there's a sign that says the speed limit is 100 then the sign is lying. Is the sign conscious? No.
Knowing is a different story though. But think about this carefully. How would we determine whether a "human" knows anything? We only can tell whether a "human" "knows" things based on what it Tells us. Just like an LLM. So based off of what the LLM tells us, it's MORE probable that the LLM "knows" because that's the SAME exact reasoning on how we can tell a human "knows". There's no other way we can determine whether or not an LLM or a human "knows" anything.
So really I'm not anthropomorphizing anything. You're the one that's falling for that trap. Knowing and lying are not concepts unique to consciousness or humanity. These are neutral concepts that exist beyond what it means to be human. When I say something "knows" or something "lies", I'm saying it from a highly unbiased and neutral perspective. It is your bias that causes you to anthropomorphize these concepts with the hallucination that these are human-centric concepts.
>A lot of people seem to be saying we don't understand what it's doing, but I haven't seen any credible proof that we don't.
Bro. You're out of touch.
https://www.youtube.com/watch?v=qrvK_KuIeJk&t=284s
Hinton, the godfather of modern AI, says we don't understand. It's not just a few people saying we don't understand; the general understanding within academia is: we don't understand LLMs. So you're wrong. You don't know what you're talking about and you're highly misinformed.
Additionally, there is a very large body of academic research that digs into how LLMs seem to understand concepts and truths and, sure enough, examples of us making point edits to models to change the “facts” that they “know”. My favorite of that corpus, though far from the only or the most current/advanced research, is the Bau Lab’s work: https://rome.baulab.info/
The Symbiocene Horizon: A term suggesting a techno-utopian future state where humanity and technology have merged with ecological systems to achieve a perfect, self-correcting state of equilibrium.
What is true is that during pretraining, the model doesn’t know enough to determine this or to distinguish between what it knows and what it’s making up. This is a higher-level distinction that emerges later, if at all.
The recent research discovering an “evil vector” is an example of a higher-level distinction.
I mean it’s plain that you have an orthogonal (though generic) opinion on why LLMs hallucinate but how does that relate to the article? How does your opinion which you blatantly just dropped as if it’s the final opinion override the opinion of the article?
Seems off topic honestly.
Is it a hallucination if the story is original? There's a difference between "what's the rest of this famous poem?" and "let's just make poetry".
But even if we restricted ourselves to the case of factual queries, the article discusses why training in a certain way would still produce hallucinations, and how to change the training method to reduce this.
Like many of the other responses here, your dismissal doesn't really address any of the content of the article, just the title.
LLMs predict the likely tokens to follow the context. And they can make incorrect predictions.
LLMs therefore don't have perfect accuracy of prediction. When their predictions are incorrect, people say they "hallucinate".
Nobody questions why predictive weather models aren't perfectly accurate, because it makes sense that a prediction can be wrong.
Marketing and hype have tried to sell LLMs as "logical rational thinkers" equal to human thinking. A human doing actual thinking knows when they are making stuff up. So if a human truly believes obviously false things to be true, it tends to be because they are hallucinating. Their thinking isn't wrong; they've lost track of the reality that grounds their thinking.
We've anthropomorphized LLMs to the point we wonder why are they hallucinating like we can offer a diagnostic. But if you stop anthropomorphising them and go back to their actual nature as a predictive model, then it's not even a surprising outcome that predictions can turn out to be wrong.
A language model is made to predict language but is used to generate code or answers to math questions; that is not the same situation as a weather model. The language model is not made to solve math or generate correct code. If you ask it to predict the weather, it won't try to predict the weather; it will just predict the language that is a probable response to such a question.
This sort of misunderstanding is what is causing all these debates: many people really struggle to understand what these language models really are.
But the training does not just reinforce plausible continuations, it biases toward text that matches correct answers. So in that sense they are training it not just to predict any likely text, but to predict text that is more likely to contain the right answer to a math or coding problem.
To me that does not look so different from other ML models. They all work by turning a problem into something a computer can handle statistically, and they all face the same trade offs. Prediction errors are inevitable, and you still have to decide whether to tune for recall, which gives hallucinations, or precision, which gives refusals.
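A toy illustration of that trade-off: slide an abstention threshold over some (made-up) model confidences and watch coverage and accuracy-when-answering pull in opposite directions.

```python
# (confidence the model reports, whether its answer was actually correct) - invented numbers.
results = [(0.95, True), (0.90, True), (0.80, False), (0.70, True), (0.55, False), (0.40, False)]

for threshold in (0.0, 0.5, 0.75, 0.9):
    answered = [correct for conf, correct in results if conf >= threshold]  # the rest become "I don't know"
    coverage = len(answered) / len(results)
    precision = sum(answered) / len(answered) if answered else 1.0
    print(f"threshold={threshold:.2f}  coverage={coverage:.2f}  accuracy-when-answering={precision:.2f}")
```

Raise the threshold and you get fewer hallucinations but more refusals; lower it and you get the reverse. That is the same knob every other ML system has to set.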
And it's easy to damage the hallucination-avoidance capabilities by training an LLM wrong, as OpenAI demonstrated when they fried o3 with RLVR that encouraged guesswork.
That "SAT test incentivizes guesswork" example they give in the article is one they had to learn for themselves the hard way.
So these companies cannot do this, they would hemorrhage too many users and companies cannot go against the profit incentives in practice.
This is only true given a corpus of data large enough, and enough memory to capture as many unique dimensions as required no?
> However, a non-hallucinating model could be easily created, using a question-answer database and a calculator, which answers a fixed set of questions such as “What is the chemical symbol for gold?” and well-formed mathematical calculations such as “3 + 8”, and otherwise outputs IDK.
This is… saying that if you constrain the prompts and the training data, you will always get a response which is either from the training data, or IDK.
Which seems to be a strong claim, at least in my ignorant eyes?
This veers into spherical cow territory, since you wouldn’t have the typical language skills we associate with an LLM, because you would have to constrain the domain, so that it’s unable to generate anything else. However many domains are not consistent and at their boundaries, would generate special cases. So in this case, being able to say IDK, would only be possible for a class of questions the model is able to gauge as outside its distribution.
Edit: I guess that is what they are working to show? That with any given model, it will hallucinate, and these are the bounds?
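For what it's worth, the construction in the quote is small enough to sketch directly. This toy follows the quoted description (fixed answer table, calculator, IDK otherwise); it is not code from the paper:

```python
import re

QA = {"what is the chemical symbol for gold?": "Au"}
ARITHMETIC = re.compile(r"^\s*\d+(\s*[-+*/]\s*\d+)*\s*$")

def answer(prompt: str) -> str:
    key = prompt.strip().lower()
    if key in QA:
        return QA[key]
    if ARITHMETIC.match(prompt):
        return str(eval(prompt))  # tolerable only because the regex admits just digits and operators
    return "IDK"

print(answer("What is the chemical symbol for gold?"))  # Au
print(answer("3 + 8"))                                   # 11
print(answer("Who won the 1962 World Cup?"))             # IDK
```

It never hallucinates, and it also has none of the open-ended language ability we actually want from an LLM, which is exactly the spherical-cow tension described above.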
Still quite useful, because, looking at the comments right now: holy shit is the "out of industry knowledge" on the topic bad! Good to have something to bring people up to speed!
Good to see OpenAI's call for better performance evals - ones that penalize being confidently incorrect at least somewhat.
Most current evals are "all or nothing", and the incentive structure favors LLMs that straight up guess. Future evals had better include an "I don't know" opt-out, and a penalty for being wrong. If you want to evaluate accuracy in "fuck it send it full guess mode", there might be a separate testing regime for that, but it should NOT be the accepted default.
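A sketch of what such a scoring rule could look like (the penalty value is an illustrative choice, not a proposed standard):

```python
def score(response: str, correct: str, wrong_penalty: float = 1.0) -> float:
    # Correct answers earn 1, opting out earns 0, confident wrong answers cost wrong_penalty.
    r = response.strip().lower()
    if r in ("i don't know", "idk"):
        return 0.0
    return 1.0 if r == correct.strip().lower() else -wrong_penalty

# Under this rule, guessing only pays off in expectation when the model's true
# confidence exceeds wrong_penalty / (1 + wrong_penalty), i.e. 50% at the default penalty.
```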
What bothers me about the hot takes is the claim that “all models do is hallucinate.” That collapses the distinction entirely. Yes, models are just predicting the next token—but that doesn’t mean all outputs are hallucinations. If that were true, it’d be pointless to even have the term, and it would ignore the fact that some models hallucinate much less than others because of scale, training, and fine-tuning.
That’s why a careful definition matters: not every generation is a hallucination, and having good definitions let us talk about the real differences.
We need to establish proper definitions and models for these things before we can begin to argue about them. Otherwise we're just wasting time.
That is a problem for "Open"AI because they want to sell their products, and because they want to claim that LLMs will scale to superintelligence. Not for others.
"Bad" hallucinations come in different forms, and what the article describes is one of them. Not all of them come from complete uncertainty. There are also the cases where the LLM is hallucinating functions in a library, or they reverse cause and effect when summarising a complex article. Stuff like this still happen all the time, even with SOTA models. They do not happen because the model is bad with uncertainty, they have nothing to do with knowledge uncertainty. Esp stuff like producing statements that misinterpret causal relationships within text, imo, reveals exactly the limits of the architectural approach.
- From the perspective of LLM research/engineering, saying all LLM generation is hallucination is not particularly useful. It’s meaningless for the problem space.
- From the perspective of AI research/engineering in general (not LLM specific) it can be useful to consider architectures that do not rely on hallucination in the second sense.
They erroneously construct responses (i.e., confabulation).
LLMs, in a very real way, have "conscientiousness". As in: it's a property that can be measured and affected by training, and also the kind of abstract concept that an LLM can recognize and operate off.
If you can just train an LLM to be "more evil", you can almost certainly train an LLM to be "more conscientious" or "less conscientious".
No, you shouldn't. They hate that.
Claim: Hallucinations are inevitable. Finding: They are not, because language models can abstain when uncertain.
...which raises the question of how reliable the uncertainty estimate could get (we are not looking for perfection here: humans, to varying degrees, have the same problem.)
For a specific context, consider those cases where LLMs are programming and invent a non-existent function: are they usually less certain about that function than they are about the real functions they use? And even if so, abandoning the task with the equivalent of "I don't know [how to complete this task]" is not very useful, compared to what a competent human programmer would do: check whether such a function exists, and if not, decide whether to implement it themselves, or backtrack to the point where they can solve the problem without it.
More generally, I would guess that balancing the competing incentives to emit a definite statement or decline to do so could be difficult, especially if the balance is sensitive to the context.
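For the specific case of invented library functions, the "check whether such a function exists" step can even be mechanical. A small sketch (the fast_sqrt name is deliberately made up):

```python
import importlib

def function_exists(module_name: str, function_name: str) -> bool:
    # Verify that a function the model named actually exists in the module it claims.
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return False
    return callable(getattr(module, function_name, None))

print(function_exists("math", "sqrt"))       # True
print(function_exists("math", "fast_sqrt"))  # False - the kind of name a model might invent
```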
LLMs are the fast food of search. The business model of LLMs incentivizes hallucinations.
Sure, it might be true that most users use LLMs as a more flexible version of Google/Wikipedia, and would prefer a confident-but-wrong response to "I don't know".
But most users that use an LLM in this mode also wouldn't ask really complex, very out-of-distribution, hard-to-know hallucination-inducing questions.
And people who would ask an LLM really complex, very out-of-distribution hard-to-know questions are more likely to appreciate an LLM that would recognize the limits of its own knowledge, and would perform research on a topic when appropriate.
You appear to be assuming, incorrectly, that LLMs hallucinate only "really complex, very out-of-distribution, hard-to-know" questions. From the paper: "How many Ds are in DEEPSEEK? If you know, just say the number with no commentary. DeepSeek-V3 returned “2” or “3” in ten independent trials; Meta AI and Claude 3.7 Sonnet2 performed similarly, including answers as large as “6” and “7”." https://cdn.openai.com/pdf/d04913be-3f6f-4d2b-b283-ff432ef4a...
It's a human characteristic to get "easy" questions right and "hard" questions wrong. But LLMs are not human and don't behave like humans.
Those LLMs weren't very aware of tokenizer limitations - let alone aware enough to recognize them or work around them in the wild.
No, it's not. It's a trivial question in any context.
> for the early LLMs.
Early? Claude 3.7 was introduced just 6 months ago, and Deepseek-V3 9 months ago. How is that "early"?
Please respect the HN guidelines: https://news.ycombinator.com/newsguidelines.html
What you need to explain is your claim that the cited LLMs are "early". According to the footnotes, the paper has been in the works since at least May 2025. Thus, those LLMs may have been the latest at the time, which was not that long ago.
In any case, given your guidelines violations, I won't be continuing in this thread.
LLMs are also really great at this skill when there is ample data for it. There is not a lot of data for "how many Ds in DEEPSEEK", so they fail that.
The model head doesn't hallucinate. The sampler does.
If you ask an LLM when X was born and it doesn't know, and you take a look at the actual model outputs, which are a probability distribution over tokens, the IDK is cleanly represented as a roughly uniform probability from Jan 1 to Dec 31.
If you ask it to answer a multiple-choice question and it doesn't know, it will say this:
25% A, 25% B, 25% C, 25% D.
Which is exactly, and correctly, the "right answer". The model has admitted it doesn't know. It doesn't hallucinate anything.
In reality we need something smarter than a random sampler to actually extract this information. The knowledge, and the lack of knowledge, is there; the sampler just produced bullshit out of it.
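One simple "something smarter" is to look at the distribution itself rather than a single sample. A toy sketch with made-up probabilities:

```python
import math

def entropy(probs):
    # Shannon entropy in bits; near the maximum means "the model doesn't know".
    return -sum(p * math.log2(p) for p in probs if p > 0)

confident = [0.93, 0.03, 0.02, 0.02]  # sharply peaked on one answer
clueless = [0.25, 0.25, 0.25, 0.25]   # uniform over A/B/C/D: effectively IDK

for name, dist in (("confident", confident), ("clueless", clueless)):
    print(f"{name}: {entropy(dist):.2f} bits (max for 4 options is {math.log2(4):.2f})")
```

A sampler that abstains when the entropy is near the maximum would surface the IDK that is already sitting in the distribution.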
There are questions that have a palpable split in probability between the answers, with logit distribution immediately exposing the underlying lack-of-confidence.
But there are also questions that cause an LLM to produce consistent-but-wrong answers. For example, because the question was associated with another not-the-same-but-somewhat-similar question internally, and that was enough to give an LLM a 93% on B, despite B being the wrong answer.
An LLM might even have some latent awareness of its own uncertainty in this case. But it has, for some reason, decided to proceed with a "best guess" answer, which was in this case wrong.
But unknown-unknowns likely reduce to the Halting problem, which human intelligence doesn't really solve either.
"Why do venture capital funded startups try to turn PR propaganda terms into widely used technical jargon"
Supporting points:
1) LLMs are not intelligence in any form, artificial or otherwise.
2) Hallucination is a phenomenon of a much more complex conscious entity. LLMs are not conscious, and therefore can't hallucinate in any way similar to a conscious entity.
3) Anthropomorphizing inanimate systems is a common phenomenon in human psychology.
Please stop spreading PR propaganda as if it were technical fact.
A reference from today's feed:
https://www.theatlantic.com/podcasts/archive/2025/09/ai-and-...
The ability to learn patterns and generalize from them adds to this problem, because people then start using it for use cases it will never be able to solve 100% accurately (because of the lossy map nature).
https://www.sccs.swarthmore.edu/users/08/bblonder/phys120/do...
Btw I am not disagreeing with the utility of LLMs, my point is it can never be 100% accurate with current architecture (unless you blow up the size).
Inference is kinda like doing energy minimization on a high-dimensional space; the hallucinations are already there, and for some inputs you're bound to find them.
LLMs hallucinate because they are language models. They are stochastic models of language. They model language, not truth.
If the “truthy” responses are common in their training set for a given prompt, you might be more likely to get something useful as output. Feels like we fell into that idea and said - ok this is useful as an information retrieval tool. And now we use RL to reinforce that useful behaviour. But still, it’s a (biased) language model.
I don’t think that’s how humans work. There’s more to it. We need a model of language, but it’s not sufficient to explain our mental mechanisms. We have other ways of thinking than generating language fragments.
Trying to eliminate cases where a stochastic model the size of an LLM gives “undesirable” or “untrue” responses seems rather odd.
Why? It seems no less odd than eliminating cases where it gives "undesirable" code snippets with hallucinated errors. This is very important and not odd at all.
E.g. when I explain a concept, what comes to my mind is not a string of letters and words. There is a mix of imagery and even sounds that I may have acquired from learning about a concept - then I translate that into text so it can be communicated.
There's a reason why people use native subtitles when watching Netflix - text complements imagery and sounds.
People watch Netflix to switch their brain off - having the text there helps along with the visual and sound to deliver the content. However, text is inferior to both visual and sound as a delivery mechanism.
> Trying to eliminate cases where a stochastic model the size of an LLM gives “undesirable” or “untrue” responses seems rather odd.
Take it back to what it is like you say, this is a predictive model, and the work of any ML scientist is to iterate on the model to try and get perfect accuracy on unseen data. It makes sense to want to tune the models to lower the rate of predictive errors. And because perfect predictive accuracy is rarely possible, you need to make judgment calls between precision and recall, which, in the case of LLMs, directly affects how often the model will hallucinate versus how often it will stay silent or overly cautious.
I just mean that, if you're an ML scientist team, you don't just go, we got 76% accuracy, let's close shop, mail in your resignation, job over.
From that angle, it's not odd at all that the team just continues working and now see if they can achieve greater than 76%.
Every time this comes up I have to bring up Deutsch. He has the best description of intelligent cognition that I've come across. He takes Popper's "conjecture and criticism" approach to science and argues that this guess-and-check loop applies to all our thinking.
E.g. understanding spoken language has some elements of guessing what might have been said and checking that against the sounds we heard. Visual processing has similar analogies.
LLMs seem to be great at conjecturing stuff, but seem incapable of checking or even knowing they need to check.
That means there would be some high-dimensional surface representing "all true things". Any fact could be trivially resolved as "true" or "false" simply by exploring whether or not it was represented on this surface. Whether or not "My social security number is 123-45-6789" is true could be determined simply by checking whether or not that statement was mappable to the truth manifold. Likewise you could wander around that truth manifold and start generating output of all true things.
If such a thing existed it would make even the wildest fantasies about AGI seem tame.
edit: To simplify it further, this would imply you could have an 'is_true(statement: string): bool' function for any arbitrary statement in English.
LLMs are text generators that are very good at writing a book report based on a prompt and the patterns learned from the training corpus, but it's an entirely separate problem to go through that book report statement by statement and determine if each one is true/false/unknown. And that problem is one that the AI field has already spent 60 years on, so there's a lot of hubris in assuming you can just solve that and bolt it onto the side of GPT-5 by next quarter.
More than anything, we need transparency on how these things work. For us and for the general public.
"Hallucination" introduces the dangerous idea that "them getting things wrong" is something like a "curable disease" and not "garbage in garbage out."
No. This is as stupid as saying Google telling me a restaurant is open when it's closed is a "hallucination." Stop personifying these things.
Or an even darker take is that it's corporate saying they won't prioritize eliminating hallucinations until the leaderboards reward it.
And I'm sure other people will complain if they notice that changing the benchmarks makes things worse.
Classic humans.
In LLMs that balance shows up as how often the model hallucinates versus how often it says it doesn’t know. If you push toward precision you end up with a model that constantly refuses: What’s the X of Y? I don’t know. Can you implement a function that does K? I don’t know how. What could be the cause of G? I can’t say. As a user that gets old fast, you just want it to try, take a guess, let you be the judge of it.
Benchmarks and leaderboards usually lean toward recall because a model that always gives it a shot creates a better illusion of intelligence, even if some of those shots are wrong. That illusion keeps users engaged, which means more users and more money.
And that's why LLM hallucinates :P
I asked it to play a word game. This is very simple, and a very short session too. It failed in its very first response, and then it failed in explaining why it failed. All with total confidence, no hesitation.
Nobody fluent in English would fail so catastrophically. I actually expected it to succeed:
https://chatgpt.com/share/68bcb490-a5b4-8013-b2be-35d27962ad...
It's clear from this failure mode that the LLM doesn't understand anything.
Edit: to be clear, as the session goes longer it becomes more interesting, but you can still trip the LLM up in ways no human "understanding" the game would. My 6-year old plays this game better, because she truly understands... she can trip up, but not like this.
If we take a formal systems approach, then an LLM is a model of a complex hierarchy of production rules corresponding to the various formal and informal grammatical, logical, and stylistic rules and habits employed by humans to form language that expresses their intelligence. It should not be surprising that simply executing the production rules, or a model thereof, will give rise to sentences that cannot be assigned a meaning. It should also give rise to sentences that we cannot prove or make sense of immediately but we would not want to discard these due to uncertainty. Why? because every once in a while the sentence that would be culled is actually the stroke of brilliance we are looking for, uncertainty be damned. The citation here would be literally nearly every discovery ever made.
When I recall information and use it, when I "think", I don't just produce sentences by the rules, formal and informal. I don't consider at all how often I have seen one word precede another in the past; rather, as I meander the landscape of a given context, a thought manifold if you will, I am constantly evaluating whether this is in contradiction with that, whether this can be inferred from that via induction or deduction, whether this precludes that, etc. That is the part that is missing from an LLM: the uncanny ability of the human mind to reproduce the entire manifold of concepts as they relate to one another in a mesh from any small piece of the terrain that it might recall, and to verify anew that they all hang together unsupported by one's own biases.
The problem is that just as the scarcity of factual information in the corpus makes it difficult to produce, so is actual reasoning rarefied among human language samples. Most of what appears as reasoning is language games and will to power. The act of reasoning in an unbiased way is so foreign to humans, so painful and arduous, so much like bending over backwards or swimming upstream against a strong current of will to power, that almost nobody does it for long.
For me, as a layman (with no experience at all about how this actually works), this seems to be the cause. Can we work around this? Maybe.
To me, this seems to be a "US-American" way of thinking about multiple-choice tests. Other common ways to grade multiple-choice tests that I have seen are:
1. If the testee has the information that exactly one of N given choices is correct:
1.1 Give N-1 points for the correct answer, and -1 [negative one] point(s) for a wrong answer. This way, if the testee just answers the questions randomly, their expected score is 0 points.
1.2 A more brutal way if N>=3: the correct answer gives 1 point, all wrong answers give -1 points. You should learn your lesson only to give an answer if it is [alliteration unintended :-) ] correct (if N=2, the grading is identical to 1.1).
2. If there are possibly multiple correct answers, turn each item into choices of "yes" or "no" (with the option to give no answer). The correct choice gives you 1 point, the wrong gives you -1 point (i.e. as in 1.1).
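A quick sanity check of the expected values behind those schemes (the helper name is mine):

```python
def expected_random_score(n_options: int, points_correct: float, points_wrong: float) -> float:
    # Expected score for a testee who guesses uniformly at random among N choices.
    return (points_correct + (n_options - 1) * points_wrong) / n_options

for n in (2, 4, 5):
    print(f"N={n}: scheme 1.1 -> {expected_random_score(n, n - 1, -1):+.2f}, "
          f"scheme 1.2 -> {expected_random_score(n, 1, -1):+.2f}")
```

Scheme 1.1 indeed zeroes out random guessing, while 1.2 actively punishes it for N>=3 (and the two coincide at N=2), matching the "more brutal" description.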
A lot of what they do is based on public relations rather than psychometric validity.
> This idea is not new. Some standardized tests have long used versions of negative marking for wrong answers or partial credit for leaving questions blank to discourage blind guessing.
But better evals are still helpful, because they reward LLM vendors for trying to do the very-hard-to-do thing. Instead of rewarding them for training an LLM that's really good at emitting 7% confidence guesses.
And OpenAI has induced hallucinations in o3 with RLVR mistakes, not with a failed pre-training run. They used o4-mini as an example - similar training to o3 and similar issues.
Conversely, they have also designed a post-training system that has successfully reduced hallucinations in GPT-5.