The first line "Why do language models sometimes just make things up?" was not what I was expecting to read about.
Regardless of whether those terms, in the AI context, correlate perfectly with their original meanings.
I get that this is different from getting an LLM to admit that it doesn’t know something, but I thought “getting a coding agent to stop spinning its wheels when set to an impossible task” was months or years away, and then suddenly it was here.
I haven’t yet read a good explanation of why Claude 4 is so much better at this kind of thing, and it definitely goes against what most people say about how LLMs are supposed to work (which is a large part of why I’ve been telling people to stop leaning on mechanical explanations of LLM behavior/strengths/weaknesses). However, it was clearly a step-function improvement.
Ask them to solve one of the Millennium Prize Problems. They’ll say they can’t do it, but that 'No' is just memorized. There’s nothing behind it.
> Unfortunately, the term hallucination quickly stuck to this phenomenon — before any psychologist could object.
The only difference between the two is whether a human likes it. If the human doesn't like it, then it's a hallucination. If the human doesn't know it's wrong, then it's not a hallucination (as far as that user is concerned).
The term "hallucination" is just marketing BS. In any other case it'd be called "broken shit".
The term hallucination is used as if the network is somehow giving the wrong output. It's not. It's giving a probability distribution for the next token, which is exactly what it was designed to do. The misunderstanding lies in what the user thinks they are asking for. They think they are asking for a correct answer, but they are actually asking for a plausible answer. Those are very different things. An LLM is designed to give plausible, not necessarily correct, answers. And when a user asks for a plausible but not necessarily correct answer (whether or not they realize it) and gets a plausible but not necessarily correct answer, then the LLM is working exactly as intended.
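To make that concrete, here's a toy sketch (a made-up four-word vocabulary and invented logits, not any real model's numbers): all the network emits is a probability distribution over the next token, and sampling from it rewards plausibility, not truth.

```python
import math
import random

# Toy illustration: the model's raw output is just scores (logits) over a
# vocabulary for the next token -- nothing in this step checks factual truth.
vocab = ["Paris", "Lyon", "Berlin", "Madrid"]
logits = [4.0, 1.5, 1.0, 0.5]  # invented scores for "The capital of France is ..."

# Softmax turns the scores into a probability distribution.
exps = [math.exp(x) for x in logits]
total = sum(exps)
probs = [e / total for e in exps]

# Sampling picks a token in proportion to plausibility, so a wrong but
# plausible token ("Lyon") still gets chosen some fraction of the time.
next_token = random.choices(vocab, weights=probs, k=1)[0]

print({word: round(p, 3) for word, p in zip(vocab, probs)})
print("sampled next token:", next_token)
```

Run it a few times and you'll occasionally get "Lyon": the distribution is doing exactly its job; it's the user who reads the sample as a verified fact.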