AI Hallucination Cases Database

https://www.damiencharlotin.com/hallucinations/

52•Tomte•3h ago

Comments

irrational•3h ago

I still think confabulation is a better term for what LLMs do than hallucination.

Hallucination - A hallucination is a false perception where a person senses something that isn't actually there, affecting any of the five senses: sight, sound, smell, touch, or taste. These experiences can seem very real to the person experiencing them, even though they are not based on external stimuli.

Confabulation - Confabulation is a memory error consisting of the production of fabricated, distorted, or misinterpreted memories about oneself or the world. It is generally associated with certain types of brain damage or a specific subset of dementias.

bluefirebrand•3h ago

You're not wrong in a strict sense, but you have to remember that most people aren't that strict about language.

I would bet that most people define the words like this:

Hallucination - something that isn't real

Confabulation - a word that they have never heard of

static_void•2h ago

We should not bend over backwards to use language the way ignorant people do.

add-sub-mul-div•2h ago

"Bending over backwards" is a pretty ignorant metaphor for this situation: it describes explicit activity, whereas letting people use metaphor loosely only requires passivity.

furyofantares•2h ago

I like communicating with people using a shared understanding of the words being used, even if I have an additional, different understanding of the words, which I can use with other people.

That's what words are, anyway.

dingnuts•1h ago

I like calling it bullshit[0] because it's the most accurate, the most understandable, and the most fun to use with a footnote.

[0] (featured previously on HN) https://link.springer.com/article/10.1007/s10676-024-09775-5

rad_gruchalski•15m ago

Ignorance is easy to hide behind many words.

AllegedAlec•32m ago

We should not bend over backwards to use language the way anally retentive people demand we do.

rad_gruchalski•16m ago

Ignorance clusters easily. You’ll have no problem finding others alike.

vkou•14m ago

> Ignorance clusters easily.

So do pedantry and prickliness.

Intelligence is knowing that a tomato is a fruit, wisdom is not putting it in a fruit salad. It's fine to want to do your part to steer language, but this is not one of those cases where it's important enough for anyone to be an asshole about it.

rad_gruchalski•11m ago

It also becomes apparent that ignorance leads to a weird aggressive asshole fetish.

resonious•8m ago

I would go one step further and suppose that a lot of people just don't know what confabulation means.

maxbond•2h ago

I think "apophenia" (attributing meaning to spurious connections) or "pareidolia" (the form of apophenia where we see faces where there are none) would have been good choices as well.

cratermoon•2h ago

anthropoglossic systems.

Terr_•2h ago

Largely Logorrhea Models.

rollcat•2h ago

There's a simpler word for that: lying.

It's also equally wrong. Lying implies intent. Stop anthropomorphising language models.

sorcerer-mar•1h ago

Lying is different from confabulation. As you say, lying implies intent. Confabulation does not necessarily, ergo it's a far better word than either lying or hallucinating.

A person with dementia confabulates a lot, which entails describing reality "incorrectly", but it's not quite fair to describe it as lying.

bandrami•31m ago

A liar seeks to hide the truth; a confabulator is indifferent to the truth entirely. It's an important distinction. True statements can still be confabulations.

matkoniecz•2h ago

And why is confabulation better than either of those?

bee_rider•2h ago

It seems like these are all anthropomorphic euphemisms for things that would otherwise be described as bugs, errors (in the “broken program” sense), or error (in the “accumulation of numerical error” sense), if LLMs didn’t have the easy-to-anthropomorphize chat interface.

diggan•1h ago

Imagine you have a function that is called "is_true" but it only gets it right 60% of the time. We're doing this within CS/ML, so lets call that "correctness" or something fancier. In order for that function to be valuable, would we need to hit 100% correctness? I mean, probably most of the time, yeah. But sometimes, maybe even rarely, we're fine with it being less than 100%, but still as high as possible.

So from this point of view, it's not a bug or error that it currently sits at 60%, but if we manage to find a way to hit 70%, it would be better. But in order to figure this out, we need to call this "correct for most part, but could be better" concept something. So we look at what we already know and are familiar with, and try to draw parallels, maybe even borrow some names/words.
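A minimal sketch of the "correctness" idea above, assuming a hypothetical noisy predicate: `make_noisy_is_true` simulates a function that gives the right answer with a chosen probability (it cheats by peeking at the ground truth), and `measure_correctness` estimates that rate on labelled data. Nothing here models how an LLM works internally; it only shows that 60% vs. 70% is a measurable property of the whole function rather than one bug you can point at.

```python
import random


def make_noisy_is_true(accuracy: float, seed: int = 0):
    """Return a hypothetical is_true() that answers correctly with probability `accuracy`."""
    rng = random.Random(seed)

    def is_true(ground_truth: bool) -> bool:
        # Simulation shortcut: take the real answer, then flip it
        # with probability 1 - accuracy.
        return ground_truth if rng.random() < accuracy else not ground_truth

    return is_true


def measure_correctness(fn, labels) -> float:
    """Fraction of labels for which fn reproduces the ground truth."""
    hits = sum(1 for truth in labels if fn(truth) == truth)
    return hits / len(labels)


# Hypothetical labelled data: 10,000 random true/false facts.
labels_rng = random.Random(42)
labels = [labels_rng.random() < 0.5 for _ in range(10_000)]

for target in (0.6, 0.7):
    fn = make_noisy_is_true(target, seed=1)
    print(f"target {target:.0%}: measured {measure_correctness(fn, labels):.3f}")
```

With 10,000 labels the measured rates land close to the 60% and 70% targets; the fixed seeds only make the run repeatable.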

bee_rider•1h ago

This doesn’t seem too different from my third thing, error (in the “accumulation of numerical error” sense).

timewizard•1h ago

> but if we manage to find a way to hit 70%, it would be better.

Yet still absolutely worthless.

> "correct for most part, but could be better" concept something.

When humans do that we just call it "an error."

> so lets call that "correctness" or something

The appropriate term is "confidence." These LLM tools could all give you a confidence rating with each and every "fact" they attempt to relay to you. Of course they don't actually do that, because no one would use a tool that confidently gives you answers based on a 70% self-confidence rating.

We can quibble over terms but more appropriately this is just "garbage." It's a giant waste of energy and resources that produces flawed results. All of that money and effort could be better used elsewhere.

vrighter•1h ago

And even those confidence ratings are useless, imo. If it's trained with wrong data, it will report high confidence for the wrong answer. And curating a dataset is a black art in the first place.
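For the "confidence rating" discussed above, a minimal sketch assuming you already have per-token log-probabilities for an answer (some model APIs can return these; the numbers below are made up). It computes the joint probability of the whole answer and a length-normalised per-token score. Per the reply above, both measure how sure the model is, not whether the answer is true: a model trained on wrong data can score its wrong answer highly.

```python
import math
from typing import Sequence


def sequence_probability(token_logprobs: Sequence[float]) -> float:
    """Joint probability the model assigned to the whole answer: exp(sum of log-probs)."""
    return math.exp(sum(token_logprobs))


def mean_token_confidence(token_logprobs: Sequence[float]) -> float:
    """Geometric-mean per-token probability: exp(mean log-prob).

    Length-normalised, so a long answer is not penalised just for being long.
    """
    if not token_logprobs:
        raise ValueError("need at least one token")
    return math.exp(sum(token_logprobs) / len(token_logprobs))


# Hypothetical log-probs for a four-token answer (made-up numbers).
answer_logprobs = [math.log(p) for p in (0.9, 0.8, 0.95, 0.4)]
print(f"joint: {sequence_probability(answer_logprobs):.3f}")
print(f"per-token: {mean_token_confidence(answer_logprobs):.3f}")
```

The joint score shrinks with every extra token, which is why the per-token geometric mean is the more common way to compare answers of different lengths.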

georgemcbay•44m ago

They aren't really bugs, though, in the traditional sense, because all LLMs ever do is "hallucinate". Seeing what we call a hallucination as something fundamentally different from what we consider a correct response is further anthropomorphising the LLM.

We just label it with that word when it statistically generates something we know to be wrong, but functionally what it did in that case is no different than when it statistically generated something that we know to be correct.
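A toy sketch of the point above, using made-up scores for three candidate next tokens: the same softmax-and-sample code path produces both the "correct" and the "hallucinated" continuation, and nothing in it distinguishes the two. Real models sample over a vocabulary of many thousands of tokens at every step; `temperature` here is the usual softmax scaling, and the candidate list and logits are hypothetical.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Convert raw scores into probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample(tokens, logits, temperature=1.0, rng=random):
    """Draw one token; the same code runs whether the result turns out right or wrong."""
    probs = softmax(logits, temperature)
    return rng.choices(tokens, weights=probs, k=1)[0]


# Hypothetical candidates for "The capital of Australia is ..." with made-up scores.
tokens = ["Canberra", "Sydney", "Melbourne"]
logits = [2.0, 1.5, 0.5]

rng = random.Random(0)
draws = [sample(tokens, logits, temperature=1.0, rng=rng) for _ in range(1000)]
print({t: draws.count(t) for t in tokens})
```

Lowering the temperature makes "Canberra" more likely but never certain; the wrong answers are still drawn by exactly the same mechanism.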

anshumankmr•3h ago

Can we submit ChatGPT convo histories??

Flemlo•3h ago

So how many cases were there where it was wrong but no one checked?

add-sub-mul-div•2h ago

Good point. People putting the least amount of effort into their job that they can get away with is universal, judges are no more immune to it than lawyers.

mullingitover•37m ago

This seems like a perfect use case for a legal MCP server that can provide grounding for citations. Protomated already has one[1].

[1] https://github.com/protomated/legal-context
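A minimal sketch of the grounding idea, assuming you already have a trusted index of real citations (here a hard-coded, hypothetical set with fictional cases). It does not use or reflect the protomated/legal-context API; it only shows the basic check of flagging any cited authority that cannot be found. The regex covers a handful of federal reporters and is an illustrative assumption, not a full citation parser.

```python
import re

# Rough pattern for "<volume> <reporter> <page>" citations (illustrative only).
CITATION_RE = re.compile(
    r"\b(\d{1,4})\s+(U\.S\.|S\. Ct\.|F\.(?:2d|3d|4th)|F\. Supp\.(?: [23]d)?)\s+(\d{1,5})\b"
)


def extract_citations(text: str) -> list[str]:
    """Pull reporter citations out of free text, normalised to single spaces."""
    return [f"{vol} {reporter} {page}" for vol, reporter, page in CITATION_RE.findall(text)]


def unverified_citations(text: str, known: set[str]) -> list[str]:
    """Citations missing from the trusted index; these need a human check."""
    return [c for c in extract_citations(text) if c not in known]


# Hypothetical trusted index and brief; the cases and citations are fictional.
KNOWN_CITATIONS = {"999 F.4th 1234"}
brief = "See Doe v. Roe, 999 F.4th 1234 (2099); Smith v. Jones, 998 U.S. 1 (2099)."

print(unverified_citations(brief, KNOWN_CITATIONS))
```

A real grounding service would also confirm that the case says what the brief claims it says; matching the citation string only catches authorities that do not exist at all.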
