frontpage.

The Dreamseeker's Vision of Tomorrow

https://soatok.blog/2025/10/15/the-dreamseekers-vision-of-tomorrow/
1•SlackingOff123•1m ago•0 comments

Holistic Agent Leaderboard: The Missing Infrastructure for AI Agent Evaluation

https://www.arxiv.org/pdf/2510.11977
1•randomwalker•5m ago•0 comments

I Learned to Spot Inflated Bids (and What You Can Do Too)

https://spicermatthews.com/blog/how-i-learned-to-spot-inflated-bids-and-what-you-can-do-too/
1•cloudmanic•7m ago•0 comments

What people write in Boss's Day cards

1•tumidpandora•10m ago•0 comments

The easiest affiliate revenue lever no one talks about

1•kvallans•11m ago•0 comments

ISPs angry about California law that lets renters opt out of forced payments

https://arstechnica.com/tech-policy/2025/10/california-says-landlords-cant-make-tenants-pay-for-a...
2•bikenaga•11m ago•1 comment

ChatGPT in a robot does what experts warned [video]

https://www.youtube.com/watch?v=gIxq03dipUw
1•dp-hackernews•12m ago•0 comments

Agent Prism: React components for visualizing traces from AI agents

https://github.com/evilmartians/agent-prism
1•handfuloflight•13m ago•0 comments

PyTorch 2.9 released with C ABI and better multi-GPU support

https://pytorch.org/blog/pytorch-2-9/
1•ashvardanian•14m ago•1 comment

X executive says creator monetization program may potentially be ended

https://twitter.com/DMichaelTripi/status/1978482790950981761
4•jrflowers•14m ago•0 comments

Locality, and Temporal-Spatial Hypothesis

https://brooker.co.za/blog/2025/10/05/locality.html
1•surprisetalk•15m ago•0 comments

A letter received from a plane crash fatality

https://www.futilitycloset.com/2025/10/05/late-word-2/
1•surprisetalk•15m ago•0 comments

Is tipping getting out of control? Many consumers say yes

https://apnews.com/article/tipping-fatigue-business-c4ae9d440610dae5e8ff4d4df0f88c35
2•paulpauper•15m ago•0 comments

Americans Need to Be Richer Than Ever to Buy Their First Home

https://www.bloomberg.com/news/articles/2023-03-02/will-home-prices-fall-first-time-buyers-face-a...
4•paulpauper•16m ago•0 comments

Sandspiel

https://sandspiel.club
1•surprisetalk•16m ago•0 comments

Windows 11 Cumulative Update 2025-10 breaks localhost applications

https://learn.microsoft.com/en-us/answers/questions/5585563/localhost-not-working-anymore-after-2...
3•marksamman•18m ago•0 comments

The price of gold is skyrocketing. Why is this, and will it continue

https://theconversation.com/the-price-of-gold-is-skyrocketing-why-is-this-and-will-it-continue-26...
3•mgh2•19m ago•0 comments

Transformers for Software Engineers

https://blog.nelhage.com/post/transformers-for-software-engineers/
1•Frotag•19m ago•0 comments

The Aspect.build CLI now in Rust

https://github.com/aspect-build/aspect-cli
1•rmhsilva•20m ago•0 comments

65% of Americans support monthly $2k Covid stimulus payments

https://www.masslive.com/coronavirus/2021/01/65-of-americans-support-monthly-2000-covid-stimulus-...
2•paulpauper•21m ago•0 comments

British social media star 'Big John' detained in Australia over visa

https://www.bbc.com/news/articles/cwy196k9p4po
1•e2e4•23m ago•0 comments

Drew Struzan – March 18, 1947 – October 13, 2025

https://en.wikipedia.org/wiki/Drew_Struzan
2•franze•23m ago•0 comments

Experience: I own the world's largest Monopoly collection

https://www.theguardian.com/lifeandstyle/2025/oct/10/experience-i-own-the-worlds-largest-monopoly...
1•bookofjoe•25m ago•0 comments

Cifar-10 Speedrun Record Broken by Research Agent

https://twitter.com/kellerjordan0/status/1978502058023031214
2•australium•26m ago•0 comments

components.build: OS standard modern, composable and accessible UI components

https://www.components.build/
1•handfuloflight•27m ago•0 comments

UI = Fn(state) Done Right

https://yagni.club/3m3anpetejc23?auth_completed=true
2•andersmurphy•29m ago•0 comments

China Accessed Classified UK Systems for a Decade, Officials Say

https://www.bloomberg.com/news/articles/2025-10-15/china-accessed-classified-uk-systems-for-a-dec...
8•beejiu•29m ago•0 comments

Gravity Can Explain the Collapse of the Wavefunction (Sabine Hossenfelder)

https://arxiv.org/abs/2510.11037
2•felineflock•32m ago•0 comments

Ask HN: Messed up and can't catch up

1•findingMeaning•36m ago•0 comments

Android 'Pixnapping' attack can capture app data like 2FA codes

https://www.theregister.com/2025/10/13/android_pixnapping_attack_captures_2fa_codes/
3•Bender•43m ago•1 comment

The problem with LLMs isn't hallucination, it's context-specific confidence

https://www.signalfire.com/blog/llm-hallucinations-arent-bugs
4•kerwioru9238492•2h ago

Comments

zviugfd•2h ago
It feels like most safety work is turning LLMs into overly cautious assistants, and I like how this piece points out that we could be trading away imagination for a false sense of reliability.
alganet•1h ago
> Humans don’t get rewarded for saying “I don’t know” to every question, because that’s not useful.

Humans get rewarded for thinking "I don't know", a lot. That's why it's hard to compare.

> A model that always bluffs

A model doesn't bluff. It feels to us humans like it bluffs, but there are no bluff mechanics in play. The model doesn't assess the prompter's ability to call its bluff. It's not hiding that it doesn't know something. It has simply reached a point in a sequence of token predictions that may or may not contain something resembling a call to what resembles a bluff.

Up to the point it's corrected, the model's representation of what was asked is the best it can do. It has no means to judge itself. Which leads to...

> The real issue isn’t that models make things up; it’s that they don’t clearly signal how confident they are when they do.

Which sounds like exactly what I said, but it's not. Signaling confidence is just a more convincing faux-bluff. Signaling is a side effect of bluffing, a symptom, not the real thing (which is more about assessing whoever is on the other side of the conversation).

> Imagining things, seeing problems from the wrong angle, and even fabricating explanations are the seeds of creativity.

I agree with this. However, Newton was not bluffing; he was right and confident about it, and right to be confident about it. It just turns out that his description was at a lower resolution of knowledge than Einstein's.

For this to work, we need lots of "connective tissue" ideas. Roads we can explore freely without being called liars. Things we can say without saying that these things are true or false, without the need for being confident or right, without being assessed directly. This is outside the realm of bluffing or saying useful things. It's quite the opposite.

When people saw comets and described them as dragons in the sky, they were not hallucinating or telling lies; they were preserving some connective-tissue idea the best they could, outside the realm of being right or wrong. These were not bluffs. There were some "truths" in their mistakes, or something useful (they were inadvertently recording astronomical data before astronomy existed). Those humans felt that was important, so those stories stuck. Can we say the same thing about LLM hallucinations? I don't think we're ready to answer that.

So, yes. Hallucinations could be a feature, but there's a lot missing here.
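[Ed.: the "signal how confident they are" point quoted above can be made concrete. Many LLM APIs expose per-token log-probabilities, and a geometric-mean token probability is one crude, uncalibrated proxy for how "sure" a model was while generating an answer. This is an illustrative sketch, not anything the article or commenters propose; the function name and the example log-probability values are invented.]

```python
import math

def sequence_confidence(token_logprobs):
    """Geometric-mean token probability for a generated sequence.

    exp(mean log-prob) is 1.0 when every token was predicted with
    certainty, and falls toward 0 as the model's next-token
    distributions get flatter. It is a proxy, not calibrated truth:
    a fluent hallucination can still score high.
    """
    if not token_logprobs:
        raise ValueError("empty sequence")
    avg_logprob = sum(token_logprobs) / len(token_logprobs)
    return math.exp(avg_logprob)

# Hypothetical per-token log-probs for two generations:
confident = [-0.05, -0.10, -0.02, -0.08]  # sharp distributions
shaky = [-1.2, -2.5, -0.9, -1.8]          # flat distributions

print(sequence_confidence(confident))  # ~0.94
print(sequence_confidence(shaky))      # ~0.20
```

A real system would need calibration against held-out correctness labels before such a score means anything to a user; as the comment above notes, emitting the number is not the same as the model actually assessing what it knows.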

_wire_•1h ago
"The problem with Magic 8-ball is lack of context specific confidence in its answers"

This article and attendant comments reveal the AI sector is turning to co-dependent excuse making for a technology that clearly can't live up to its hype.

Get ready for phrenology of AI...

"I am going to need to visit your data center to lay hands on the subject."