'Probably' doesn't mean the same thing to your AI as it does to you

https://theconversation.com/probably-doesnt-mean-the-same-thing-to-your-ai-as-it-does-to-you-275626

7•colinprince•2h ago

Comments

OkayPhysicist•1h ago

I wonder if the 70% vs 80% "Probably" problem comes from cultural differences between anglophone countries. The human datasets that were available were mostly American, with some Western Europe/NATO. Notably missing would be India, which simply by population I'd expect to represent a significant chunk of English-language writing available on the open internet (and thus fed into LLM training sets).

The other phenomenon I would love to test is whether the act of surveying people affected their declared odds. Not sure how to get good numbers out of that, but I could see the LLM vs surveyed human discrepancy arising from people using "probably" differently in their everyday writing, as opposed to when asked point-blank what "probably" means.

5o1ecist•1h ago

> The research focused on words of estimative probability, which include terms like “maybe,” “probably” and “almost certain.”

Interesting. Perplexity did that as well, but I've made sure it stops doing that.

Might be relevant for others: https://www.perplexity.ai/search/hey-hey-do-you-remember-whe...

selridge•1h ago

Alignment is impossible here. “Nearly certain” odds for success for a sports team might be 20:1, but that’s a little worse (not much!) than for a launch vehicle and not at all good for a web server. No one would say “it is nearly certain that I’ll serve a web request” based on two 9’s, but they would say “it is nearly certain the team will win today” given the same odds. That’s just between humans.
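A quick conversion makes the comparison above concrete. This is a minimal sketch, assuming "20:1" means 20-to-1 in favour (odds-on) and "two 9's" means 99% success; the helper name `odds_to_probability` is hypothetical. Under those assumptions the team's chance works out to about 95.2%, so it fails roughly 1 time in 21 while the server fails 1 time in 100. The numbers are close but not identical, and the point about context-dependent wording stands either way.

```python
from fractions import Fraction

def odds_to_probability(in_favor: int, against: int) -> Fraction:
    """Convert odds of in_favor:against (in favour) into a probability."""
    return Fraction(in_favor, in_favor + against)

# Assumption: "20:1" is read as 20-to-1 in favour of the team winning.
team_win = odds_to_probability(20, 1)
# Assumption: "two 9's" is read as 99% of requests served successfully.
two_nines = Fraction(99, 100)

print(f"team wins:      {float(team_win):.3f}")
print(f"server success: {float(two_nines):.3f}")
print(f"failure rates:  team {float(1 - team_win):.3f}, server {float(1 - two_nines):.3f}")
```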

rcarr•1h ago

Something I noticed recently is that Claude Code interprets "or" as inclusive or (or at least it does when writing function names). I suspect this is due to its code-specific nature, since I would expect most uses of "or" in ordinary written language to be exclusive.
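The distinction is easiest to see in a truth table. The sketch below uses hypothetical function names, `is_admin_or_owner` and `is_admin_xor_owner`, invented for illustration. Python's `or` is inclusive (true when both inputs are true), while exclusive or on booleans can be written as `!=`. The two differ only in the row where both flags are set.

```python
from itertools import product

def is_admin_or_owner(is_admin: bool, is_owner: bool) -> bool:
    # Inclusive or: true when either flag is set, including when both are.
    return is_admin or is_owner

def is_admin_xor_owner(is_admin: bool, is_owner: bool) -> bool:
    # Exclusive or: true only when exactly one flag is set.
    return is_admin != is_owner

# Print the full truth table for both readings of "or".
for a, b in product([False, True], repeat=2):
    print(f"admin={a!s:5} owner={b!s:5} "
          f"inclusive={is_admin_or_owner(a, b)!s:5} exclusive={is_admin_xor_owner(a, b)}")
```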

jadenPete•30m ago

It seems like this problem (differences in how humans and LLMs use probabilistic language) and hallucination are one and the same. LLMs don’t have access to information about how confident they are, so they always choose the most likely response, even if the most likely response isn’t actually that likely. Whereas if a human is unconfident, they’ll express that instead of choosing the most likely response.

Of course, LLMs can still speak about probabilities and mimic uncertainty, but that’s likely (heh) coming from their training data on the subject matter, not their actual confidence.

Humans are interesting because they employ a two-phase approach: when we’re learning, we fake confidence (you’d never write “I don’t know” on a test unless you truly had nothing of value to say), but during inference, we communicate our confidence. Some humans suffer from underconfidence or overconfidence, but most just seem to know innately how to do this.

Can anyone who works on LLMs clarify whether my understanding is correct?
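The "always choose the most likely response" behaviour described in the first paragraph of the comment above matches greedy (argmax) decoding, which is one of several decoding strategies; many deployments sample instead. The toy sketch below assumes a model that exposes next-token logits; the candidate tokens and logit values are invented for illustration. Greedy decoding emits the top token whether its probability is 0.99 or 0.37. A per-token probability like this is computed during decoding, but whether it amounts to the model's confidence in a claim is exactly the open question the comment raises.

```python
import math

def softmax(logits: list[float]) -> list[float]:
    # Subtract the max logit for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def greedy_pick(tokens: list[str], logits: list[float]) -> tuple[str, float]:
    # Greedy decoding: return the single most probable token and its probability.
    probs = softmax(logits)
    best = max(range(len(tokens)), key=lambda i: probs[i])
    return tokens[best], probs[best]

# Hypothetical next-token candidates and logits, invented for illustration.
confident = greedy_pick(["Paris", "Lyon", "Nice"], [6.0, 1.0, 0.5])
unsure = greedy_pick(["1912", "1913", "1914"], [1.1, 1.0, 0.9])

for token, p in (confident, unsure):
    print(f"picked {token!r} with probability {p:.2f}")
```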
