AI hallucinations are getting worse – and they're here to stay

https://www.newscientist.com/article/2479545-ai-hallucinations-are-getting-worse-and-theyre-here-to-stay/

10•greyadept•3h ago

Comments

allears•2h ago

Of course they're here to stay. LLMs aren't designed to tell the truth, or to be able to separate fact from fiction. How could they, given that their training data includes both, and there's no "understanding" there in the first place? Naturally, the most straightforward solution is to redefine "intelligence" and "truth," and they're working on that.

etaioinshrdlu•1h ago

The creators are definitely trying to make them tell the truth. They optimize for benchmarks where truthful answering gets a higher score. All the big LLM vendors now have APIs that can ground their answers in search results.

I don't understand the impulse to assert that the AI industry is at war with truth just because this is a hard, unsolved problem!

kazinator•1h ago

Even if training data contains nothing but truths, you cannot always numerically interpolate among truths.
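A toy numeric sketch of that point, under assumptions that are not in the thread: the "truth" is taken to be the function x squared, the only known facts are its values at 0 and 2, and the guess in between is made by plain linear interpolation. The names `truth` and `lerp` are hypothetical, chosen only for illustration.

```python
def truth(x: float) -> float:
    """Stand-in for the real world (assumed here to be x squared)."""
    return x * x


def lerp(x: float, x0: float, y0: float, x1: float, y1: float) -> float:
    """Linearly interpolate between two known points (x0, y0) and (x1, y1)."""
    t = (x - x0) / (x1 - x0)
    return y0 + t * (y1 - y0)


# Both "training" points are exactly true.
x0, y0 = 0.0, truth(0.0)
x1, y1 = 2.0, truth(2.0)

# The interpolated answer at x = 1 is plausible but false.
guess = lerp(1.0, x0, y0, x1, y1)
actual = truth(1.0)
print(f"interpolated: {guess}, actual: {actual}")
```

Every input is true, yet the blended answer (2.0) does not match the real value (1.0).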

kazinator•1h ago

> But ["hallucination"] can also refer to an AI-generated answer that is factually accurate, but not actually relevant to the question it was asked, or fails to follow instructions in some other way.

No, "hallucination" can't refer to that. That would be a non sequitur, non-compliance, or something similar.

Hallucination is quite specific, referring to making statements which can be interpreted as referring to the circumstances of a world which doesn't exist. Those statements are often relevant; the response would be useful if that world did coincide with the real one.

If your claim is that hallucinations are getting worse, you have to measure the incidences of just those kinds of outputs, treating other forms of irrelevance as a separate category.
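A minimal sketch of that kind of bookkeeping, assuming each model output has already been hand-labeled with exactly one category. The label names and the sample list are made-up placeholders, not data from the article or any benchmark.

```python
from collections import Counter


def category_rates(labels: list[str]) -> dict[str, float]:
    """Return the share of outputs falling into each label."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}


# Placeholder labels for six hypothetical outputs.
labeled_outputs = [
    "hallucination",   # describes a world that does not exist
    "ok",
    "irrelevant",      # accurate, but does not answer the question
    "ok",
    "non_compliant",   # ignores the instructions
    "hallucination",
]

rates = category_rates(labeled_outputs)
print(rates["hallucination"])  # only this figure bears on the hallucination claim
```

Keeping the categories apart means a rise in irrelevant or non-compliant answers does not get counted as a rise in hallucinations.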

metalman•50m ago

AI is becoming that problematic tenant in a building who presented well and had great references, but is now bumming money from everybody, stealing people's mail and reading it before putting it back, can't pay their power bill, and wanders around talking to squirrels. We should build some sort of halfway house where the AIs can get therapy and someone to keep them on their meds, and do the group living thing till they, maybe, can join society. The last thing we need is some sort of turbocharged A-list psycho beaming itself into everybody's lives, but hey, whatever, right? People got to do what people got to do, and part of that is shrugging off all the hype and noise. I just keep doubling down on reality; it seems to come naturally :)

roskelld•9m ago

I had an interesting one yesterday where I was building out some code in Unreal Engine and I gave o4-mini-high links to the documentation, a class header, and a blog with an example project.

I asked it to create some boilerplate and it presented me with a class function that I knew did not exist; though like many hallucinations it would have been very beneficial if it did.

So, instead of just pointing out that it didn't exist and getting the usual "Oh you're right, that function does not exist so use this function instead", I asked it why it gave me that function given that it has access to the header and an example project. It doubled down and stated that the function was in the header and the example project, even presenting a code sample it claimed was from the example project with the fake function.

It felt like a step up from the confidently incorrect state I'd seen before, to a level where, if I weren't knowledgeable enough about the class in question (or able to check), I might have started questioning myself.
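For anyone without that background, a crude textual check is one way to test the model's claim. This is only a sketch: the header text, class name, and function names below are hypothetical stand-ins, and a regex match is not a real C++ parser, so it can be fooled by comments, macros, or similarly named calls.

```python
import re


def mentions_function(header_text: str, name: str) -> bool:
    """Return True if `name` appears followed by an opening parenthesis."""
    pattern = rf"\b{re.escape(name)}\s*\("
    return re.search(pattern, header_text) is not None


# Hypothetical header contents, not taken from any real engine source.
header = """
class AExampleActor
{
public:
    void BeginPlay();
    void Tick(float DeltaTime);
};
"""

real_found = mentions_function(header, "Tick")
fake_found = mentions_function(header, "AutoConfigure")  # the suggested function
print(real_found, fake_found)
```

If the suggested name never appears in the header or the example project, that is a strong hint the model invented it, whatever it says when challenged.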
