
AI hallucinations are getting worse – and they're here to stay

https://www.newscientist.com/article/2479545-ai-hallucinations-are-getting-worse-and-theyre-here-to-stay/
10•greyadept•3h ago

Comments

allears•2h ago
Of course they're here to stay. LLMs aren't designed to tell the truth, or to be able to separate fact from fiction. How could they, given that their training data includes both, and there's no "understanding" there in the first place? Naturally, the most straightforward solution is to redefine "intelligence" and "truth," and they're working on that.
etaioinshrdlu•1h ago
The creators are definitely trying to make them tell the truth. They optimize for benchmarks where truthful answering gets a higher score. All the big LLM vendors now have APIs that can ground their answers in search results.

I don't understand the impulse to assert that the AI industry is at war with truth just because it's a hard, unsolved problem!

kazinator•1h ago
Even if training data contains nothing but truths, you cannot always numerically interpolate among truths.
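kazinator's point can be made concrete with a toy geometric analogy. This is our own illustration, not something from the thread, and LLMs do not literally interpolate linearly between facts. Treat "true statements" as points on the unit circle. Both training points are true, but the straight-line midpoint between them falls inside the circle, so it is not.

```python
# Toy model (an assumption for illustration only): the "true" facts are the
# points on the unit circle, i.e. every (x, y) with x**2 + y**2 == 1.
def is_true(point, tol=1e-9):
    """Return True if the point lies on the unit circle (within tol)."""
    x, y = point
    return abs(x * x + y * y - 1.0) <= tol

def interpolate(a, b, t):
    """Linear interpolation between points a and b, for 0 <= t <= 1."""
    return (a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))

# Two "training examples", both true.
a = (1.0, 0.0)
b = (0.0, 1.0)

# Their midpoint is (0.5, 0.5), which is not on the circle.
mid = interpolate(a, b, 0.5)
print(is_true(a), is_true(b), is_true(mid))  # True True False
```

The set of truths here is not convex, so a blend of two true points can land on a false one.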
kazinator•1h ago
> But ["hallucination"] can also refer to an AI-generated answer that is factually accurate, but not actually relevant to the question it was asked, or fails to follow instructions in some other way.

No, "hallucination" can't refer to that. That's a non sequitur or non-compliance and such.

Hallucination is quite specific, referring to making statements which can be interpreted as referring to the circumstances of a world which doesn't exist. Those statements are often relevant; the response would be useful if that world did coincide with the real one.

If your claim is that hallucinations are getting worse, you have to measure the incidence of just those kinds of outputs, treating other forms of irrelevance as a separate category.
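Here is a minimal sketch of the measurement kazinator describes. It assumes each graded output has already been tagged with exactly one category. The category names and the ten labels are made up for illustration and are not data from the article. Counting hallucinations on their own gives a different number than lumping every failure mode together.

```python
from collections import Counter

# Hypothetical category labels; each graded output gets exactly one.
HALLUCINATION = "hallucination"    # describes a world that doesn't exist
NON_SEQUITUR = "non_sequitur"      # accurate but irrelevant to the question
NON_COMPLIANCE = "non_compliance"  # fails to follow instructions some other way
CORRECT = "correct"

def category_rates(labels):
    """Return the fraction of outputs that fall into each category."""
    counts = Counter(labels)
    total = len(labels)
    return {category: counts[category] / total for category in counts}

# Made-up labels for ten outputs, purely illustrative.
labels = [CORRECT, HALLUCINATION, NON_SEQUITUR, CORRECT, NON_COMPLIANCE,
          CORRECT, HALLUCINATION, CORRECT, CORRECT, NON_SEQUITUR]

rates = category_rates(labels)
print(rates[HALLUCINATION])  # 0.2 (hallucinations only)
print(1 - rates[CORRECT])    # 0.5 (all failure modes lumped together)
```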

metalman•50m ago
AI is becoming that problematic tenant in a building who presented well and had great references, but is now bumming money from everybody, stealing people's mail and reading it before putting it back, can't pay their power bill, and wanders around talking to squirrels. We should build some sort of halfway house where the AIs can get therapy and someone to keep them on their meds, and do the group living thing until they, maybe, can join society. The last thing we need is some sort of turbocharged A-list psycho beaming itself into everybody's lives, but hey, whatever, right? People got to do what people got to do, and part of that is shrugging off all the hype and noise. I just keep doubling down on reality; it seems to come naturally :)
roskelld•9m ago
I had an interesting one yesterday. I was building out some code in Unreal Engine and gave o4-mini-high links to the documentation, a class header, and a blog post with an example project.

I asked it to create some boilerplate and it presented me with a class function that I knew did not exist; though like many hallucinations it would have been very beneficial if it did.

So, instead of just pointing out that it didn't exist and getting the usual "Oh, you're right, that function does not exist, so use this function instead," I asked it why it gave me that function when it had access to the header and an example project. It doubled down, stating that the function was in both the header and the example project, and even presented a code sample, supposedly from the example project, that used the fake function.

It felt like a step up from the confidently incorrect behavior I'd seen before. If I weren't knowledgeable enough about the class in question (or able to check), I might have started questioning myself.
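The following is a rough sketch of the kind of check roskelld mentions. It is our own illustration and not part of Unreal's tooling or anything described in the thread. A plain regex search can at least confirm whether a claimed function name is declared in a header at all. The header snippet and the `UMyComponent`, `StartTracking`, and `StopTracking` names are hypothetical stand-ins. This is text matching rather than C++ parsing, so macros, comments, and overloads can mislead it.

```python
import re

def declares_function(header_text, name):
    """Crude check: does `name` appear as a whole word followed by '('?

    Text search only, not a C++ parser; a name that appears only in a
    comment or a macro would still count as "declared".
    """
    pattern = re.compile(r"\b" + re.escape(name) + r"\s*\(")
    return pattern.search(header_text) is not None

# Hypothetical header snippet standing in for the real class header.
header = """
class UMyComponent : public UActorComponent
{
public:
    void StartTracking(float Interval);
    void StopTracking();
};
"""

print(declares_function(header, "StartTracking"))     # True
print(declares_function(header, "StartTrackingAll"))  # False
```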

Smartphone Sensors and Antihydrogen Could Soon Put Relativity to the Test

https://physicsworld.com/a/smartphone-sensors-and-antihydrogen-could-soon-put-relativity-to-the-test/
1•EA-3167•1m ago•0 comments

Florida bill requiring encryption backdoors for social media accounts has failed

https://techcrunch.com/2025/05/09/florida-bill-requiring-encryption-backdoors-for-social-media-accounts-has-failed/
1•chrisjj•2m ago•0 comments

Anubis and Caddy-Docker-Proxy

https://patdavid.net/2025/05/anubis-and-caddy-docker-proxy/
1•xena•6m ago•0 comments

Albert Ellis: Stoicism as the Root of CBT (2023)

https://thewalledgarden.com/albert-ellis-stoicism-as-the-root-of-cbt/
1•mellosouls•7m ago•0 comments

Engineering principles in the age of vibe coding

1•denieler•8m ago•0 comments

Yahoo Mail vs. Gmail: Which should you use?

https://zapier.com/blog/yahoo-vs-gmail/
2•mooreds•16m ago•0 comments

GraphQL vs. REST API: Which Is a Natural Fit for Graph Databases?

https://memgraph.com/blog/graphql-vs-rest-api
3•sareada52•23m ago•0 comments

Gen Z's 'conscious unbossing' should be a wake-up call for businesses

https://www.businessinsider.com/gen-z-consciously-unbossing-avoid-management-roles-preserve-mental-health-2025-4
4•rntn•23m ago•0 comments

AI agents in B2B sales: pre‑built tools vs. custom solutions

https://www.yougotus.ai/ai-agents-in-b2b-sales
1•Bittermann•24m ago•0 comments

Trump admin to roll back Biden's AI chip restrictions

https://arstechnica.com/ai/2025/05/trump-admin-to-roll-back-bidens-ai-chip-restrictions/
4•byte-bolter•29m ago•1 comment

OpenAI admits it screwed up testing its 'sycophant-y' ChatGPT update

https://www.theverge.com/news/661422/openai-chatgpt-sycophancy-update-what-went-wrong
3•bit_qntum•30m ago•0 comments

A mathematical proof assistant (v2)

https://github.com/teorth/estimates
3•ptrj_•31m ago•0 comments

Why travel didn't bring the world together

https://www.ft.com/content/33e907bd-d6a9-43a2-9d96-c7709fea3a47
1•rwc9•32m ago•0 comments

We've submitted Fortnite to Apple for review

https://twitter.com/Fortnite/status/1920878504284975585
1•ushakov•35m ago•0 comments

Not Recommended: Why Current Content Recommendation Systems Fail Us

https://www.gojiberries.io/not-recommended-why-current-content-recommendation-systems-fail-us/
1•goji_berries•38m ago•1 comment

Ask HN: Which function definition keyword do you prefer, def or fn?

1•winwang•42m ago•1 comment

Xkcd's "Is It Worth the Time?" Considered Harmful

https://will-keleher.com/posts/its-not-worth-the-time-yet.html
6•gcmeplz•42m ago•1 comment

Apple reportedly readies Baltra processors for AI servers

https://www.tomshardware.com/pc-components/cpus/apple-reportedly-readies-baltra-processors-for-ai-servers
2•giuliomagnifico•43m ago•0 comments

Galactic Coordinate System

https://en.wikipedia.org/wiki/Galactic_coordinate_system
2•olddustytrail•44m ago•0 comments

The Grug Brained Developer

https://grugbrain.dev/
3•vkaku•45m ago•0 comments

Fine-tuned acoustic waves can knock drones out of the sky

https://www.economist.com/science-and-technology/2025/02/05/fine-tuned-acoustic-waves-can-knock-drones-out-of-the-sky
4•m1guelpf•46m ago•0 comments

Sidebar Calendar – Your Schedule at a Glance

https://apps.apple.com/us/app/sidebar-calendar/id6744621424?mt=12
1•gabecatalfo•48m ago•1 comment

Legal actions in Brazilian air transport: a ML/logistic regression analysis

https://www.frontiersin.org/journals/future-transportation/articles/10.3389/ffutr.2023.1070533/full
1•felineflock•48m ago•0 comments

Orders for Pahalgam satellite images from US firm peaked 2 months before attack

https://theprint.in/defence/pahalgam-satellite-image-us-space-tech-firm-maxar-technologies/2620666/
1•rainhacker•49m ago•0 comments

Simon Willison's first blog post on LLMs (2022)

https://simonwillison.net/2022/May/31/a-datasette-tutorial-written-by-gpt-3/
14•Alifatisk•50m ago•0 comments

Show HN: No as a Service Rust

https://github.com/ZAZPRO/no-as-a-service-rust
2•ZAZPRO•51m ago•0 comments

Ursula K. Le Guin on the TV Earthsea (2004)

https://slate.com/culture/2004/12/ursula-k-le-guin-on-the-tv-earthsea.html
2•Tomte•52m ago•0 comments

GNU Taler 1.0 Released

https://www.taler.net/en/news/2025-01.html
6•midzer•52m ago•0 comments

why vi rocks

https://why-vi.rocks/
2•exvi•54m ago•0 comments

How Bail Bonds Work

https://finbarr.site/2025/05/10/how-bail-bonds-work.html
2•Finbarr•55m ago•1 comment