Why Language Models Hallucinate

https://www.arxiv.org/pdf/2509.04664
3•sonabinu•1d ago

Comments

PaulHoule•1d ago
I think they are basically right, but the issue isn't at the level of "test taking"; it's at the level of "linguistic competence".

I worked on a few projects that tried to develop foundation models and ruled out, or tried to rule out [1], quite a few approaches based on arguments like "in the tokenization step you lose critical information in 8% of all cases, so that puts a ceiling of 92% on accuracy".

That wasn't quite right, because it assumes the model has to get the right answer by the right process. If you give it credit for the right answer by the wrong process, then maybe it makes a wild-assed guess on those cases which is right 50% of the time, so the ceiling is more like 96%. But we call that last 4% a "hallucination".
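
The arithmetic behind the 92% and 96% figures can be sketched in a few lines. This is only an illustration, assuming the information-loss cases are the sole source of error and that guesses on those cases succeed at a fixed rate; the function name `accuracy_ceiling` and its parameters are hypothetical and not taken from any of the projects mentioned.

```python
def accuracy_ceiling(loss_rate: float, guess_success: float) -> tuple[float, float]:
    """Return (ceiling, residual_error) for a model that guesses when input is lossy.

    loss_rate:     fraction of cases where critical information is lost (assumed 0.08)
    guess_success: fraction of those cases where a blind guess is still right (assumed 0.5)
    """
    answered_properly = 1.0 - loss_rate        # right answer by the right process
    lucky_guesses = loss_rate * guess_success  # right answer by the wrong process
    ceiling = answered_properly + lucky_guesses
    return ceiling, 1.0 - ceiling


# No credit for guessing: the ceiling is the naive 92%.
strict_ceiling, _ = accuracy_ceiling(0.08, 0.0)

# Credit for lucky guesses: the ceiling rises to about 96%,
# and the remaining ~4% are the confident wrong answers.
ceiling, residual = accuracy_ceiling(0.08, 0.5)
print(f"strict={strict_ceiling:.2f} with_guessing={ceiling:.2f} residual={residual:.2f}")
```

The residual term is the part that gets labelled "hallucination": cases where the model guessed and missed.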

You could make the case that we could build models that do a better job of reasoning about probability, but I think the magic of LLMs is that they reason about probability wrong in a way that empirically works. It's my perennially unpopular opinion that the "language instinct" in humans is similarly a derangement of reasoning about probability, one that collapses the manifold of possible productions into a lower-dimensional space which is easier to learn.

[1] properly in the case of those models, I think, because nobody else has made progress with them
