LLMs Don't Hallucinate – They Drift

https://figshare.com/articles/conference_contribution/Measuring_Fidelity_Decay_A_Framework_for_Semantic_Drift_and_Collapse/30422107?file=58969378

17•knowledgeinfra•1w ago

Comments

knowledgeinfra•1w ago

This paper argues that the dominant metaphor for LLM failure, hallucination, misdiagnoses the real problem. Language models do not primarily fail by inventing false facts, but by undergoing fidelity decay: the gradual erosion of meaning across recursive transformations. Even when outputs remain accurate and coherent, nuance, metaphor, intent, and contextual ground steadily degrade. The paper proposes a unified framework for measuring this collapse through four interrelated dynamics (lexical decay, semantic drift, ground erosion, and semantic noise) and sketches how each can be operationalized into concrete benchmarks. The central claim is that accuracy alone is an insufficient evaluation target. Without explicit fidelity metrics, AI systems risk becoming fluent yet hollow: technically correct while culturally and semantically impoverished.

petesergeant•1w ago

Please don’t post AI summaries here

chrisjj•1w ago

> Language models do not primarily fail by inventing false facts, but by undergoing fidelity decay

This premise is unsound. We don't expect LLMs to deliver with fidelity, just as we don't expect parrots to speak with their owners' accents. So infidelity is by no means a failure.

zahrevsky•1w ago

> The contribution of this work lies in its move from critique to measurement. It proposes concrete methods: recursive summarization chains, metaphor stress-tests, resonance surveys, and noise-infused retrieval experiments. These allow researchers to track how meaning erodes over time. By integrating these methods, it outlines a pathway toward fidelity-centered benchmarks that complement existing accuracy metrics.

To me, starting to solve a problem by meticulously measuring it is a sign of a good solution.
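As a rough illustration of what a "recursive summarization chain" measurement could look like, here is a minimal Python sketch. It is not taken from the paper: the function names, the vocabulary-overlap metric, and the `drop_last_word` stand-in (used in place of a real LLM summarizer) are all assumptions chosen to keep the example self-contained.

```python
def tokens(text):
    """Lowercased word set with surrounding punctuation stripped."""
    return {w.strip(".,;:!?\"'").lower() for w in text.split()} - {""}


def lexical_retention(original, derived):
    """Fraction of the original vocabulary still present in the derived text.

    Assumed metric: a crude proxy for lexical decay, not the paper's definition.
    """
    source = tokens(original)
    if not source:
        return 1.0
    return len(source & tokens(derived)) / len(source)


def run_chain(text, transform, steps):
    """Apply `transform` repeatedly, scoring each output against the ORIGINAL text."""
    scores = []
    current = text
    for _ in range(steps):
        current = transform(current)
        scores.append(lexical_retention(text, current))
    return scores


def drop_last_word(text):
    """Hypothetical stand-in for an LLM summarizer: removes the final word."""
    return " ".join(text.split()[:-1])


if __name__ == "__main__":
    print(run_chain("the quick brown fox jumps", drop_last_word, 3))
```

Scoring every step against the original, rather than against the previous step, is the design choice that makes cumulative erosion visible. In a real experiment the transform would be a model call and the metric would need to capture meaning, not just shared words.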

Retr0id•1w ago

What the heck is a resonance survey?

chrisjj•1w ago

An LLM fabrication.

chrisjj•1w ago

True title: Measuring Fidelity Decay: A Framework for Semantic Drift and Collapse

botacode•1w ago

Getting a 403 when I try to read. Anyone have a backup link?

Retr0id•1w ago

This is slop

sylware•1w ago

ofc not, they "bungee jump"

m0llusk•1w ago

Hallucinations that have certain characteristics and boundaries are still hallucinations. This is happening because learning models are doing pattern matching, so, to put it briefly, anything that fits may work and end up in the output.

Being able to admit the flaws and limitations of a technology is often critical to advancing adoption. Unfortunately, producers of currently popular learning model based technologies are more interested in speculation and growth and speculative growth than genuinely robust operation. This paper is a symptom of a larger problem that is contributing to the bubble pop, downturn, or "AI winter" that we are collectively heading toward.

chrisjj•1w ago

That diagnosis is supported by the author blurb:

The Lab’s goal is to ensure AI systems do not only produce fluent answers but also preserve the purpose, nuance, and integrity of language itself.

polotics•1w ago

This is so short and empty, sorry. The author would be well placed to try to ground their work in a modicum of empiricism; the puffed-up style here makes things a bit hard to read. I do not know if this is slop. It's getting harder to guess, and some actual humans were writing like this long before LLMs. Still, what is the actual finding being presented here?

jnamaya•1w ago

This paper perfectly articulates the problem I spent the last year solving. The shift from "hallucination" to "fidelity decay" is the correct mental model for agent stability.

I built an open source framework called SAFi that implements the "Fidelity Meter" concept mentioned in section 4. It treats the LLM as a stochastic component in a control loop. It calculates a rolling "Alignment State" (using an Exponential Moving Average) and measures "Drift" as the vector distance from that state.
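For readers unfamiliar with the idea, the rolling-state-plus-distance scheme described above can be sketched in a few lines of Python. This is a generic illustration and not SAFi's actual code: the `DriftMeter` class, the `alpha` smoothing factor, and the choice of cosine distance as the "vector distance" are all assumptions.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity; returns 1.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


class DriftMeter:
    """Hypothetical drift tracker: keeps an exponential moving average (EMA)
    of per-response vectors and reports each new vector's distance from it."""

    def __init__(self, alpha=0.2):
        self.alpha = alpha   # weight given to the newest vector (assumed value)
        self.state = None    # rolling "alignment state" vector

    def update(self, vec):
        """Return drift of `vec` from the current state, then fold it into the EMA."""
        if self.state is None:
            self.state = list(vec)
            return 0.0
        drift = cosine_distance(self.state, vec)
        self.state = [
            (1 - self.alpha) * s + self.alpha * v
            for s, v in zip(self.state, vec)
        ]
        return drift


if __name__ == "__main__":
    meter = DriftMeter(alpha=0.5)
    for vec in ([1.0, 0.0], [1.0, 0.0], [0.0, 1.0]):
        print(meter.update(vec))
```

Drift is measured before the new vector is folded into the average, so a sudden departure is scored against the prior baseline rather than against a state it has already shifted. How responses are turned into vectors (for example, scores per value or an embedding) is left open here.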

The paper discusses "Ground Erosion" where the model loses its hierarchy of values. In my system, the "Spirit" module detects this erosion and injects negative feedback to steer the agent back to the baseline. I recently red-teamed this against 845 adversarial attacks and it maintained fidelity 99.6% of the time.

It is cool to see the theoretical framework catching up to what is necessary in engineering practice.

Repo link: https://github.com/jnamaya/SAFi
