
LLMs Corrupt Your Documents When You Delegate

https://arxiv.org/abs/2604.15597
45•rbanffy•5h ago

Comments

jonmoore•1h ago
I really liked the evaluation method here - testing fidelity by round-tripping through chains of invertible steps. It was striking how even frontier models accumulated errors on seemingly computer-friendly tasks.

It would be interesting to know whether the stronger results on Python are just an artefact of the Python-specific evaluation, whether they carry over to other common general-purpose languages, and whether they are driven by something specific in the training process.
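
As a toy version of that round-trip idea (not the paper's actual harness; `model_edit` is a hypothetical stand-in for whatever LLM you call), you can chain a self-inverse edit twice per pass, so a perfectly faithful model would return the input unchanged, and count the lines that drift:

```python
import difflib

def model_edit(text: str, instruction: str) -> str:
    """Hypothetical stand-in for an LLM API call; substitute your own client."""
    raise NotImplementedError

def round_trip_errors(original: str, passes: int = 5) -> list[int]:
    """Apply a self-inverse edit twice per pass and count changed lines."""
    instruction = "Reverse the order of the document's sections; change nothing else."
    doc, error_counts = original, []
    for _ in range(passes):
        doc = model_edit(model_edit(doc, instruction), instruction)
        diff = difflib.ndiff(original.splitlines(), doc.splitlines())
        # ndiff marks additions '+ ' and deletions '- '; each one is accumulated drift.
        error_counts.append(sum(1 for line in diff if line.startswith(("+ ", "- "))))
    return error_counts
```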

causal•1h ago
Yeah I've been saying this for a while: AI-washing any text will degrade it, compounding with each pass.

"Semantic ablation" is my favorite term for it: https://www.theregister.com/software/2026/02/16/semantic-abl...

polskibus•56m ago
By "with each pass" do you mean within the same session, or with a new session (context window) each time?
sebastiennight•20m ago
In my experience, it happens with each edit of the document, whether or not you clear the context window.

You can somewhat mitigate this, at the moment you ask for the new edit, by adding new info or re-specifying the lost meaning you want added back. But other things will still get washed out.

Nuances will drift, sharp corners will be ablated. You're doing a Xerox copy of your latest Xerox copy, so even if you add your comments with a sharpie, anything that was there right before will be slightly blurrier in the next version.

adampunk•14m ago
It happens on each edit, even unrelated ones. I had a README referring to something as "the cathedral of s*t" (some HN commenters don't care for the swearing, which is systemically bad news, but w/e) and the robot would lift that phrase out in drive-bys, repeatedly.

Occasionally it would report the action; sometimes it would not bother. It never reached into the README on an unrelated doc edit, but if it was touching the README, that line was getting excised.
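
A blunt guard against that kind of drive-by excision (a hypothetical mitigation, not something from the article) is a check, run before accepting the agent's edit, that protected phrases are still present:

```python
import pathlib
import sys

# Phrases the agent must never remove, keyed by file; extend as needed.
PROTECTED = {
    "README.md": ["the cathedral of s*t"],
}

def missing_phrases(root: str = ".") -> int:
    """Report protected phrases that have gone missing; return how many."""
    missing = 0
    for filename, phrases in PROTECTED.items():
        text = pathlib.Path(root, filename).read_text(encoding="utf-8")
        for phrase in phrases:
            if phrase not in text:
                print(f"{filename}: protected phrase removed: {phrase!r}")
                missing += 1
    return missing

if __name__ == "__main__":
    sys.exit(1 if missing_phrases() else 0)
```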

mohamedkoubaa•46m ago
I've been calling it meanwit reversion
cyanydeez•1h ago
I played around with a local LLM to try to build a wiki-like DAG. It made a lot of stupid errors, from vague generic things like interpreting content based on file names, to not following redirects and writing the redirect response into the page instead.

I've also had them convert something like an Excel-formatted document to markdown. It worked pretty well as long as I was examining the output. But the longer it ran in context, the more likely it was to slip in things that seemed related but weren't part of the breakdown.

The only way I've found to mitigate some of it is to make every file a small, purpose-built doc. That way you can use git to revert changes, and also limit the damage to that small context every time they touch a file.

Anyone who thinks they're a genius creating or updating docs isn't actually reading the output.

sebastiennight•15m ago
> I've also had them convert something like an Excel-formatted document to markdown.

This looks like a task where the LLM would be best used to write a deterministic script or program that then does the conversion.

Trusting an LLM to make the change without tools is like telling the smartest person you know to recite the converted document out loud from memory. At some point they'll get distracted, get something wrong, or unwittingly inject their own biases and ideas whenever the source data is counter-intuitive to them.
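
For the Excel-to-markdown case specifically, here's a minimal sketch of that "write a deterministic converter" approach, assuming pandas (with openpyxl and tabulate) is installed and the workbook is a plain grid of cells:

```python
import sys

import pandas as pd  # reading .xlsx needs openpyxl; to_markdown needs tabulate

def xlsx_to_markdown(path: str) -> str:
    """Convert every sheet of a workbook to a markdown table, deterministically."""
    sheets = pd.read_excel(path, sheet_name=None)  # dict: sheet name -> DataFrame
    return "\n\n".join(
        f"## {name}\n\n{frame.to_markdown(index=False)}"
        for name, frame in sheets.items()
    )

if __name__ == "__main__":
    print(xlsx_to_markdown(sys.argv[1]))
```

Unlike an in-context rewrite, running this twice on the same workbook yields byte-identical output, so nothing can drift between passes.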

woeirua•46m ago
It's an interesting paper, but I'd like to see a lot more about the types of errors that the LLM makes. Are they happening in the forward pass or the inverse pass? My guess is the inverse pass.
adampunk•29m ago
LLMs will make mistakes on every turn. The mistakes will have little to no apparent connection to "difficulty" or what may or may not be prevalent in the training data. They will be mistakes at all levels of operation, from planning to code writing to reporting. Whether those mistakes matter and whether you catch them is mostly up to you.

I have yet to find a model that does not make mistakes each turn. I suspect that this kind of error is fundamentally incorrigible.

The most interesting thing about LLMs is that despite the above (and its non-determinism) they're still useful.

pyrolistical•22m ago
As a human I make typos all the time
adampunk•13m ago
I do too! I also make higher level design errors and get too enthusiastic about projects before code is written.

We are, in a sense, fallible machines who have designed a planet-wide computational fabric around that fact.

Internet Archive Switzerland

https://internetarchive.ch/
124•hggh•2h ago•15 comments

Google broke reCAPTCHA for de-googled Android users

https://reclaimthenet.org/google-broke-recaptcha-for-de-googled-android-users
1202•anonymousiam•19h ago•434 comments

A recent experience with ChatGPT 5.5 Pro

https://gowers.wordpress.com/2026/05/08/a-recent-experience-with-chatgpt-5-5-pro/
439•_alternator_•11h ago•292 comments

Using Claude Code: The unreasonable effectiveness of HTML

https://twitter.com/trq212/status/2052809885763747935
243•pretext•9h ago•143 comments

How LEDs are made (2014)

https://learn.sparkfun.com/tutorials/how-leds-are-made/all
22•smig0•2d ago•3 comments

OpenAI’s WebRTC problem

https://moq.dev/blog/webrtc-is-the-problem/
374•atgctg•1d ago•95 comments

LLMs Corrupt Your Documents When You Delegate

https://arxiv.org/abs/2604.15597
47•rbanffy•5h ago•12 comments

America's carpet capital: an empire and its toxic legacy

https://apnews.com/projects/pfas-forever-stained/
75•rawgabbit•2d ago•40 comments

Mythical Man Month

https://martinfowler.com/bliki/MythicalManMonth.html
224•ingve•2d ago•144 comments

Making Julia as Fast as C++ (2019)

https://flow.byu.edu/posts/julia-c++
34•d_tr•2d ago•20 comments

David Attenborough's 100th Birthday

https://www.bbc.com/news/articles/cp3pww9g0p5o
729•defrost•1d ago•143 comments

Killswitch: Per-function short-circuit mitigation primitive

https://lwn.net/ml/all/20260507070547.2268452-1-sashal@kernel.org/
33•signa11•4h ago•7 comments

Reviving the IBM Selectric Composer Fonts (2023)

https://www.kutilek.de/selectric/
26•tangus•2d ago•0 comments

What causes lightning? The answer keeps getting more interesting

https://www.quantamagazine.org/what-causes-lightning-the-answer-keeps-getting-more-interesting-20...
101•Tomte•2d ago•22 comments

Wi is Fi: Understanding Wi-Fi 4/5/6/6E/7/8 (802.11 n/ac/ax/be/bn)

https://www.wiisfi.com/
288•homebrewer•2d ago•75 comments

AI is breaking two vulnerability cultures

https://www.jefftk.com/p/ai-is-breaking-two-vulnerability-cultures
363•speckx•20h ago•143 comments

Cartoon Network Flash Games

https://www.webdesignmuseum.org/flash-game-exhibitions/cartoon-network-flash-games
369•willmeyers•21h ago•113 comments

AWS North Virginia data center outage – resolved

https://www.cnbc.com/2026/05/08/aws-outage-data-center-fanduel-coinbase.html
237•christhecaribou•1d ago•167 comments

An Introduction to Meshtastic

https://meshtastic.org/docs/introduction/
467•ColinWright•1d ago•163 comments

The React2Shell Story

https://lachlan.nz/blog/the-react2shell-story/
172•mufeedvh•21h ago•20 comments

Removing fsync from our local storage engine

https://fractalbits.com/blog/remove-fsync/
9•zzsheng•2d ago•3 comments

Teaching Claude Why

https://www.anthropic.com/research/teaching-claude-why
203•pretext•20h ago•93 comments

You gave me a u32. I gave you root. (io_uring ZCRX freelist LPE)

https://ze3tar.github.io/post-zcrx.html
196•MrBruh•18h ago•118 comments

Forking the Web

https://dillo-browser.org/lab/web-fork/
51•wrxd•2h ago•51 comments

Can LLMs model real-world systems in TLA+?

https://www.sigops.org/2026/can-llms-model-real-world-systems-in-tla/
100•mad•21h ago•27 comments

Serving a website on a Raspberry Pi Zero running in RAM

https://btxx.org/posts/memory/
232•xngbuilds•23h ago•92 comments

Light without electricity? Glowing algae could make it possible

https://www.colorado.edu/today/2026/05/06/light-without-electricity-glowing-algae-could-make-it-p...
88•geox•2d ago•26 comments

US Government releases first batch of UAP documents and videos

https://www.war.gov/UFO/
314•david-gpu•1d ago•461 comments

Roadside Attraction

https://theoffingmag.com/essay/roadside-attraction/
30•aways•18h ago•4 comments

Read Programming as Theory Building

https://codeutopia.net/blog/2026/05/09/you-should-read-programming-as-theory-building/
8•birdculture•40m ago•0 comments