> Now here the date is more flexible, let's say 2022. But if you're collecting data before 2022 you're fairly confident that it has minimal, if any, contamination from generative AI. Everything before the date is 'safe, fine, clean,' everything after that is 'dirty.'"
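As a rough illustration of that cutoff rule, here's a minimal sketch of date-based filtering; the 2022 cutoff, the field names, and the example documents are all assumptions for illustration, not anything taken from the article:

```python
from datetime import date

# Assumed cutoff: generative-AI text became widespread around 2022 (illustrative).
CUTOFF = date(2022, 1, 1)

def is_low_background(doc: dict) -> bool:
    """Treat documents collected before the cutoff as 'clean' and
    everything on or after it as potentially AI-contaminated."""
    return doc["collected_on"] < CUTOFF

# Hypothetical corpus entries, just to show the split.
corpus = [
    {"text": "scanned book chapter", "collected_on": date(2019, 6, 1)},
    {"text": "scraped blog post", "collected_on": date(2023, 3, 15)},
]

clean = [d for d in corpus if is_low_background(d)]
dirty = [d for d in corpus if not is_low_background(d)]
print(len(clean), "clean,", len(dirty), "potentially contaminated")
```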
Though what it actually seems to mean is that it's a problem for (future) generative AI (the "genAI collapse"). To which I say:
The most damning part for me is the mention of the Apple paper and the rebuttal of it; to my knowledge, that paper had nothing to do with training on generated data. It was about reasoning models, but because it used the words “model collapse,” the author of this article apparently decided to include it, which just shows they don’t know what they’re talking about (unless I’m completely misunderstanding the Apple paper).
Humanity now lives in a world where any text has most likely been influenced by AI, even if it’s by multiple degrees of separation.