> Now here the date is more flexible, let's say 2022. But if you're collecting data before 2022 you're fairly confident that it has minimal, if any, contamination from generative AI. Everything before the date is 'safe, fine, clean,' everything after that is 'dirty.'"
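As a rough illustration of that cutoff rule, here's a minimal sketch of date-based filtering; the 2022 cutoff, the field names, and the example documents are all assumptions for illustration, not anything taken from the article:

```python
from datetime import date

# Assumed cutoff: generative-AI text became widespread around 2022 (illustrative).
CUTOFF = date(2022, 1, 1)

def is_low_background(doc: dict) -> bool:
    """Treat documents collected before the cutoff as 'clean' and
    everything on or after it as potentially AI-contaminated."""
    return doc["collected_on"] < CUTOFF

# Hypothetical corpus entries, just to show the split.
corpus = [
    {"text": "scanned book chapter", "collected_on": date(2019, 6, 1)},
    {"text": "scraped blog post", "collected_on": date(2023, 3, 15)},
]

clean = [d for d in corpus if is_low_background(d)]
dirty = [d for d in corpus if not is_low_background(d)]
print(len(clean), "clean,", len(dirty), "potentially contaminated")
```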
Though what it actually seems to mean is that it's a problem for (future) generative AI (the "genAI collapse"). To which I say:
The most damning part for me is the mention of the Apple paper and the rebuttal of it; to my knowledge, that paper had nothing to do with training on generated data. It was about reasoning models, but because it used the words “model collapse,” the author of this article apparently decided to include it, which just shows they don’t know what they’re talking about (unless I’m completely misunderstanding the Apple paper).
Humanity now lives in a world where any text has most likely been influenced by AI, even if it’s by multiple degrees of separation.