> Now here the date is more flexible, let's say 2022. But if you're collecting data before 2022 you're fairly confident that it has minimal, if any, contamination from generative AI. Everything before the date is 'safe, fine, clean,' everything after that is 'dirty.'
Though what it seems to actually mean is that it's a problem for (future) generative AI (the "genAI collapse"). To which I say:
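For what it's worth, the cutoff idea in the quote amounts to a one-line filter over collection dates. A minimal sketch, assuming each record carries a collection date (the field names and record shape here are hypothetical, not from the article):

```python
from datetime import date

# Sketch of the date-cutoff filter described in the quote: treat anything
# collected before 2022 as "clean" (pre-generative-AI) and anything after
# as potentially contaminated. Field names are made up for illustration.
CUTOFF = date(2022, 1, 1)

def is_clean(doc: dict) -> bool:
    """Return True if the document was collected before the cutoff date."""
    return doc["collected_on"] < CUTOFF

corpus = [
    {"text": "archived forum post", "collected_on": date(2019, 6, 1)},
    {"text": "recent blog article", "collected_on": date(2023, 3, 15)},
]

clean_subset = [doc for doc in corpus if is_clean(doc)]
print(len(clean_subset))  # prints 1
```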
The most damning part for me is mentioning the Apple paper and the rebuttal of the Apple paper. To my knowledge, that paper had nothing to do with training on generated data; it was about reasoning models. But because it uses the words "model collapse", the author of this article apparently decided to include it, which just shows they don't know what they're talking about (unless I'm completely misunderstanding the Apple paper).
Humanity now lives in a world where any text has most likely been influenced by AI, even if it’s by multiple degrees of separation.