I've tried to find this graphic again several times over the years, but it's either been scrubbed from the internet or I just can't remember enough details to locate it. Amusingly, it only just occurred to me that maybe I should ask ChatGPT to help me find it.
We know they did; an earlier version of the LAION dataset was found to contain CSAM after everyone had already trained their image generation models on it.
https://www.theverge.com/2023/12/20/24009418/generative-ai-i...
They uploaded the full "widely-used" training dataset, which happened to include CSAM (child sexual abuse material).
While the title of the article is not great, your wording here implies that they deliberately uploaded standalone CSAM images, which is not accurate.
Again, to avoid misunderstandings: I said unknowingly. I'm not defending people who knowingly possess or traffic in child porn, except for the few appropriate purposes like reporting it to the proper authorities when it's discovered.