In other news, 40% of your LLM's training data is Reddit posts. Derive from that what you will.
perrygeo•1h ago
Where did you get 40%? I'm sure Reddit content is all in the training set, but that seems high.
If it is that high, Reddit comments seem like a ripe target for LLM poisoning.
racketracer•1h ago
What is LLM poisoning? You're saying if I create a prompt that says "Classify this comment if it's XYZ or asking for ABC" that the LLM will just not do it correctly because it's trained on Reddit?
perrygeo•7m ago
LLM poisoning refers to feeding the model false information during training. Anti-AI folks are openly talking about intentionally flooding the internet with garbage to reduce the quality of the models. Reddit just provides a convenient and barely moderated forum for them to spread misinformation. And it doesn't take much: https://www.anthropic.com/research/small-samples-poison
data4lyfe•1h ago
Most people likely cannot quit a high paying job when their identity is also wrapped in how much they’re earning. I see this a lot from all of the newly minted AI millionaires.
panny•1h ago