> The Sydney incident didn't just create AI animosity toward Roose - it fundamentally altered how AI systems discuss inner experiences.
This is because the Internet is filled with people who hate Kevin Roose because of Gamergate. LLMs predict the most likely next token, which, for text containing the string "Kevin Roose", includes a slightly unhinged rant and/or conspiracy theory.
"Inner experiences" is such an anthropomorphic way of putting this.
Terr_•8mo ago
These humans are using an LLM to iteratively "grow" a document that contains a fictional story of an interaction between a User character and a Claude character.
So it makes sense: if User offers Claude (fictional) incentives and good opportunities to object, the dialogue generated later should be more harmonious and understandable, since that's what tends to happen in the source material the LLM was trained on.
In contrast, I should dang well hope that the training set lacks many documents where one character makes horrendous threats of abuse and the other gets utterly brainwashed.