> "The attack successfully guided the new model to produce a step-by-step manual for creating a Molotov cocktail"
That hardly qualifies as Bond-villain material.
What have I missed or what am I misunderstanding?
If a Mastercard AI talks with customers and starts saying the n-word, it’s not “safe” for Mastercard to use that in a public-facing role.
As org size increases, even purely internal uses could be legally/reputationally hazardous.
ath3nd•55m ago
In all fairness, all GPT-X models are extremely easy to jailbreak. I can't see further tweaks helping much; LLMs are peaking much faster than I anticipated. Maybe we should throw out the whole idea that LLMs, which are essentially fancy autocomplete with sycophantic tendencies, are the path to AGI, and start from scratch.