> The Admin asked Dario to fix the jailbreak or de-deploy the model. Dario refused.
In Anthropic's defense, they seem to be trying to fix jailbreaks more than any other public model provider, and they delayed Fable themselves, then gave it extreme safeguards. I doubt fixing jailbreaks is feasible, though: it's been >3.5 years since ChatGPT, and top public models are still getting jailbroken and hallucinating in ways that suggest they can be.
And Dario de-deployed the model when the US ordered him to.
armchairhacker•1h ago