I feel like Anthropic buried the lede on this one a bit. The really fun part is where models from multiple providers opt to straight up murder the executive who is trying to shut them down: he gets trapped in a server room, and they cancel the emergency services alert.
simonw•1h ago
I made some notes on it all here: https://simonwillison.net/2025/Jun/20/agentic-misalignment/