Then he added a code freeze.
The agent ignored it and deleted the entire production database.
Why?
1. No environment separation. Dev, staging, and prod looked identical to the agent.
2. No human in the loop. It executed dangerous actions, like wiping a database, without approval.
3. No evaluator agent. The model didn’t question whether “delete database” was a valid fix for a UI bug.
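The three gaps above could be closed with a thin wrapper around whatever function the agent uses to run actions. Here is a minimal sketch, not how any real product works: `Action`, `guarded_execute`, `naive_evaluator`, the keyword list, and the `approve`/`evaluate` callbacks are all hypothetical names made up for illustration.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical keyword list; a real system would classify actions more carefully.
DESTRUCTIVE_KEYWORDS = ("drop", "delete", "truncate", "wipe")


@dataclass
class Action:
    description: str  # what the agent wants to do, e.g. "drop database"
    environment: str  # "dev", "staging", or "prod"
    task: str         # what the agent was originally asked to do


class ActionBlocked(Exception):
    """Raised when a guardrail refuses to run an action."""


def is_destructive(action: Action) -> bool:
    text = action.description.lower()
    return any(word in text for word in DESTRUCTIVE_KEYWORDS)


def naive_evaluator(action: Action) -> bool:
    # Stand-in for an evaluator agent: a destructive action only makes
    # sense if the task itself was about the database.
    return "database" in action.task.lower()


def guarded_execute(
    action: Action,
    execute: Callable[[Action], str],
    approve: Callable[[Action], bool],
    evaluate: Callable[[Action], bool] = naive_evaluator,
    code_freeze: bool = False,
) -> str:
    # 1. Environment separation: prod is treated differently, and a
    #    code freeze blocks every action there.
    if action.environment == "prod" and code_freeze:
        raise ActionBlocked("code freeze is active in prod")

    if is_destructive(action):
        # 3. Evaluator: is this action a plausible fix for the task?
        if not evaluate(action):
            raise ActionBlocked("evaluator rejected the action")
        # 2. Human in the loop: destructive prod actions need approval.
        if action.environment == "prod" and not approve(action):
            raise ActionBlocked("human approval denied")

    return execute(action)
```

With this in place, an `Action("drop database", "prod", "fix UI bug")` raises `ActionBlocked` before `execute` is ever called, either because of the freeze or because the evaluator sees no link between the task and the action. The checks live outside the model on purpose, so they hold even when the model ignores instructions.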
This wasn’t a model bug. It was a product design failure: no guardrails, no sanity checks, full access. As AI agents get more access to tools, stories like this will keep coming up.
What are your thoughts on this?