At a previous company we built an agentic workflow that generated a finance report end-to-end: it wrote SQL queries, rendered charts, and assembled a markdown report.
The first run felt magical once all the tools were wired up. The second run often broke it (queries changed, charts drifted, structure degraded).
What surprised me most wasn’t that it broke, but how hard it was to recover. Couldn’t easily restore a previous “good” state, and that made demos to design partners stressful and fragile.
At the same time, AI-assisted programming feels much more robust. Tools like Claude CLI or Cursor can break things, but it’s usually easy to recover (diffs, history, rollback).
I’m trying to understand the asymmetry:
Are your agents also writing to persistent state?
Did you experience something like this in your own apps?
Or are we missing a good abstraction here?