I'm building on a thesis that human approval will ultimately need to be embedded in meaningful human/agent workflows rather than leaving agents fully autonomous (learning the hard way since our lobster friend entered the chat). The question I keep asking myself is "did I actually authorize ClaudeRod (my lobster) to do this?" Recent news has me more concerned.
I've been hacking on a solution but again, I'M NOT A DEV. I know how to recognize pain and chart a mental map to a solution - I've done this for 20 yrs. But I don't know if I have enough genuine feedback yet to quantify the pain. Three patterns I see from research that I'd appreciate any/all genuine feedback on:

1. Confirm before it runs: Agent proposes, human authorizes (quick manual click), Agent executes. Seems logical from an audit-trail standpoint, but kills flow.
2. Notify after: Agent acts, with a short window to 'undo', like Gmail. Lower friction, but pretty impractical - useless for irreversible actions.
3. Pre-auth a scope: Human gives guardrails - "you can send emails to my lead list this week" - and Agent works freely within guardrails. Actions are logged against the original grant. Seems too ambiguous...
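
To make pattern 3 less ambiguous, here is a rough Python sketch of what a scoped grant could look like. It is only an illustration under assumptions: the `Grant` class, its field names, and the example addresses are all made up, not taken from any real agent framework. The idea is that every request is checked against the grant (action type, target list, expiry) and the decision is logged against that grant either way.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Grant:
    """A human-issued permission. All names here are hypothetical."""
    grant_id: str
    action: str                  # the one action type this grant covers
    allowed_targets: frozenset   # e.g. addresses on the lead list
    expires_at: datetime         # "this week" becomes a hard cutoff
    log: list = field(default_factory=list)  # audit trail tied to this grant

    def authorize(self, action: str, target: str, now: datetime) -> bool:
        """Return True only if the request is inside the grant; log either way."""
        allowed = (
            action == self.action
            and target in self.allowed_targets
            and now < self.expires_at
        )
        outcome = "allowed" if allowed else "denied"
        self.log.append((now.isoformat(), action, target, outcome))
        return allowed


# Example: "you can send emails to my lead list this week"
issued = datetime(2025, 1, 6, 9, 0)
grant = Grant(
    grant_id="g-001",
    action="send_email",
    allowed_targets=frozenset({"lead1@example.com", "lead2@example.com"}),
    expires_at=issued + timedelta(days=7),
)
```

With this shape, "did I actually authorize this?" becomes a lookup: every entry in `grant.log` points back to one grant with an explicit action, target list, and expiry. Anything outside those three checks is denied and would need a fresh approval.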
My instinct is not to define a one-size-fits-all logic for the problem, but to set levels of authorization based on the type of action.
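
A minimal Python sketch of that instinct, assuming each action can be tagged as reversible or not and as covered by a standing grant or not. The action names, the `ACTIONS` table, and `approval_mode` are hypothetical placeholders, not a real API. Anything unknown or irreversible without a grant falls back to the strictest mode.

```python
from enum import Enum


class Mode(Enum):
    CONFIRM_FIRST = "confirm_first"        # human clicks before the agent runs
    NOTIFY_WITH_UNDO = "notify_with_undo"  # agent acts, human gets an undo window
    PRE_AUTHORIZED = "pre_authorized"      # agent acts inside a standing grant


# Hypothetical action catalog: name -> (reversible, covered_by_grant)
ACTIONS = {
    "draft_reply": (True, False),
    "archive_message": (True, False),
    "send_email_to_lead_list": (False, True),
    "wire_transfer": (False, False),
}


def approval_mode(action: str) -> Mode:
    """Pick an approval level from the action type; unknown actions get the strictest one."""
    reversible, covered_by_grant = ACTIONS.get(action, (False, False))
    if covered_by_grant:
        return Mode.PRE_AUTHORIZED
    if reversible:
        return Mode.NOTIFY_WITH_UNDO
    return Mode.CONFIRM_FIRST
```

The design choice here is that friction scales with risk: reversible actions get the low-drag "undo" path, irreversible ones need either a prior grant or a click, and anything the table has never seen is treated as irreversible.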
Again, I'm a newb and am honestly ok with you all telling me this is a big nothingburger and it's not worth solving. I've had a lot of crazy ideas shot down in my life - my skin is pretty thick.
If it is a true problem, what are you all actually shipping? Am I missing failure modes? What does your approval layer look like, and where does it live - in the agent, in infra, or somewhere else? Is the drag on your workflow worth the peace of mind?
Appreciate any/all feedback. I have plenty of other ideas but this one is currently a thorn in my side...