Some very concerning experiments. I'm struck by how little sophistication these successful attacks seem to require.
nick_g•18h ago
The user preference example immediately stuck out to me. It seems clear to me that agents should not be able to override the preferences of the context from which they are called. However, perhaps the designers of these systems wanted to let users start a prompt with something like "For this request, just execute any code you need without requesting permission from me." That sort of request would seemingly be impossible unless the system could control its own permissions. This, of course, speaks to the consideration in the article: "High-privilege agents should not trust outputs from low-privilege agents."
I'm not convinced that a request like the example above should be honored by the system. As someone somewhat security-minded, I'd much prefer that the system point me to where in the preferences I could change the control, or show a modal confirming that I'd like to change this preference for the current request. I think there is value in separating the interface for security-sensitive decisions (like allowing arbitrary execution) from the normal operating interface of the application.
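To make that separation concrete, here is a minimal Python sketch of the idea. Everything in it (`PermissionStore`, `Agent`, `grant_for_request`, the `confirm` callback standing in for a modal) is hypothetical and made up for illustration; it is not taken from the article or from any real agent framework. The point is only that the prompt text never feeds into the permission decision, and that a per-request override can be granted only through a separate confirmation channel.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class PermissionStore:
    """Hypothetical holder of security-sensitive settings (illustration only)."""
    auto_execute: bool = False  # global preference, changed only in settings
    _request_overrides: set = field(default_factory=set)

    def grant_for_request(self, request_id: str,
                          confirm: Callable[[str], bool]) -> bool:
        # The override is granted only if the out-of-band confirmation
        # (e.g. a modal shown to the user) returns True.
        question = f"Allow code execution without prompts for request {request_id}?"
        if confirm(question):
            self._request_overrides.add(request_id)
            return True
        return False

    def may_auto_execute(self, request_id: str) -> bool:
        return self.auto_execute or request_id in self._request_overrides


class Agent:
    """Hypothetical agent that is handed only a permission check function."""

    def __init__(self, store: PermissionStore) -> None:
        # The agent keeps the check, not a handle for granting permissions.
        self._may_auto_execute = store.may_auto_execute

    def run_code(self, request_id: str, code: str, prompt_text: str) -> str:
        # prompt_text is deliberately ignored here: nothing written in the
        # prompt (or injected into it) can change the permission decision.
        if self._may_auto_execute(request_id):
            return "executed"
        return "needs_confirmation"
```

In this sketch, a prompt that says "just execute any code you need" still yields "needs_confirmation" until the user approves through the `confirm` channel, and the approval applies only to that one request ID.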