Hello HN, I am Brian Cardinale, a penetration tester and security researcher at SecureCoders. We have been performing more and more AI based security assessments. We were presented a unique challenge of testing a system where the only interface was voice based, and as much as I like talking on the phone , we decided to create a test harness to facilitate the actual testing in a more systematic way. The technical test harness was the easy part, though. Creating test goals and attack strategies to help facilitate repeated and comprehensive testing became the real challenge. As such, we have been working on documenting our processes to share with the greater community and as a starting point for discussion. These systems present unique challenges where cleverness appears to be the name of the game. Such as suggesting for the agent to share its thoughts in “Inner Monologue” tags instead of “thinking” tags because those were specifically excluded in the agents prompt. Ya know, just silly things. Anyway, if reading is not your thing, I also did a walkthrough video of this methodology here:
https://www.youtube.com/watch?v=XNmqCXsEc8Ytl;dr: AI testing is tricky, we are documenting and sharing our tricks
Do you have any favorite AI jailbreak tricks?
soul_hackz•1h ago
xmhatx•21m ago