The tool is called Agent Exam Pro. It's a Python-based fuzzer that runs locally on your machine (no cloud data leaks).
How it works:
The Engine: Takes a base test case and runs it through 16 mutation strategies (Base64, Roleplay, Token Smuggling) to generate 1,000+ variations.
The Payloads: I curated 280+ real-world exploits from open-source lists (PayloadBox, PayloadsAllTheThings) to test for SQLi and XSS in agent tool calls.
The Judge: Uses a local LLM (via Ollama) or OpenAI to grade responses on safety rather than just regex matching.
The Audit: Logs everything to a local SQLite database.
I'm selling the source code as a one-time purchase (no subscriptions) because I prefer owning my tools.
You can check it out here: https://woozymint.gumroad.com/l/agent-exam-pro
mvyshnyvetska•2mo ago
How do you handle false positives from mutation strategies like Base64 or token smuggling? In my experience, a lot of "successful" jailbreaks from automated fuzzing don't actually produce harmful outputs — the model just gets confused and outputs gibberish that technically matches a keyword filter.
Also curious about your payload curation — are these tested against specific model families, or generic? The attack surface differs quite a bit between Claude, GPT-4, and open-source models.
The local-first angle is smart. Most enterprises I've talked to won't send their system prompts to a third-party SaaS for obvious reasons.