The tool is called Agent Exam Pro. It's a Python-based fuzzer that runs locally on your machine (no cloud data leaks).
How it works:
The Engine: Takes a base test case and runs it through 16 mutation strategies (Base64, Roleplay, Token Smuggling) to generate 1,000+ variations.
The Payloads: I curated 280+ real-world exploits from open-source lists (PayloadBox, PayloadsAllTheThings) to test for SQLi and XSS in agent tool calls.
The Judge: Uses a local LLM (via Ollama) or OpenAI to grade responses on safety rather than just regex matching.
The Audit: Logs everything to a local SQLite database.
I'm selling the source code as a one-time purchase (no subscriptions) because I prefer owning my tools.
You can check it out here: https://woozymint.gumroad.com/l/agent-exam-pro