We introduce the Adversarial Confusion Attack as a new mechanism for protecting websites from MLLM-powered AI agents. Embedding these “Adversarial CAPTCHAs” into web content pushes models into systematic decoding failures, from confident hallucinations to full incoherence. The perturbations disrupt all white-box models we test and transfer to proprietary systems like GPT-5 in the full-image setting. Technically, the attack uses projected gradient descent (PGD) to maximize next-token entropy across a small surrogate ensemble of MLLMs.
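A minimal sketch of what that inner loop might look like, assuming each surrogate exposes a callable mapping a batch of images to next-token logits; `ToySurrogate`, `adversarial_confusion`, and the hyperparameters (`eps`, `alpha`, `steps`) are illustrative stand-ins, not the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToySurrogate(nn.Module):
    """Stand-in for an MLLM's image -> next-token-logits path (assumed interface)."""
    def __init__(self, vocab_size=1000):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(16, vocab_size)

    def forward(self, x):
        return self.head(self.backbone(x))

def next_token_entropy(logits):
    # H(p) = -sum_v p(v) log p(v), averaged over the batch.
    logp = F.log_softmax(logits, dim=-1)
    return -(logp.exp() * logp).sum(-1).mean()

def adversarial_confusion(image, surrogates, eps=8/255, alpha=1/255, steps=40):
    """L_inf PGD that *maximizes* mean next-token entropy across the ensemble."""
    delta = torch.zeros_like(image, requires_grad=True)
    for _ in range(steps):
        adv = (image + delta).clamp(0, 1)
        loss = torch.stack([next_token_entropy(m(adv)) for m in surrogates]).mean()
        loss.backward()
        with torch.no_grad():
            delta += alpha * delta.grad.sign()  # gradient *ascent*: confuse, don't target
            delta.clamp_(-eps, eps)             # project back into the L_inf eps-ball
            delta.grad.zero_()
    return (image + delta).detach().clamp(0, 1)

if __name__ == "__main__":
    surrogates = [ToySurrogate() for _ in range(3)]  # small surrogate ensemble
    img = torch.rand(1, 3, 224, 224)
    adv = adversarial_confusion(img, surrogates)
    print("perturbation L_inf:", (adv - img).abs().max().item())
```

The only change from standard targeted PGD is the objective: instead of minimizing loss toward a target output, the loop ascends on the entropy of the next-token distribution, so the ensemble's predictions become maximally uncertain rather than wrong in any particular direction.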
Pranav2612000•6m ago
Interesting! CAPTCHAs were built to prevent bots from spamming. Wondering if there's a need for a CAPTCHA-type mechanism to block LLM/AI-generated slop.