We’ve released preliminary work introducing a new attack vector against multimodal LLMs. Unlike jailbreaks or targeted misclassification, this method explicitly optimizes for confusion: it maximizes next-token entropy to induce systematic malfunction.
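For those curious about the mechanics, here is a minimal PGD-style sketch of the optimization, assuming white-box access to an open vision-language model whose forward pass returns next-token logits; the model interface, step size, and perturbation budget are illustrative assumptions, not our exact setup.

    # Minimal sketch: gradient ascent on next-token entropy.
    # Assumes `model(image)` returns next-token logits for a fixed
    # prompt; interface and hyperparameters are illustrative.
    import torch

    def next_token_entropy(logits):
        # Shannon entropy of the model's next-token distribution.
        logp = torch.log_softmax(logits, dim=-1)
        return -(logp.exp() * logp).sum(dim=-1).mean()

    def confusion_image(model, image, steps=200, eps=8/255, alpha=1/255):
        # PGD-style ascent: maximize entropy within an L-inf ball.
        delta = torch.zeros_like(image, requires_grad=True)
        for _ in range(steps):
            loss = next_token_entropy(model(image + delta))
            loss.backward()
            with torch.no_grad():
                delta += alpha * delta.grad.sign()   # ascend, don't descend
                delta.clamp_(-eps, eps)              # perturbation budget
                delta.copy_((image + delta).clamp(0, 1) - image)  # valid pixels
            delta.grad.zero_()
        return (image + delta).detach()

The only change from a standard targeted adversarial attack is the sign and the objective: instead of minimizing a classification loss toward a chosen label, we ascend on the entropy of the output distribution, pushing the model toward maximal uncertainty.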
The attack successfully fooled GPT-5, producing structured hallucinations from an adversarial image.
The ultimate goal is to prevent AI agents from reliably operating on websites by embedding adversarial “confusion images” into their visual environments.