A joint study by Anthropic and the UK AI Security Institute found that bad actors only need to contaminate a dataset with around 250 malicious documents to introduce a specific backdoor, regardless of the dataset's total size.
The article argues that "Black Hat SEO" is effectively moving from manipulating PageRank to manipulating model weights ("AI Poisoning"). Instead of link farms, the new attack vector is seeding trigger words in training data to force specific hallucinations (e.g., making an LLM claim a competitor's product fails safety standards).
Given that most brands/users won't notice the poisoning until the model is already trained and live, does this effectively kill the idea of "brand safety" in AI search?
rezamoaiandin•1mo ago
The article argues that "Black Hat SEO" is effectively moving from manipulating PageRank to manipulating model weights ("AI Poisoning"). Instead of link farms, the new attack vector is seeding trigger words in training data to force specific hallucinations (e.g., making an LLM claim a competitor's product fails safety standards). Given that most brands/users won't notice the poisoning until the model is already trained and live, does this effectively kill the idea of "brand safety" in AI search?