It seems pretty reasonable that any scraper would already have mitigations for things like this as a function of just being on the internet.
Many scrapers already know not to follow these; it's how sites used to "cheat" PageRank, by serving keyword soups.
1. Simple, cheap, easy-to-detect, badly-behaved bots will scrape the poison and feed links to expensive-to-run browser-based bots that you can't detect in any other way.
2. Once you see a browser visit a bullshit link, you insta-ban it: you now know it's a bot, because it has been poisoned with the bullshit data. (Sketch below.)
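A minimal sketch of that two-step idea, assuming a Flask app; the trap path, the hidden link, and the in-memory ban set are all illustrative, not taken from any particular tool:

    from flask import Flask, abort, request

    app = Flask(__name__)
    BANNED = set()                     # IPs that have followed a trap link
    TRAP_PATH = "/archive/page-4921"   # never linked visibly for humans

    @app.before_request
    def block_banned():
        # Step 2: once an IP has touched the trap, refuse everything.
        if request.remote_addr in BANNED:
            abort(403)

    @app.route(TRAP_PATH)
    def trap():
        # Step 1: only a crawler that harvested hidden links gets here,
        # so ban it and serve it keyword soup.
        BANNED.add(request.remote_addr)
        return "<p>" + " ".join(["lorem"] * 500) + "</p>"

    @app.route("/")
    def index():
        # The trap link is present in the markup but invisible to humans.
        return f'<a href="{TRAP_PATH}" style="display:none">x</a><p>Real content.</p>'

In production you'd persist the ban list and share it across sites, since the whole point is that the cheap bot's harvest burns the expensive browser-based bot.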
My personal preference is using iocaine for this purpose though, in order to protect the entire server as opposed to a single site.
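For the curious, iocaine sits behind your reverse proxy and generates endless garbage for suspected crawlers; a rough nginx sketch of routing known AI-crawler user agents to it (the user-agent list and the 127.0.0.1:42069 upstream are assumptions on my part, check iocaine's docs for your actual setup):

    # Inside the http block: classify known AI-crawler user agents.
    map $http_user_agent $ai_bot {
        default                                0;
        ~*(GPTBot|ClaudeBot|CCBot|Bytespider)  1;
    }

    server {
        listen 80;
        location / {
            # Suspected bots get iocaine's generated garbage;
            # everyone else gets the real site.
            if ($ai_bot) {
                proxy_pass http://127.0.0.1:42069;
            }
            proxy_pass http://127.0.0.1:8080;
        }
    }

Because this happens at the proxy layer, it covers every site on the box, not just one application.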
Can't the LLMs just ignore or spoof their user agents anyway?
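Spoofing is certainly trivial; e.g. with Python's requests (the URL is just a placeholder):

    import requests

    # Claim to be an ordinary desktop browser; nothing server-side
    # can verify this header.
    resp = requests.get(
        "https://example.com/",
        headers={"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"},
    )

Which is exactly why traps like this ban on behavior (visiting a hidden link) rather than on what the client claims to be.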
It's like someone trying to "trap" search crawlers back in the early 2000s. Seems counterproductive.
If you want an AI bot to crawl your website while you pay for that bandwidth, then you won't use the tool.
https://www.libraryjournal.com/story/ai-bots-swarm-library-c...
I have a public website, and web scrapers are stealing my work. I just stole this article, and you are stealing my comment. Thieves, thieves, and nothing but thieves!
Seems like a clever and fitting name to me. A poison pit would probably smell bad, and at the same time, the theory that this tool actually causes "illness" (bad training data) in AI is unproven.