Technical measures cannot fundamentally solve the problem; they can only mitigate it, while consuming significant resources, time, and energy.
Legal, commercial, and ethical means can truly resolve the problem, just as open source software licenses address theft and piracy.
I think a better solution is as follows:
Create a license that either prohibits AI companies from using your content or requires them to pay for it.
1. Include your license link in your `robots.txt` file, or create another file called `license.txt` in the server's root directory.
2. Put your data use policy in this file. You can prohibit AI use outright or specify the conditions for use, such as payment, price, and duration (see the sketch after this list).
3. If you require AI companies to pay for data use, publish your payment link on a page of your website.
4. If an AI company fails to pay or violates your `license.txt`, you can take legal action against them.
5. If you don't want to sue, don't have the time, or find litigation ineffective, you can publicly denounce the company and damage its reputation. Don't underestimate the power of moral condemnation and reputational damage to a business.
6. It's easy to check whether an AI company is crawling your content; if you confirm it, take action against them (a rough detection sketch follows this list).
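As a concrete illustration of steps 1 and 2, here is a minimal sketch of what the two files might contain. All URLs, prices, and terms are placeholders, and a license pointer is not a standard `robots.txt` field, so crawlers are under no obligation to read or honor either file; this is only a way to publish your terms where they are easy to find.

```
# robots.txt (sketch) - the license line below is an informational comment,
# not a standard directive
# Data-use license: https://example.com/license.txt
User-agent: GPTBot
Disallow: /
```

```
# license.txt (sketch) - plain-text data-use policy; terms are placeholders
Use of this site's content for AI/ML training is prohibited
unless a paid license is purchased at https://example.com/ai-license.
Unpaid training use will be treated as a license violation.
Contact: licensing@example.com
```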
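And for step 6, one rough check is to scan your access logs for the user-agent tokens that the major AI crawlers advertise. A minimal Python sketch, assuming a plain-text access log; the token list is illustrative, not exhaustive, and it will miss any crawler that spoofs a browser user agent:

```python
#!/usr/bin/env python3
"""Sketch: count access-log requests whose user agent contains a known
AI-crawler token. Token list is illustrative; spoofed bots won't show up."""

import sys
from collections import Counter

# User-agent substrings publicly associated with AI/LLM crawlers
# (assumption: these bots identify themselves honestly).
AI_BOT_TOKENS = ["GPTBot", "ClaudeBot", "CCBot", "PerplexityBot", "Bytespider"]

def scan(log_path: str) -> Counter:
    hits = Counter()
    with open(log_path, encoding="utf-8", errors="replace") as log:
        for line in log:
            for token in AI_BOT_TOKENS:
                if token in line:
                    hits[token] += 1
    return hits

if __name__ == "__main__":
    for token, count in scan(sys.argv[1]).most_common():
        print(f"{token}: {count} requests")
```

Run it against something like `/var/log/nginx/access.log` and compare the bot request counts with your human traffic.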
What do you think of this solution, or do you have a better one?
eimrine•3h ago
In 2025, bots (we used to call them spiders) from LLM companies are not just visiting your website, they are reading absolutely everything, hitting every possible endpoint with every possible argument, constantly. Their IP addresses generate more traffic than all human visitors combined unless you ban them. They completely ignore robots.txt and mimic human activity.
Go ahead and take legal action against them. Go ahead and damage their reputation. Of course you are able to take these measures.