Maybe robots.txt should be upgraded with license specifics:
not for commercial use, etc.
As it stands, I believe all content is covered under fair use, which to me means Common Crawl has a right to scrape everything, and it's up to the users of Common Crawl to sort out the details.
robtherobber•35m ago
When publishers complain it's never to protect the authors or journalists, but a specific (extractive) business model. Not that I side with Common Crawl on this one.
sharemywin•1h ago