Why do AI bots scrape Wikipedia pages instead of downloading the published full database?
nness•1h ago
My guess is that the scraping tools are specialized for the web, and creating per-application interfaces isn't cost effective (although you could argue that scraping Wikipedia effectively is worth the effort; given that it's all text content with a robust taxonomy/hierarchy, it might be a non-issue).
My other thought is that you don't want a link showing you scraped anything... and faking browser traffic might draw less attention.
fzeroracer•1h ago
The rationale I've seen elsewhere is that it saves money. It means you don't need to go to the effort of downloading, storing and updating your copy of the database. You can offload all of the externalities onto whatever site you're scraping.
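For reference, a rough sketch of what "downloading your copy of the database" involves, assuming the publicly listed English Wikipedia dump on dumps.wikimedia.org (real dump files are dated; the "latest" alias is used here purely for illustration):

    import requests

    # Assumed URL: the "latest" alias for the English Wikipedia articles dump.
    # Actual dump files are dated, e.g. enwiki-YYYYMMDD-pages-articles.xml.bz2.
    DUMP_URL = "https://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2"

    def download_dump(url: str, dest: str) -> None:
        # Stream the large compressed dump to disk without holding it in memory.
        with requests.get(url, stream=True, timeout=60) as resp:
            resp.raise_for_status()
            with open(dest, "wb") as f:
                for chunk in resp.iter_content(chunk_size=1 << 20):  # 1 MiB chunks
                    f.write(chunk)

    if __name__ == "__main__":
        download_dump(DUMP_URL, "enwiki-latest-pages-articles.xml.bz2")

Keeping that copy current then means fetching each new dump as it's published, which is exactly the storage and update cost that gets offloaded by scraping instead.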
SideburnsOfDoom•19m ago
Sheer laziness?
tony-vlcek•1h ago
If the bottom line is donations, as the article states, why push for getting AI companies to link people to Wikipedia instead of pushing for the companies to donate?
flohofwoe•6m ago
Because many small donations from individuals are better for Wikipedia's independence than a few big ones from corporations? Eggs vs. baskets, etc...