Is this more of an issue with having 500 crawlers rather than any single one behaving badly?
Search engines never took my site down; these bots did.
Sure, we can say that Forgejo should have had better defaults for this (on-demand archives were only cleared after 24 hours by default, so crawler-requested archives piled up). And that your site should be fast, run on an efficient server, and not expose any even slightly expensive public endpoints. But in the end that is all victim blaming.
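(For reference: in Gitea-lineage forges this is the archive-cleanup cron job in app.ini. The section and key names below come from Gitea's example config, which Forgejo inherits; treat this as a sketch and check your version's docs before relying on it.)

    [cron.archive_cleanup]
    ENABLED    = true
    SCHEDULE   = @midnight
    OLDER_THAN = 24h    ; the default mentioned above; shorten it to shed crawler-generated archives sooner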
One of the nice parts of the web for me is that as long as I have a public IP address, I can use any dinky cheapo server I have and run my own infra on it. I don't need to rely on big players to do this for me. Sure, sometimes there are griefers and trolls out there, but generally they don't bother you. No one was ever interested in my little server, and search engines played fair (and to my knowledge still do) while keeping my site discoverable.
Dealing with these bots is the first time my server has been consistently attacked. I can deal with them for now, but it is one more thing to manage, and suddenly the idea of easy self-hosting on low-powered hardware is no longer so feasible. That makes me sad. I know what I should do about it, but I wish I didn't have to.
100% of our data is available from a clearly marked "Download" page.
We still have scraper bots running through the whole site constantly.
We are not "golden".
Bjartr•8mo ago
https://anubis.techaro.lol/
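(For context: Anubis sits in front of a site and makes each new visitor's browser solve a SHA-256 proof-of-work puzzle before the page is served, which is what the "waste of power" objection below refers to. A minimal Python sketch of the idea, not Anubis's actual code; names and parameters here are illustrative:)

    import hashlib
    import secrets

    def solve(challenge: str, difficulty: int) -> int:
        """Client side: brute-force a nonce. This is the CPU (and power) cost paid per visit."""
        target = "0" * difficulty
        nonce = 0
        while not hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest().startswith(target):
            nonce += 1
        return nonce

    def verify(challenge: str, nonce: int, difficulty: int) -> bool:
        """Server side: a single hash, so checking is nearly free."""
        return hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest().startswith("0" * difficulty)

    challenge = secrets.token_hex(16)        # issued by the server per visitor
    nonce = solve(challenge, difficulty=4)   # ~16**4 hashes on average for 4 leading hex zeros
    assert verify(challenge, nonce, difficulty=4)

The asymmetry is the point: verification costs the server one hash, while solving costs the client many, which is cheap for one human visit but expensive for a bot fetching millions of pages.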
Zardoz84•8mo ago
We would be better off without that useless waste of power.
jaoane•8mo ago
First of all, web developers should use Google and learn what a cache is. That way you don’t need compute at all.
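(Concretely, that usually means letting a reverse proxy answer repeat requests so the app does the work once per TTL instead of once per hit. A minimal nginx micro-cache sketch; the paths, zone name, and upstream port are made up, and the proxy_cache_path line belongs in the http block:)

    proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=micro:10m max_size=1g;

    server {
        listen 80;
        location / {
            proxy_cache           micro;
            proxy_cache_valid     200 301 10s;   # even a 10-second TTL flattens a crawler burst
            proxy_cache_use_stale updating;      # serve a stale copy while one request refreshes it
            proxy_pass            http://127.0.0.1:3000;
        }
    }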
OutOfHere•8mo ago
It's funny, really: Google and other search engines have been crawling sites for decades, but now that search engines have competition, sites are complaining.