"Thanks to LLM scrapers, hosting costs went up 5000% last month"
ssivark•49m ago
Uggghhhh! AI crawling is fast becoming a headache for self-hosted content. Is using a CDN the "lowest effort" solution? Or is there something better/simpler?
embedding-shape•7m ago
Nah, just add a rate limiter (which any public website should have anyway). Alternatively, add some honeypot URLs to robots.txt, then set up fail2ban to ban any IP that accesses those URLs; that will get rid of 99% of the crawling in half a day.
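A minimal sketch of the honeypot-plus-fail2ban idea described above (the /trap/ path, the jail name, and the log path are made-up placeholders; the filter assumes nginx's default combined access log format):

```
# robots.txt — well-behaved bots will skip this path; abusive crawlers won't
User-agent: *
Disallow: /trap/

# /etc/fail2ban/filter.d/honeypot.conf — match any request to the trap path
[Definition]
failregex = ^<HOST> .* "(GET|HEAD|POST) /trap/

# /etc/fail2ban/jail.local — ban on the first hit, for 24 hours
[honeypot]
enabled  = true
port     = http,https
filter   = honeypot
logpath  = /var/log/nginx/access.log
maxretry = 1
bantime  = 86400
```

For the rate-limiter half, nginx's built-in limit_req module is one option (the rate and burst values here are illustrative, not recommendations):

```
# nginx: cap each client IP at 10 requests/second with a small burst allowance
limit_req_zone $binary_remote_addr zone=perip:10m rate=10r/s;

server {
    location / {
        limit_req zone=perip burst=20 nodelay;
    }
}
```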
kace91•44m ago
>I immediately burned down my account with that hosting provider, because they did not allow setting a spending limit.
Is this true? He mentions the provider being AWS, surely some sort of threshold can be set?
no_wizard•42m ago
As far as I am aware, there is not. It’s been a long-standing complaint about the platform.
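For context: AWS Budgets can send an alert when spend crosses a threshold, but it does not hard-stop anything, which is the crux of the complaint. A sketch of setting up such an alert, assuming a configured AWS CLI (the account ID, amount, and email address are placeholders):

```
aws budgets create-budget \
  --account-id 123456789012 \
  --budget '{
    "BudgetName": "monthly-cost-alert",
    "BudgetLimit": {"Amount": "50", "Unit": "USD"},
    "TimeUnit": "MONTHLY",
    "BudgetType": "COST"
  }' \
  --notifications-with-subscribers '[{
    "Notification": {
      "NotificationType": "ACTUAL",
      "ComparisonOperator": "GREATER_THAN",
      "Threshold": 80,
      "ThresholdType": "PERCENTAGE"
    },
    "Subscribers": [{"SubscriptionType": "EMAIL", "Address": "you@example.com"}]
  }]'
```

This emails you at 80% of the $50 budget; actually cutting off the resources in response is left to you.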
nijave•15m ago
If it's AWS, yes, it's true. All the billing is asynchronous, and some of it updates as slowly as daily (although it can be very granular/accurate).
On top of that, it's a pay-per-use platform.
embedding-shape•8m ago
I don't know exactly what the website was, but if it's just HTML, CSS, some JS, and some images, why would you ever host it on a "pay per visit/bandwidth" platform like AWS? Not only is AWS traffic extra expensive compared to pretty much any alternative, paying for bandwidth in that manner has never made much sense to me. Even shared hosting like we used in the early '00s would have been a better solution for hosting a typical website than AWS.