Self-hosting was originally a "right" we had upon gaining access to the internet in the 90s; it was the main point of the Hypertext Transfer Protocol.
I would assume any halfway competent LLM-driven scraper would see a mass of 404s and stop. If they're just collecting data to train LLMs, these look like exceptionally poorly written and abusive scrapers built the conventional way, just by more bad actors.
Are we seeing these scrapers use LLMs to bypass auth or run more sophisticated flows? I haven't worked on bot detection in the last few years, but residential-proxy-based scrapers hammering sites was very common for years, so I'm wondering what's different now.
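To make concrete what "halfway competent" could mean: a crawler only needs a counter on consecutive 404s to notice a site has nothing left to give and stop hammering it. A minimal sketch in Python (the threshold, function name, and use of urllib are my own illustrative choices, not how any of these scrapers actually work):

    import urllib.request
    import urllib.error

    def crawl(urls, max_consecutive_404s=20):
        # Fetch a list of URLs, but give up once we hit a long run of 404s.
        consecutive_404s = 0
        pages = {}
        for url in urls:
            try:
                with urllib.request.urlopen(url, timeout=10) as resp:
                    pages[url] = resp.read()
                    consecutive_404s = 0  # any success resets the counter
            except urllib.error.HTTPError as e:
                if e.code == 404:
                    consecutive_404s += 1
                    if consecutive_404s >= max_consecutive_404s:
                        # A mass of 404s: assume the content is gone and stop.
                        break
                # other errors (403, 429, 5xx) ignored here for brevity
        return pages

Anything that keeps hammering past even a check that trivial is just a badly written bulk scraper, which is the point above.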
Jaxkr•28m ago
Cloudflare will even do it for free.
isodev•22m ago
That Cloudflare is trying to monetise “protection from AI” is just another grift; as a corp, they can’t help themselves.