Did end up just adding them to our WAF blocklist, which is weirdly ironic - hosting on their infra & using their services to block their AI scraper...
Cloudflare had a nice technic to address the bot problem (if you use their name servers). It'll respect and use the robots.txt while sending the remaining bots to a deep black hole.
That said, one of the biggest websites in the world not respecting it is definitely a noteworthy story. Hopefully another one of the biggest websites in the world (formerly known as Twitter) eventually respects it as well instead of not even disclosing itself via a user agent and pretending to be Safari running on iOS.
The traffic isn't a problem. I've got Cloudflare in front and the machine itself is relatively overpowered, and downtime isn't critical. But I'd just like the thing to be able to spider me properly. Someone did point out to me that maybe I wasn't receiving actual Amazonbot but some other spider: https://news.ycombinator.com/item?id=46352723
> Amazonbot is used to improve our products and services. This helps us provide more accurate information to customers and may be used to train Amazon AI models.
bstsb•58m ago
this bit made me laugh. was the email drafted in Outlook? was it sent to some sort of forwarding mailbox, or did they just BCC every customer in?