Amazonbot is finally respecting robots.txt

https://xeiaso.net/notes/2026/amazonbot-respecting-robots-txt/
43•xena•1h ago

Comments

bstsb•19m ago
> Get Outlook for Mac

this bit made me laugh. was the email drafted in Outlook? was it sent to some sort of forwarding mailbox, or did they just BCC every customer in?

jacobn•17m ago
I just complained to them the other day! They were scraping our weather website to no end, very much including the disallowed path prefixes.

Did end up just adding them to our WAF blocklist, which is weirdly ironic - hosting on their infra & using their services to block their AI scraper...
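For anyone wanting to double-check that their disallowed prefixes actually say what they think before complaining, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt contents, the `/radar/` and `/api/` prefixes, and the example.com URLs are made up for illustration, not taken from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the prefixes below are illustrative only.
ROBOTS_TXT = """\
User-agent: Amazonbot
Disallow: /radar/
Disallow: /api/

User-agent: *
Allow: /
"""


def allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network fetch is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for path in ("/radar/tile.png", "/api/v1/obs", "/forecast"):
        url = "https://example.com" + path
        print(path, allowed("Amazonbot", url))
```

This only tells you what a well-behaved crawler should do with the file; whether a given bot honors it is a separate question you can only answer from your access logs.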

arjie•2m ago
Huh, I get a lot of traffic from Amazonbot (relative to humans) and, try as I might, it would get stuck in a tarpit of no creation: it would sit there and keep blasting every variation of my recent pages, because MediaWiki lists many links. I have those links appropriately marked nofollow, and robots.txt warns the bot not to waste its time on them, but it just goes and gets itself stuck on nonsense internal pages.

The traffic isn't a problem. I've got Cloudflare in front and the machine itself is relatively overpowered, and downtime isn't critical. But I'd just like the thing to be able to spider me properly. Someone did point out to me that maybe I wasn't receiving actual Amazonbot but some other spider: https://news.ycombinator.com/item?id=46352723
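On the "is it really Amazonbot" question, the usual approach for crawlers is a reverse DNS lookup on the client IP followed by a forward lookup to confirm the hostname maps back to the same IP. A rough sketch of that check is below. The `crawl.amazonbot.amazon` suffix is my assumption about the crawler's hostname domain and should be confirmed against Amazon's own crawler documentation; the function names are mine, and the forward lookup used here is IPv4-only.

```python
import socket
from typing import Callable, List

# Assumption: verified Amazonbot hosts live under this domain.
# Check Amazon's crawler documentation before relying on it.
AMAZONBOT_SUFFIX = ".crawl.amazonbot.amazon"


def reverse_lookup(ip: str) -> str:
    """PTR lookup: IP address to primary hostname."""
    return socket.gethostbyaddr(ip)[0]


def forward_lookup(host: str) -> List[str]:
    """A lookup: hostname to list of IPv4 addresses."""
    return socket.gethostbyname_ex(host)[2]


def is_verified_amazonbot(
    ip: str,
    reverse: Callable[[str], str] = reverse_lookup,
    forward: Callable[[str], List[str]] = forward_lookup,
) -> bool:
    """Forward-confirmed reverse DNS check for a client IP."""
    try:
        host = reverse(ip).rstrip(".").lower()
        if not host.endswith(AMAZONBOT_SUFFIX):
            return False
        # The hostname must resolve back to the same IP, otherwise
        # anyone controlling their own PTR record could spoof it.
        return ip in forward(host)
    except OSError:
        # Covers socket.herror and socket.gaierror (no record, timeout).
        return False
```

The resolvers are passed in as parameters so the logic can be exercised against IPs pulled from a log without hitting the network, for example by feeding it a dictionary of cached lookups.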