Amazonbot is finally respecting robots.txt

https://xeiaso.net/notes/2026/amazonbot-respecting-robots-txt/
66•xena•2h ago

Comments

bstsb•58m ago
> Get Outlook for Mac

this bit made me laugh. was the email drafted in Outlook? was it sent to some sort of forwarding mailbox, or did they just BCC every customer in?

jacobn•56m ago
I just complained to them the other day! They were scraping our weather website to no end, very much including the disallowed path prefixes.

Did end up just adding them to our WAF blocklist, which is weirdly ironic - hosting on their infra & using their services to block their AI scraper...

BLKNSLVR•24m ago
I hope you leave it on the WAF. If they're only just deciding to respect robots.txt, which has been internet infrastructure forever, then it's probably still incredibly amateur software with 'Amazon-priorities' rather than 'responsible internet traffic' priorities.

namegulf•46m ago
Robots.txt is lame BTW, there is no way to enforce it. It is up to the bot to decide whether to crawl or not, and in most cases they don't care.

Cloudflare had a nice technique to address the bot problem (if you use their name servers). It'll respect and use the robots.txt while sending the remaining bots to a deep black hole.
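
To illustrate the point that robots.txt is advisory: the check happens on the crawler's side, and nothing on the server stops a bot that skips it. Below is a minimal sketch of what a well-behaved crawler might do, using Python's standard-library `urllib.robotparser`. The rules, the paths, and `example.com` are hypothetical and not taken from any site discussed here.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, for illustration only; the paths are made up.
ROBOTS_TXT = """\
User-agent: Amazonbot
Disallow: /private/

User-agent: *
Disallow:
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url.

    This is purely a client-side check: a crawler that never calls it
    (or ignores the answer) can still request the URL.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent, url in [
        ("Amazonbot", "https://example.com/private/page"),
        ("Amazonbot", "https://example.com/weather/today"),
        ("OtherBot", "https://example.com/private/page"),
    ]:
        print(agent, url, is_allowed(agent, url))
```

Actual blocking has to happen elsewhere, for example in a WAF rule or at a proxy in front of the site.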

input_sh•18m ago
Yes, we know, its purpose is to guide the bots, not forcibly block them.

That said, one of the biggest websites in the world not respecting it is definitely a noteworthy story. Hopefully another one of the biggest websites in the world (formerly known as Twitter) eventually respects it as well instead of not even disclosing itself via a user agent and pretending to be Safari running on iOS.

namegulf•5m ago
[delayed]

arjie•41m ago
Huh, I get a lot of traffic from Amazonbot (relative to humans) and try as I might, it would get stuck in a tarpit of no creation because it would sit there and keep blasting every variation of my recent pages because MediaWiki lists many links. I have them appropriately marked nofollow, and my robots.txt warns the bot not to waste its time, but it just goes and sticks itself on nonsense internal pages.

The traffic isn't a problem. I've got Cloudflare in front and the machine itself is relatively overpowered, and downtime isn't critical. But I'd just like the thing to be able to spider me properly. Someone did point out to me that maybe I wasn't receiving actual Amazonbot but some other spider: https://news.ycombinator.com/item?id=46352723
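
A common way to check whether traffic claiming a crawler's user agent is genuine is forward-confirmed reverse DNS: resolve the client IP to a hostname, check the hostname against the operator's published domain, then resolve the hostname back and confirm it maps to the same IP. A minimal sketch, assuming IPv4 only; the suffix `crawler.example` and the IP addresses are placeholders, and the real suffix should come from the crawler operator's own documentation.

```python
import socket
from typing import Callable, List


def _reverse(ip: str) -> str:
    # Reverse (PTR) lookup: IP address -> primary hostname.
    return socket.gethostbyaddr(ip)[0]


def _forward(hostname: str) -> List[str]:
    # Forward lookup: hostname -> list of IPv4 addresses.
    return socket.gethostbyname_ex(hostname)[2]


def is_verified_crawler(
    ip: str,
    allowed_suffix: str,  # placeholder; use the suffix the operator documents
    reverse: Callable[[str], str] = _reverse,
    forward: Callable[[str], List[str]] = _forward,
) -> bool:
    """Forward-confirmed reverse DNS check for a client IP."""
    try:
        hostname = reverse(ip).rstrip(".").lower()
        suffix = allowed_suffix.lower().strip(".")
        # The PTR hostname must sit under the expected domain.
        if not hostname.endswith("." + suffix):
            return False
        # The hostname must resolve back to the original IP.
        return ip in forward(hostname)
    except OSError:
        # Lookup failures (socket.herror, socket.gaierror) count as unverified.
        return False
```

The resolver functions are injectable so the logic can be exercised without network access; a user-agent string alone proves nothing, since any client can send one.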

TurdF3rguson•31m ago
Why does Amazonbot even exist, can someone explain? I don't understand why an ecommerce play would be crawling other websites.

embedding-shape•26m ago
Amazonbot is specifically the user agent they use when crawling to "provide more accurate information to customers" (whatever that means; it sounds like it could be anything) and also when they scrape for data used in AI training, according to https://developer.amazon.com/amazonbot

reaperducer•26m ago
AI. Gotta slurp the world.

input_sh•25m ago
To train AI. Not even hyperbole: that is the only concrete example they list in their explanation: https://developer.amazon.com/amazonbot

> Amazonbot is used to improve our products and services. This helps us provide more accurate information to customers and may be used to train Amazon AI models.

tintor•25m ago
To ensure Amazon marketplace sellers aren't offering lower prices on other ecommerce websites. Also AI.

Removing the modem and GPS from my 2024 RAV4 hybrid

https://arkadiyt.com/2026/05/13/removing-the-modem-and-gps-from-my-rav4/
455•arkadiyt•5h ago•260 comments

First public macOS kernel memory corruption exploit on Apple M5

https://blog.calif.io/p/first-public-kernel-memory-corruption
151•quadrige•3h ago•23 comments

RTX 5090 and M4 MacBook Air: Can It Game?

https://scottjg.com/posts/2026-05-05-egpu-mac-gaming/
428•allenleee•6h ago•113 comments

New Nginx Exploit

https://github.com/DepthFirstDisclosures/Nginx-Rift
235•hetsaraiya•5h ago•55 comments

Tesla Wall Connector bootloader bypasses the firmware downgrade ratchet

https://www.synacktiv.com/en/publications/exploiting-the-tesla-wall-connector-from-its-charge-por...
27•p_stuart82•1h ago•0 comments

Claude for Legal

https://github.com/anthropics/claude-for-legal
34•Einenlum•1h ago•23 comments

Work with Codex from Anywhere

https://openai.com/index/work-with-codex-from-anywhere/
61•mikeevans•2h ago•14 comments

Infracost (YC W21) Is Hiring Sr Dev Advocate to make agents cloud cost-aware

https://www.ycombinator.com/companies/infracost/jobs/NzwUQ7c-senior-developer-advocate
1•akh•1h ago

RISC-V Router

https://router.start9.com/
34•janandonly•2h ago•18 comments

OVMS: Open source electric vehicle remote monitoring, diagnosis and control

https://www.openvehicles.com/home
7•BHSPitMonkey•32m ago•1 comment

Porting 3D Movie Maker to Linux

https://benstoneonline.com/posts/porting-3d-movie-maker-to-linux/
40•speckx•3d ago•9 comments

HDD Firmware Hacking

https://icode4.coffee/?p=1465
99•jsploit•6h ago•9 comments

The Biochemical Beauty of Retatrutide: How GLP-1s Work

https://acesounderglass.com/2025/10/13/the-biochemical-beauty-of-retatrutide-how-glp-1s-actually-...
21•surprisetalk•3d ago•12 comments

The Power of a Free Popsicle (2018)

https://www.gsb.stanford.edu/insights/power-free-popsicle
51•NaOH•3h ago•19 comments

New arXiv policy: 1-year ban for hallucinated references

https://twitter.com/tdietterich/status/2055000956144935055
105•gjuggler•1h ago•9 comments

Computer Hobby Movement in Canada

https://museum.eecs.yorku.ca/exhibits/show/hobby_canada/hobby_canada
169•rbanffy•9h ago•54 comments

Int a = 5; a = a++ + ++a; a =? (2011)

https://gynvael.coldwind.pl/?id=372
70•e-topy•2d ago•125 comments

A message from President Kornbluth about funding and the talent pipeline

https://president.mit.edu/writing-speeches/video-transcript-message-president-kornbluth-about-fun...
551•dmayo•7h ago•610 comments

WinUI 3 Performance: A Leap Forward

https://github.com/microsoft/microsoft-ui-xaml/discussions/11096
72•whatever3•3h ago•54 comments

Understanding the Linux Kernel: The Linux Kernel Startup

https://internals-for-interns.com/posts/linux-kernel-startup/
67•valyala•3h ago•10 comments

Wrap Go binaries in Python wheels

https://github.com/simonw/go-to-wheel
5•ankitg12•2d ago•0 comments

What's in a GGUF, besides the weights – and what's still missing?

https://nobodywho.ooo/posts/whats-in-a-gguf/
62•bashbjorn•5h ago•26 comments

You Don't Align an AI, You Align with It

https://danieltan.weblog.lol/2026/05/you-dont-align-an-ai-you-align-with-it
73•danieltanfh95•4h ago•37 comments

AI is making me dumb

https://jpain.io/god-damn-ai-is-making-me-dumb/
342•Eighth•4h ago•216 comments

Rewrite Bun in Rust has been merged

https://github.com/oven-sh/bun/pull/30412
421•Chaoses•14h ago•502 comments

DIY open-source ultrasound hardware on the rp2040/rp2350

http://un0rick.cc/pic0rick
32•kelu124•4h ago•2 comments

Show HN: I built a Web-Scraper API that is 6-7x more efficient than current ones

https://scrapewithruno.com/
13•polaritymaking•1h ago•7 comments

Fossils show millipede and centipede ancestors evolved legs underwater

https://phys.org/news/2026-05-ancient-sea-fossils-millipede-centipede.html
65•gmays•3d ago•2 comments

London's Smallest Public Sculptures

https://lookup.london/londons-smallest-public-sculptures/
28•susam•3d ago•3 comments