GPTBot ignoring robots.txt on my Cloudflare Worker

3•white_viel•2h ago
TLDR: GPTBot is systematically accessing my private Ubuntu mirror, ignoring the robots.txt.

This morning I woke up to the following message from Cloudflare about my quota usage on Cloudflare Workers:

>> Your account has reached 75% of its daily requests limit for Cloudflare Workers and/or Pages Functions

This is unusual, as I only have one worker on my Cloudflare account, which proxies the apt repos for my personal PC to specific upstream services. Although the domain is public, it is not posted anywhere and is only used by my home PCs.

So I pulled the Cloudflare worker logs and saw about 160k requests in the last 24 hours, up from barely 24 (yes, 24 in total), for various packages via my proxy.

An extract of the logs is below:

>> {
>>   "headers": {
>>     "accept": "/",
>>     "accept-encoding": "gzip, br",
>>     "cf-connecting-ip": "74.7.227.53",
>>     "cf-ipcountry": "US",
>>     "cf-ray": "9d388b074b38d3be",
>>     "cf-visitor": "{"scheme":"https"}",
>>     "connection": "Keep-Alive",
>>     "from": "gptbot(at)openai.com",
>>     "host": "XXXXXXXXXXXXXXXXX.brotich.workers.dev",
>>     "referer": "https://XXXXXXXXXXXXXXXXX.brotich.workers.dev/ubuntu/pool/universe/z/zephyr/",
>>     "user-agent": "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.3; +https://openai.com/gptbot)",
>>     "x-forwarded-proto": "https",
>>     "x-openai-host-hash": "103003167",
>>     "x-real-ip": "74.7.227.53"
>>   }
>> }

As you can see, the request is from GPTBot, which collects training data.
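For anyone who wants to confirm which crawler is behind this kind of traffic, here is a minimal sketch that pulls the bot name out of logged request headers and tallies requests per crawler. It assumes each log record is a dict shaped like the extract above (a `headers` object with a `user-agent` key). The names `crawler_token` and `tally_crawlers` and the sample records are hypothetical, not part of any Cloudflare tooling.

```python
import re
from collections import Counter

# Matches the "compatible; <Name>/<version>" part that many crawlers
# put in their user-agent string, e.g. "compatible; GPTBot/1.3".
BOT_PATTERN = re.compile(r"compatible;\s*([A-Za-z0-9_-]+)/[\d.]+")


def crawler_token(headers):
    """Return the crawler name from a headers dict, or None if absent."""
    match = BOT_PATTERN.search(headers.get("user-agent", ""))
    return match.group(1) if match else None


def tally_crawlers(records):
    """Count requests per crawler name across log records."""
    counts = Counter()
    for record in records:
        token = crawler_token(record.get("headers", {}))
        counts[token or "unknown"] += 1
    return counts


# Hypothetical records; the first user-agent is copied from the log extract,
# the second stands in for an ordinary apt client.
sample_records = [
    {"headers": {"user-agent": "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.3; +https://openai.com/gptbot)"}},
    {"headers": {"user-agent": "Debian APT-HTTP/1.3 (2.4.11)"}},
]

if __name__ == "__main__":
    print(tally_crawlers(sample_records))
```

A user-agent header can be spoofed, so a tally like this only shows what the requests claim to be.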

Now the annoying bit: according to OpenAI, they respect robots.txt. I have this set up on my domain as follows:

>>> # BEGIN Cloudflare Managed content
>>>
>>> User-agent: *
>>> Content-Signal: search=yes,ai-train=no
>>> Allow: /
>>>
>>> User-agent: Amazonbot
>>> Disallow: /
>>>
>>> User-agent: Applebot-Extended
>>> Disallow: /
>>>
>>> User-agent: Bytespider
>>> Disallow: /
>>>
>>> User-agent: CCBot
>>> Disallow: /
>>>
>>> User-agent: ClaudeBot
>>> Disallow: /
>>>
>>> User-agent: Google-Extended
>>> Disallow: /
>>>
>>> User-agent: GPTBot
>>> Disallow: /
>>>
>>> User-agent: meta-externalagent
>>> Disallow: /
>>>
>>> # END Cloudflare Managed Content
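As a sanity check on the rules themselves, here is a minimal sketch that uses Python's standard-library `urllib.robotparser` to evaluate the file above for a given crawler. The host `example.workers.dev` is a placeholder for the redacted domain, and `is_allowed` is a hypothetical helper name. This only shows how one standards-following parser reads the rules; it says nothing about what GPTBot itself does.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt content quoted in the post.
ROBOTS_TXT = """\
# BEGIN Cloudflare Managed content

User-agent: *
Content-Signal: search=yes,ai-train=no
Allow: /

User-agent: Amazonbot
Disallow: /

User-agent: Applebot-Extended
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: GPTBot
Disallow: /

User-agent: meta-externalagent
Disallow: /

# END Cloudflare Managed Content
"""

# Placeholder URL; the real host is redacted in the post.
URL = "https://example.workers.dev/ubuntu/pool/universe/z/zephyr/"


def is_allowed(agent, url, robots_txt=ROBOTS_TXT):
    """Return True if `agent` may fetch `url` under the given robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    print("GPTBot allowed:", is_allowed("GPTBot", URL))
    print("SomeOtherBot allowed:", is_allowed("SomeOtherBot", URL))
```

Pass the product token (`GPTBot`) rather than the full user-agent header: this parser only matches against the text before the first `/`, so the full header would be read as `Mozilla` and fall through to the `*` group.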

This is just a hobby project, and I have put safeguards on Cloudflare to prevent scraping by bots. There is nothing of value in there; it's just a proxy for my own use.

Why say you respect robots.txt if you don't?

Comments

white_viel•1h ago
I will be serving a zip bomb to the bot to see if it stays away from my proxy.
white_viel•25m ago
I served a zip bomb, and after 10 minutes the traffic from GPTBot disappeared.
white_viel•1m ago
Update: the bot is back now with a vengeance, sending about 1 request per second and ignoring both robots.txt and the 403 status code.
