We crawled 1M domains to map AI agent permissions – 90% have no policy

https://www.maango.io/report
2•mehula•2h ago

Comments

mehula•2h ago
Hey HN - I built this.

I'm building infrastructure for AI agents and kept running into the same problem: before an agent fetches a URL, there's no easy way to know what's allowed. There are now 8 different standards - robots.txt, llms.txt, ai.txt, TDMRep, Cloudflare Content Signals, and others - all saying different things in different formats. No one checks all of them. Most agents check zero.
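For illustration, here is a minimal sketch of the one check most agents could already run before a fetch: asking robots.txt whether a given user agent may request a URL. It uses Python's standard `urllib.robotparser`. The helper name `allowed_by_robots`, the sample file, and the example.com URL are assumptions made for this example, and it covers only robots.txt, not the other standards listed above.

```python
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt lets user_agent fetch url.

    Hypothetical helper: it parses text that was already downloaded,
    so the example runs without network access.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Made-up robots.txt: blocks GPTBot entirely, everyone else only from /admin/.
SAMPLE = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /admin/
"""

print(allowed_by_robots(SAMPLE, "GPTBot", "https://example.com/post"))     # False
print(allowed_by_robots(SAMPLE, "ClaudeBot", "https://example.com/post"))  # True
```

An agent that runs only this check sees "allowed" for ClaudeBot here, even if one of the other signals says otherwise.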

So I decided to actually measure the problem. I crawled the Tranco top 1M domains over 10 days in February 2026, parsing every known AI policy signal. Failure rate was 0.07% (697 domains out of 1M).

What surprised me most:

- 90% of domains have zero AI-specific signals. Not "they block everything" - they literally say nothing. Most robots.txt files just have generic /admin/ or /wp-login/ rules from a decade ago.

- When sites DO block, it's almost always a blanket decision. 58,791 domains block both GPTBot and ClaudeBot. Only 9,888 block GPTBot alone. The "nuanced policy" that regulators imagine basically doesn't exist.

- Cloudflare sites block AI at 2.3x the baseline rate. Not because their owners care more - because Cloudflare shipped a one-click toggle in July 2024. The tooling creates the behavior.

- TDMRep adoption: 37 out of 1 million. That's the W3C protocol specifically designed for the EU Copyright Directive's TDM opt-out. Caveat: our detection covers the well-known path and HTTP headers, not HTML meta tags on subpages – actual adoption among European publishers is likely higher. We note this in the methodology.

- The ToS gap is the finding I think matters most. We scanned 79K Terms of Service pages. 7,575 domains prohibit crawling or AI training in their ToS but have zero AI-specific robots.txt rules. YouTube, Discord, Substack, Target - an agent checking only robots.txt sees "no policy" while the site's legal terms explicitly say stop.

- 6,317 domains contradict themselves across standards - e.g., blocking GPTBot in robots.txt but setting search=yes in Content Signals.
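
The blanket-versus-selective split and the cross-standard contradiction from the bullets above can be sketched roughly as follows. This is not the crawler's actual code. The labels (`silent`, `selective`, `blanket`), the function names, and the rule "an explicit AI-bot block plus `search=yes` counts as a conflict" are assumptions for this example. It also assumes Content Signals appear as a `Content-Signal:` line inside robots.txt.

```python
AI_BOTS = ("gptbot", "claudebot")


def blocked_ai_bots(robots_txt: str) -> set[str]:
    """AI bots that a group names explicitly and disallows at '/'."""
    blocked: set[str] = set()
    agents: list[str] = []
    in_rules = False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:  # a User-agent line after rules starts a new group
                agents, in_rules = [], False
            agents.append(value.lower())
        elif field in ("allow", "disallow"):
            in_rules = True
            if field == "disallow" and value == "/":
                blocked.update(a for a in agents if a in AI_BOTS)
    return blocked


def content_signals(robots_txt: str) -> dict[str, str]:
    """Parse 'Content-Signal: key=value, ...' lines (assumed format)."""
    signals: dict[str, str] = {}
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()
        if line.lower().startswith("content-signal:"):
            for pair in line.split(":", 1)[1].split(","):
                if "=" in pair:
                    key, value = pair.split("=", 1)
                    signals[key.strip().lower()] = value.strip().lower()
    return signals


def classify(robots_txt: str) -> tuple[str, bool]:
    """Return (label, conflict) under the assumptions stated above."""
    blocked = blocked_ai_bots(robots_txt)
    if blocked == set(AI_BOTS):
        label = "blanket"
    elif blocked:
        label = "selective"
    else:
        label = "silent"
    conflict = bool(blocked) and content_signals(robots_txt).get("search") == "yes"
    return label, conflict


# Made-up file: blocks both bots, yet signals search=yes.
SAMPLE = """\
User-agent: GPTBot
User-agent: ClaudeBot
Disallow: /

User-agent: *
Content-Signal: search=yes, ai-train=no
Disallow: /admin/
"""

print(classify(SAMPLE))  # ('blanket', True)
```

A file containing only generic rules such as `Disallow: /admin/` comes back as `('silent', False)`, which is the 90% case described above.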

This is the first public output from a project called Maango, which is building a registry and API to check any domain's AI policy across all 8 standards in one call. The report is free and the methodology is documented in full.

Happy to answer questions about the data, methodology, or the agent compliance space generally.

throwawayffffas•2h ago
I think most startups' policy is "We have professional indemnity insurance that covers our use of AI agents".
mehula•2h ago
Insurance covers the lawsuit. It doesn't un-scrape the content, haha. It's better to stay compliant from day 1.
