cairnechou•45m ago
This is a crucial conversation to start. As someone building in the AI/SaaS space, I find the current "HTML scraping" layer to be the single most brittle and unreliable part of any agentic workflow. An agent breaks the moment a developer changes a CSS class.
A common protocol—like a robots.txt but for AI agents—feels like the inevitable and necessary next step. We need a way for sites to "semantically" declare their functions and content to a machine.
This raises a huge game-theory question, though: What is the website's incentive to adopt this?
It's a "cooperate or defect" dilemma. If a site doesn't cooperate, the AI agent will just scrape it (badly). If it does cooperate, it makes it easier for the AI agent to summarize its content and potentially bypass its ad/conversion funnels.
I'm curious what the authors think the "win-win" is here for the websites themselves.
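For concreteness, here is a rough sketch of the agent-side lookup I'm imagining; the /.well-known/ai-manifest.json path and every field name here are hypothetical, not an existing standard:

  # Sketch: an agent prefers a declared manifest over HTML scraping.
  # The /.well-known/ai-manifest.json path and all field names are made up.
  import json
  from urllib.request import urlopen
  from urllib.error import URLError

  def load_site_manifest(base_url: str) -> dict | None:
      """Return the site's machine-readable declaration, or None if it has none."""
      try:
          with urlopen(f"{base_url}/.well-known/ai-manifest.json", timeout=5) as resp:
              return json.load(resp)
      except (URLError, json.JSONDecodeError):
          return None  # site didn't cooperate

  manifest = load_site_manifest("https://example.com")
  if manifest:
      # e.g. {"summary": "...", "actions": [{"name": "search", "endpoint": "/api/search"}]}
      actions = {a["name"]: a["endpoint"] for a in manifest.get("actions", [])}
  else:
      actions = {}  # fall back to brittle CSS-selector scraping

The fallback branch is exactly the "defect" case above: the agent still gets something, just badly.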
vinibrito•33m ago
If I want AI to recommend my service or product, I would gladly serve some JSON it can swallow.
No need for a new protocol or anything like it, just a convention, as you mentioned, similar to robots.txt.
Perhaps aidata.json.
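Something this small would do on the site's side; the file name follows the aidata.json idea and the schema is just a guess:

  # Sketch: generating a hypothetical aidata.json at build time.
  # The schema (name, summary, pages, actions) is invented for illustration.
  import json

  aidata = {
      "name": "Example Store",
      "summary": "Handmade ceramics, shipped worldwide.",
      "pages": [
          {"url": "/catalog", "purpose": "browse products"},
          {"url": "/faq", "purpose": "shipping and returns"},
      ],
      "actions": [
          {"name": "search_products", "method": "GET", "endpoint": "/api/search?q={query}"},
      ],
  }

  with open("aidata.json", "w", encoding="utf-8") as f:
      json.dump(aidata, f, indent=2)

An agent that finds this file at the site root never has to touch the HTML.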