I wonder if Perplexity or others mix the traffic of the two types so they’re indistinguishable, specifically to make this argument.
Or are they just so bad at writing their crawlers that the two kinds of traffic end up looking alike anyway?
where's the front page CF callout for google search agent? they wouldn't dare. i don't remember the shaming for ad and newsletter pop-up blockers.
that being said, agree with you that sites are not being used the way they were intended. i think this is part of the evolution of the web. it all began with no monetization, then swung far too much into it to the point of abuse. and now legitimate content creators are stuck in the middle.
what i disagree on is whether CF has the right to shame perplexity over, again allegedly, false information. especially when OAI is solving captchas and google is also "misusing" websites.
i wish i had an answer to how we can evolve the web sustainably. my main gripe is the shaming and virtue signaling.
(as an aside, not to shift the goalpost to the elephant in the room, but i didn't see any blog posts on the shameless consumption of every single thing on the internet by OAI, google and anthropic. talk about misuse..)
Without advertising, the web would be largely unsupportable financially short of per-site subscriptions.
Perplexity is using stealth, undeclared crawlers to evade no-crawl directives
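For reference, a "no-crawl directive" here is just a robots.txt rule. A site that wants to refuse Perplexity's declared crawler while staying open to everything else would publish something like this (treat the exact bot token as illustrative):

    # example robots.txt: refuse one declared AI crawler, allow everyone else
    User-agent: PerplexityBot
    Disallow: /

    User-agent: *
    Allow: /

The "stealth" accusation is that when a rule like this blocks the declared crawler, the same content gets fetched anyway under a generic browser user agent from undisclosed addresses.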
Strawmen. They aren't arguing that any automated tool should be suspect. They are arguing that an automated tool with sufficient computing power should be suspect. By Perplexity's reasoning, I should be able to set up a huge server farm and hit any website with 1,000,000 requests per second because 1 request is not seen as harmful. In this case, of course, the danger with AI is not a DOS attack but an attack against the way the internet is structured and the way websites are supposed to work.
> This overblocking hurts everyone. Consider someone using AI to research medical conditions,
Of course you will put medical conditions in there: an appeal to the hypothetical person with a medical problem, a rather contemptible and revolting argument.
> This undermines user choice
What happens to user choice when website designers stop making websites or writing for websites because the lack of direct interaction makes it no longer worthwhile?
> An AI assistant works just like a human assistant.
That's like saying a Ferrari works like someone walking. Yes, they go from A to B, but the Ferrari can go 400km down a highway much faster than a human. So, no, it has fundamental speed and power differences that change the way the ecosystem works, and you can't ignore the ecosystem.
> This controversy reveals that Cloudflare's systems are fundamentally inadequate for distinguishing between legitimate AI assistants and actual threats.
As a website designer and writer, I consider all AI assistants to be actual threats, along with the entirety of Perplexity and all AI companies. And I'm not the only one: many content creators feel the same and hope your AI assistants are neutralized with as much extreme prejudice as possible.
That's a slippery slope all the way to absurd. They're not talking about millions of requests a second. They're talking about a browsing session (a few page views) as a result of a user's action. It's not even additional traffic, and there's no extra concurrency - it's likely the same requests a user would make, just with shorter delays.
My statement was meant as an analogy. I'm not saying an argument against Perplexity and agents is about requests per second. I'm saying there's an analogous argument: that the power of AI to transform the browsing experience is akin to the power of a server farm and thus a net negative. Therefore, your interpretation of what I was saying is wrong.
Zooming out for a second, we might be in an analogous era to open email relays. In a few years, will you need to run an agent through a big service provider because other big service providers only trust each other?
Website owners have a right to block both if they wish. Isn't it obvious that bypassing a bot block is a violation of the owner's right to decide whom to admit?
Perplexity almost seems to believe that "robots.txt was only made for scraping bots, so if our bot is not scraping, it's fair for us to ignore it and bypass the enforcement". And their core business is a bot, so they really should have known better.
To me this invalidates their whole claim that Cloudflare fails to tell the difference between scraper and user-driven agent. Instead, distinguishing them is trivial, and the block is intentional.
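For what it's worth, honoring robots.txt takes a few lines of standard-library Python. A minimal sketch of what a declared, well-behaved agent would do before fetching on a user's behalf (the agent name and URLs are hypothetical):

    # minimal sketch: check robots.txt before fetching, with a declared UA
    from urllib import robotparser
    import urllib.request

    USER_AGENT = "ExampleAssistant/1.0"  # hypothetical declared agent name
    PAGE_URL = "https://example.com/article"

    rp = robotparser.RobotFileParser()
    rp.set_url("https://example.com/robots.txt")
    rp.read()  # fetch and parse the site's robots.txt

    if rp.can_fetch(USER_AGENT, PAGE_URL):
        # fetch openly, identifying ourselves in the User-Agent header
        req = urllib.request.Request(PAGE_URL, headers={"User-Agent": USER_AGENT})
        with urllib.request.urlopen(req) as resp:
            body = resp.read()
    else:
        # the site opted out; a well-behaved agent stops here
        print("robots.txt disallows this fetch; respecting the site owner's choice")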
But it also feels like essentially "pirating" the webpages while erasing their brand. Maybe it's even a tolerable transitional situation, but you can't even argue it's beneficial the way game piracy arguably is, according to some. In the long term, we need an incentive for content creators to willingly allow such processing. Otherwise, a lot of high-quality content will eventually become members-only, with DRM-like anti-agent protections.
The incentive doesn't have to be monetary. I could, for example, imagine some website owners allowing AI agents that commit up front to repeating verbatim some sort of mandatory headers/messages/acknowledgements from the content authors before copying or summarizing, and that are known to stick to this commitment.
You can also bypass the problem today by accessing and copying the content manually, then putting it into the context of a tool like NotebookLM. Nobody's hurt, because you have actually seen the source yourself, and that's all the website owners can reasonably demand.
TL;DR: why even post quality content in the open if the audience won't see your ads, your donation button, or even your name? What do you think?
I partially agree with this. Yes, some incentive is OK, for some cases. I wouldn't be OK with a mandatory header/message showing up in my output, for example, unless it has some very direct relevance to the content. But there could be some kind of tipper markup/code embedded in the site metadata (a rough sketch below) that my agent abstracts away into content-rating feedback options, with tips automatically made on my behalf if I have it configured and select the "useful" option. Of course, source citation should also be a mandatory part of the output, for that branding and also in case there's a desire to go beyond the output.
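Purely as illustration, such tipper markup could be a small blob of page metadata that agents learn to respect. Everything here is invented; no such standard exists today:

    <!-- hypothetical page metadata; no such standard exists today -->
    <script type="application/agent-terms+json">
    {
      "author": "Jane Doe",
      "attribution": "required",
      "tipEndpoint": "https://example.com/tip",
      "suggestedTip": { "amount": 0.25, "currency": "USD" }
    }
    </script>

An agent could surface this as a plain "useful" button and, if auto-tipping is enabled, send the suggested amount to the declared endpoint on the user's behalf.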
However, there will also always be content authors out there who share quality content freely with no expectation of any kind of return. The "problem" is that such content usually isn't SEO-optimized, and so likely won't be in the top results. Little will be lost if those optimizing for return start blocking access: they'll be automatically deranked by virtue of the access issues, and the non-optimized content will then rise to the surface.
TL;DR: suggested configurable creator-tipping system abstracted behind feedback options, and the likely case that those who block access will be deranked in favor of those maintaining open access.
There is only a violation if the bot finds a way around a login block. Same for human. But whatever is on the public web is... public. For all.
A web server providing a response to your request is akin to a restaurant server doing the same. Except for specific situations related to civil rights, they are free to not deal with you for any reason.
Hmm, maybe a civil case could potentially be made here too, re disability. By blocking LLM use, sites are reducing the ability of select users to reasonably interact with the content. It could just become a thing in a few years if this nonsense continues.
Perplexity's value proposition appears to be "we're going to take the stuff off your website, and present it to our users. We're not going to show them your ads, we're not going to offer them your premium services or referrals to other products, we're going to strip out the value from your content and take it for our users".
You can argue all you want about whether that's 5k impressions a day or 1m impressions a day. It should be 0 impressions a day. It is literally just free-riding.
Also, they're meant to be a professional company taking VC money to build a business, so why are they writing whiny posts like a teenager? The impression I get from a lot of these companies is that their business is losing money hand over fist, they have no idea how they're going to make it work, and they look absolutely panicked as a result. They come across like a company I would want to be nowhere near.
The problem I see for chatgpt/perplexity and the like is this: for good responses to many questions, they have to index the web in real time, i.e., they become a search engine. However, they cannot share revenue with the content providers, since they don't have an ad model. I wonder how this will be resolved - perhaps through content licensing with the large publishers.
This, exactly this, is a primary reason why I use Perplexity. I want the valued content without the unnecessary distractions that I'll never consciously touch anyway (there have been accidental clicks now and then, because some site designers really want people to click that ad and go all out to embed it in the content, and it only leads to great annoyance and sometimes a promise never to visit that site again).
i guess it will come down to browserbase corroborating the claims.
i also tend to agree with the concept that scraping != consuming on behalf of a user. they explicitly point out that they do not store the data for training, which would fall under scraping by proxy.