You get the numbers that Cloudflare tells you, but who knows if you can trust their stats when their CEO is apparently cherry-picking data to shape their product narrative?
I've been monitoring bot traffic on digital platforms for over 10 years. Sure, the crawler share is growing, some even with malicious intentions, and those I detect and block.
I disagree that this pain is worth the cost of making real people spend their lives on verification.
Same general idea goes for any of the algorithm-driven platforms. The algorithms are ostensibly intended to surface organically discovered things by watching how people interact with things. That they are so susceptible to distortion through bot farms should be acknowledged far more than it is. People trust them far more than they should.
There is also the general concern of running costs. It isn't as if serving bot traffic is completely free.
If the "bot traffic" declines, then the "bot protection business" goes down with it
Cloudflare's communications are sometimes careful to refer to traffic _labeled as_ bot traffic versus actual bot traffic
Because the "business" relies on the existence of "bot traffic", there's an incentive to broaden the scope of what is labeled as "bot traffic"
The false positive rate can be high. The public should see those statistics, though in truth it may be infeasible to compile them when there's no verification and the entire system relies on heuristics
"Bot protection" can be used to gather fingerprints for marketing
It can be used to force users to use certain software, e.g., certain browsers, and to enable JavaScript, subjecting users to data collection, surveillance and ads
Originally the motivation for "bot traffic" was based on behaviour, e.g., exceeding acceptable rates of usage, making too many requests in a given time period, exceeding rate limits
Now it's available to exclude traffic based on criteria such as what browser someone is using
If residential proxies are the problem then why not go after the companies that provide them
The truth is that those companies are not the problem. Their customers are so-called "tech" companies
Perhaps it's these so-called "tech" companies that are the problem
Certainly the problem is not the individual www user who doesn't use an "approved" graphical, JavaScript-enabled browser and who gets blocked or fingerprinted trying to make a single request
But that's who suffers from "bot protection" so that so-called "tech" companies can profit from data collection, surveillance and ads
> Now it's available to exclude traffic based on criteria such as what browser someone is using
I'm pretty sure user-agent-based bot detection predates every request-rate-based method by quite a few years.
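To make the two approaches concrete, here's a rough sketch of each in Python. Everything in it is made up for illustration: the token list, the function and class names, and the limit/window values are my assumptions, not anything Cloudflare or any particular product is known to use.

```python
import time
from collections import defaultdict, deque
from typing import Optional

# Hypothetical substrings for illustration; real lists are far longer.
KNOWN_BOT_TOKENS = ("bot", "crawler", "spider", "curl", "wget")


def looks_like_bot_by_user_agent(user_agent: str) -> bool:
    """Identity-based check: trusts whatever the client claims to be."""
    ua = user_agent.lower()
    return any(token in ua for token in KNOWN_BOT_TOKENS)


class SlidingWindowLimiter:
    """Behaviour-based check: at most `limit` requests per `window` seconds per client."""

    def __init__(self, limit: int, window: float) -> None:
        self.limit = limit
        self.window = window
        # client id -> timestamps of that client's recent allowed requests
        self._hits = defaultdict(deque)

    def allow(self, client: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        hits = self._hits[client]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

Note the difference in who gets caught: `looks_like_bot_by_user_agent("curl/8.5.0")` flags a person making one request with curl, while the limiter only objects once a client actually exceeds the rate, whatever software it runs.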
Update: I see the problem. Here's the full tweet: https://x.com/eastdakota/status/2062212701414187452
"Thought it would be end of 2027, then early 2027, but agentic traffic growing so fast that bots have now passed human traffic online for the first time in the Internet's history."
But the quoted segment in the article was just "…bots passed human traffic online for the first time in the Internet’s history."
It looks to me like the data supports "bots passed human traffic" but does NOT support "agentic traffic", since more of that traffic is from AI crawlers building indexes than from agents that are browsing the web on behalf of their owners.
If that's the point the article is trying to make then the headline is a little more supported, though I'd still say it's too hype-y a headline.
I guess a lot of this rests on what you assume "agentic traffic" to mean.
The only thing I saw that could possibly be construed as abusive were some poorly configured RSS bots. Even when my server told the bot that the page would not change for 4 hours the RSS bots would check every 10 minutes. This was entirely harmless, just slightly annoying. The RSS bots are not new. Most of the bots are not even trying to disguise themselves as humans.
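For reference, this is roughly what a well-behaved poller would do instead. It's a minimal sketch that assumes the "will not change for 4 hours" signal arrives as a `Cache-Control: max-age=...` header (it could just as well be `Expires` or an RSS `<ttl>`); the function names and the 10-minute default are hypothetical.

```python
import re
from typing import Optional

# Matches the max-age directive; "s-maxage" is deliberately not matched.
_MAX_AGE = re.compile(r"max-age\s*=\s*(\d+)", re.IGNORECASE)


def max_age_seconds(cache_control: str) -> Optional[int]:
    """Return the max-age value in seconds, or None if the header has none."""
    match = _MAX_AGE.search(cache_control)
    return int(match.group(1)) if match else None


def next_poll_time(last_fetch: float, cache_control: str,
                   default_interval: float = 600.0) -> float:
    """Earliest time (same clock as last_fetch) at which to fetch the feed again."""
    max_age = max_age_seconds(cache_control)
    # Never poll sooner than the server's stated freshness lifetime.
    wait = default_interval if max_age is None else max(max_age, default_interval)
    return last_fetch + wait
```

With `max-age=14400` the next fetch lands four hours out rather than ten minutes out; without the header it falls back to the default interval.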
I was expecting the bots to mirror a couple git repositories I exposed but they did not go deeper than the README.md. None of them. I think it's the same pattern of catastrophization that exists around AI and I don't know why it is spreading. I guess it must work or people would not do it.
[1] - https://blawg.nochan.net/b/Internet-Crap/20260522-Maybe-AI-B...
It’d make sense, as you might not want your bot to load everything a real human would (e.g., analytics, ads, unrelated files, etc.) and only focus on the content.
Also, am I the only one surprised that bot traffic is not the majority already? For my site, it’s 100 bots for every human.
It's easy to drastically underestimate the amount of bot traffic, because bots make efforts (of varying sophistication) to look human enough to evade blocking. That includes using fake user-agent strings corresponding to real browsers (often but not always with implausibly old version numbers), proxying through residential IPs, and sometimes using full headless browsers. In my own data, traffic from badly behaved browser-impersonation bots exceeds traffic from named scrapers like GPTBot by something like 10x.
The measured percentage of bot traffic is higher for HTML than for other content types because many bots will load an HTML page, and then not load the JS/CSS/image/etc resources it references. But these are the least-sophisticated and most-detectable bots.
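The "loads HTML but none of the referenced resources" signal can be approximated from access logs along these lines. This is only a sketch: the `Hit` record, the client key, and the `min_pages` threshold are assumptions for illustration, not how I or anyone else necessarily does it in production.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple, Set


class Hit(NamedTuple):
    client: str        # hypothetical client key, e.g. IP plus user-agent
    content_type: str  # response Content-Type, e.g. "text/html; charset=utf-8"


def html_only_clients(hits: Iterable[Hit], min_pages: int = 3) -> Set[str]:
    """Clients that fetched at least `min_pages` HTML pages and no other content type."""
    pages = defaultdict(int)
    assets = defaultdict(int)
    for hit in hits:
        if hit.content_type.startswith("text/html"):
            pages[hit.client] += 1
        else:
            assets[hit.client] += 1
    return {c for c, n in pages.items() if n >= min_pages and assets[c] == 0}
```

This will also flag text-mode browsers and anyone with a warm cache or aggressive content blocking, so it's a hint to combine with other signals rather than grounds for blocking on its own.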
kordlessagain•1h ago
The fact is, Cloudflare is a man-in-the-middle. That's their focus, that's their purpose.
They will limit your local crawler from accessing pages. They will demand you use their crawler.
They will decrypt your traffic if they get a warrant. They always decrypt your traffic anyway, but they will give it to state actors if they demand it.
That's not to say anyone should break the laws, but the issue right now is that intellectual property is incompatible with what is coming with AI.
I don't hate on Cloudflare because it's a bad service. It's actually pretty good, but the fundamental problem is that they make it their purpose to be a single choke point for all data on the Web.
That's not right. It never was.
gonzalohm•9m ago
They don't see anything wrong with one entity controlling most of the internet traffic