Cloudflare CEO Is Lying to You About the Bot Traffic Jump

https://www.flyingpenguin.com/cloudflare-ceo-is-lying-to-you-about-the-bot-traffic-jump/

125•speckx•1h ago

Comments

kordlessagain•1h ago

I concur and have been talking about this for a while.

The fact is, Cloudflare is a man-in-the-middle. That's their focus, that's their purpose.

They will limit your local crawler from accessing pages. They will demand you use their crawler.

They will decrypt your traffic if they get a warrant. They always decrypt your traffic anyway, but they will give it to state actors if they demand it.

That's not to say anyone should break the laws, but the issue right now is that intellectual property is incompatible with what is coming with AI.

I don't hate on Cloudflare because it's a bad service. It's actually pretty good, but the fundamental problem is they make their purpose to be a single choke point of all data on the Web.

That's not right. It never was.

gonzalohm•46m ago

Careful, I posted something similar in another Cloudflare thread and people went after me like lions.

They don't see anything wrong with one entity controlling most of the internet traffic.

gruez•12m ago

>They will limit your local crawler from accessing pages. They will demand you use their crawler.

Source? According to Cloudflare, their crawling service doesn't get any special treatment from their WAF/CDN.

taeric•1h ago

I confess a sad assumption that bot traffic is far higher than we have admitted for a long time. Though maybe we would see different stats specifically for social media sites, where bots astroturf like counts? It certainly feels like we have known for a long time that bots were a larger share of ad viewing than ad companies wanted to admit.

mikey_p•56m ago

Well, the fun thing is that no one knows how much traffic of what kind they are getting when they use Cloudflare.

You get the numbers that Cloudflare tells you, but who knows if you can trust their stats after their CEO is apparently cherry-picking data to shape their product narrative?

thewebguyd•19m ago

That's the same CEO who just went on a wild, tone-deaf layoff justification, classifying human employees into roles of either a builder, seller, or measurer and saying he wants to get rid of everyone that "measures" the business...

I wouldn't trust a single thing coming out of his mouth.

reconnecting•55m ago

I don't understand what difference bots make. For me, a website (the public part) is a storefront. People walk down the street and see what's inside — that's the purpose. If something should not be available immediately, that's the private part of the store.

I've been monitoring bot traffic on digital platforms for over 10 years. Sure, the crawler share is growing, some even with malicious intentions, and those I detect and block.

I disagree that this pain is worth the cost of making real people spend their life on verification.

1vuio0pswjnm7•1h ago

There is an unfortunate incentive created when a "business" (MiTM) depends on "bot traffic", i.e., the continued nuisance of bot traffic, to make money

If the "bot traffic" declines, then the "bot protection business" goes down with it

Cloudflare communications are sometimes careful to refer to traffic _labeled as_ bot traffic versus actual bot traffic

Because the "business" relies on the existence of "bot traffic", there's an incentive to broaden the scope of what is labeled as "bot traffic"

The false positive rate can be high. The public should see those statistics, though in truth it may be infeasible to compile them when there's no verification and the entire system relies on heuristics

"Bot protection" can be used to gather fingerprints for marketing

It can be used to force users to use certain software, e.g., certain browsers, and to enable Javascript subjecting users to data collection, surveillance and ads

Originally the motivation for avoiding "bot traffic" was based on behaviour, e.g., exceeding acceptable rates of usage, making too many requests in a given time period, exceeding rate limits

Now it's available to exclude traffic based on criteria such as what browser someone is using. NB. This is more than "user-agent string". The company forces people to sign NDAs before telling them what it is doing to fingerprint www users

If residential proxies are the problem then why not go after the companies that provide them

The truth is that those companies are not the problem. Their customers are so-called "tech" companies

Perhaps it's these so-called "tech" companies that are the problem

Certainly the problem is not the individual www user who doesn't use an "approved" graphical, Javascript-enabled browser and who gets blocked or fingerprinted trying to make a single request

But that's who suffers from "bot protection" so that so-called "tech" companies can profit from data collection, surveillance and ads
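
For readers unfamiliar with the behaviour-based approach described above (flagging a client only when it makes too many requests in a given time period), here is a minimal sliding-window sketch. It is an illustration, not how Cloudflare or any specific product works; the class name, the limit of 3 requests per 10 seconds, and the example client address are all made-up assumptions.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Refuse a client once it exceeds max_requests within window_seconds."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # client key -> timestamps of that client's recent accepted requests
        self._hits = defaultdict(deque)

    def allow(self, client: str, now: float) -> bool:
        hits = self._hits[client]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True


# Hypothetical limit: 3 requests per 10 seconds for one example client.
limiter = SlidingWindowLimiter(max_requests=3, window_seconds=10.0)
decisions = [limiter.allow("203.0.113.7", t) for t in (0.0, 1.0, 2.0, 3.0, 12.5)]
print(decisions)
```

Note that this judges only what a client does, not which browser it runs, so a single request from an unusual client is never refused.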

Xirdus•40m ago

> Originally "bot traffic" was based on behaviour, e.g., exceeding acceptable rates of usage, making too many requests in a given time period, exceeding rate limits

> Now it's available to exclude traffic based on criteria such as what browser someone is using

I'm pretty sure user-agent-based bot detection predates every request-rate-based method by quite a few years.

reconnecting•1h ago

Cloudflare bot detection has taught me a reflex to close the tab every time I see its logo.

giancarlostoro•54m ago

I kind of do the same, not every time, but sometimes if I keep getting it on the same site, I seriously question how accurate their stats are without deep diving them more.

thejazzman•19m ago

Switching to T-Mobile home internet has been eye-opening for me on how different the internet can be from person to person. You don't even get your own IPv4 address. Makes me realize the challenge behind blocking something like yt-dlp.
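
One rough way to check whether you share an IPv4 address with other subscribers is to look at the WAN address your router reports. If it falls in 100.64.0.0/10 (the shared address space that RFC 6598 reserves for carrier-grade NAT), you are very likely behind CGNAT. The sketch below assumes you already know that WAN address; the function name and sample addresses are illustrative, and a negative result is not conclusive, since carriers can use other schemes.

```python
import ipaddress

# RFC 6598 shared address space, reserved for carrier-grade NAT.
CGNAT_RANGE = ipaddress.ip_network("100.64.0.0/10")


def looks_like_cgnat(wan_address: str) -> bool:
    """Return True if the router's WAN address sits in the CGNAT range."""
    return ipaddress.ip_address(wan_address) in CGNAT_RANGE


print(looks_like_cgnat("100.72.14.3"))    # sample address inside 100.64.0.0/10
print(looks_like_cgnat("198.51.100.20"))  # sample documentation address outside it
```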

Groxx•17m ago

Yeah, I've seriously considered finding or building a CF-protected-detector browser extension to flag domains. Having one company MITMing so much traffic is straightforwardly dangerous, and not just an annoyance. We need competition.

reconnecting•9m ago

Why so? They're all in NS already: *.ns.cloudflare.com
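
A sketch of that check, assuming you have already fetched the domain's NS records with whatever resolver you like. The function name and the sample nameserver hostnames are made up for illustration. As far as I know this only tells you who hosts the DNS zone: a site can use Cloudflare's nameservers without proxying traffic through them, and can be proxied without them, so treat it as a hint rather than proof.

```python
def uses_cloudflare_dns(nameservers: list[str]) -> bool:
    """True if every authoritative nameserver is under ns.cloudflare.com."""
    suffix = ".ns.cloudflare.com"
    # Normalise: drop the trailing dot of fully qualified names, ignore case.
    cleaned = [ns.rstrip(".").lower() for ns in nameservers]
    return bool(cleaned) and all(ns.endswith(suffix) for ns in cleaned)


# Hypothetical NS answers for two domains.
print(uses_cloudflare_dns(["ada.ns.cloudflare.com.", "bob.ns.cloudflare.com."]))
print(uses_cloudflare_dns(["ns1.example.net.", "ns2.example.net."]))
```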

mikey_p•58m ago

Do people really expect CEOs to be knowledgeable about any technical details in 2026? My experience is that CEOs are getting increasingly out of touch with what their employees actually do and what their customers want.

simonw•54m ago

"Cloudflare CEO is lying" is a bit of an aggressive take when he linked to the exact data so you can see it for yourself - and that's how this article was able to analyze it: https://radar.cloudflare.com/traffic#bot-vs-human

Update: I see the problem. Here's the full tweet: https://x.com/eastdakota/status/2062212701414187452

"Thought it would be end of 2027, then early 2027, but agentic traffic growing so fast that bots have now passed human traffic online for the first time in the Internet's history."

But the quoted segment in the article was just "…bots passed human traffic online for the first time in the Internet’s history."

It looks to me like the data supports "bots passed human traffic" but does NOT support "agentic traffic", since more of that traffic is from AI crawlers building indexes than from agents that are browsing the web on behalf of their owners.

If that's the point the article is trying to make then the headline is a little more supported, though I'd still say it's too hype-y a headline.

I guess a lot of this rests on what you assume "agentic traffic" to mean.

gonzalohm•48m ago

The graph only goes back to May 3rd 2026. I guess that was the start of humanity

JimDabell•38m ago

> It looks to me like the data supports "bots passed human traffic"

I think you are missing the fact that the dashboard has HTML pre-selected as a filter. Once you change that to all content types, you’ll see humans account for twice as much traffic as bots.

Note this part of the article:

> The CEO ignored the all-traffic number, on his own dashboard, and instead published the HTML-only number as a fact about the whole internet.

bobjordan•51m ago

The article strikes me as quite uncharitable to characterize it as "a lie". I doubt this CEO just sat down and calculated he was going to write lies. While it's fine to call out he's wrong per his own fuller data set, it's quite a different thing than calling the person out as a "liar" in a rage-bait fashion.

letmevoteplease•34m ago

The author is also not very good at interpreting data himself. He claims "the AI number is padded by counting Googlebot twice" and links to [1], but there is nothing on that page that could support that assertion. It looks like he misinterpreted this part: "Googlebot crawls for both search indexing and AI training and is included as a separate entry due to its crawl volume" (Googlebot was NOT included in the "AI bot" count, so it was not counted twice.)

[1] https://radar.cloudflare.com/year-in-review/2025

DevKoala•51m ago

Not sure if the Cloudflare CEO is lying, but I have a pixel deployed on tens of thousands of sites offering B2B solutions, and bot traffic overtook human traffic this year.

a1o•45m ago

What does pixel mean in this context?

supriyo-biswas•40m ago

https://en.wikipedia.org/wiki/Web_beacon, aka "tracking pixel", though these days it probably means a JS-based analytics reporting script.

onei•35m ago

It's a tracking tool. You have a bunch of sites embed an image, and requests to those sites also make requests to said image, which you can use to start tracking a client. A single pixel is merely the cheapest image.

I recall Facebook doing it years ago, I imagine they still do.

Terretta•32m ago

https://advertising.amazon.com/resources/ad-policy/pixeling-...

A 'pixel' is an unobtrusive (as in, not seen by the user like a banner ad is seen) asset* served on a web page that can cause the user's user agent to make an affirmative web request from you, a third party, so you know someone was at the site serving your pixel.

Typically used for:

- tracking in general, as well as more specifically:

- retargeting

- conversion

* Note: Doesn't have to be a literal pixel, but a literal transparent pixel is least likely to get blocked. Serve your pixels from the end of a parameterized path (/some/param/or/other/pixel.gif) and it's not seen as query string tracking either.
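
To make the mechanics concrete, here is a minimal sketch of the server side of a pixel that reads its tracking fields from the path rather than a query string, as described above. The path layout (/site_id/campaign/pixel.gif), the field names, and the handler function are all hypothetical; a real deployment would sit behind an actual web server and log the request.

```python
import base64

# A widely used 1x1 transparent GIF, base64-encoded.
PIXEL_GIF = base64.b64decode(
    "R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7"
)


def handle_pixel_request(path: str):
    """Return (status, headers, body, tracking_fields) for a pixel URL.

    Hypothetical layout: /<site_id>/<campaign>/pixel.gif
    """
    parts = [p for p in path.split("/") if p]
    if len(parts) != 3 or parts[2] != "pixel.gif":
        return 404, {}, b"", None
    fields = {"site_id": parts[0], "campaign": parts[1]}
    headers = {
        "Content-Type": "image/gif",
        # Discourage caching so each page view triggers a fresh request.
        "Cache-Control": "no-store",
    }
    return 200, headers, PIXEL_GIF, fields


status, headers, body, fields = handle_pixel_request("/acme/spring-sale/pixel.gif")
print(status, fields, len(body))
```

The embedding page only needs an img tag pointing at that URL; the request itself is the tracking event.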

thm•49m ago

Why did we start treating Cloudflare (a public, for-profit company) as the undisputed authority on anything related to the network layer of the internet in the first place?

autoexec•43m ago

Because they inserted themselves into almost everything we do online and basically managed to take control over it. Cloudflare should never have been allowed to man-in-the-middle the entire internet, but now that they have, they're the only man on earth with a dataset that size.

majke•33m ago

(I'm ex CF) This is backwards. Nobody "allowed" anything. CF serves a customer need. You can argue with the solution but you can't argue with the core problem. It's more healthy to start the conversation of _why_ CF services are valuable.

EnergyAmy•30m ago

No, it's more healthy to start the conversation on why we allow corporations to do bad things with excuses like "just serving customer's needs"

oytis•25m ago

I don't get it, what is bad about what they are doing? People need a CDN, they choose the one they find the best.

thm•21m ago

The discussion revolves around the equivalent of taking nutrition advice from Coca-Cola's blog.

Bender•49m ago

I tested this theory not long ago and did not see anything that aligned with the hype around bots. [1] There are indeed more bots than humans, because of course there are, or at least there appear to be. Bots crawl everything linked from popular sites whereas humans only click on things that interest them and even then they do not typically siphon the entire site. There are new bot operators every day due to curiosity and FOMO.

The only thing I saw that could possibly be construed as abusive were some poorly configured RSS bots. Even when my server told the bot that the page would not change for 4 hours the RSS bots would check every 10 minutes meaning they are ignoring the cache-control header. This was entirely harmless, just slightly annoying. The RSS bots are not new. Most of the bots are not even trying to disguise themselves as humans.

I was expecting the bots to mirror a couple git repositories I exposed but they did not go deeper than the README.md. None of them. I think this is the same pattern of catastrophization that exists around AI dooming the world and I don't know why it is spreading. I guess it must work or people would not do it.

[1] - https://blawg.nochan.net/b/Internet-Crap/20260522-Maybe-AI-B...
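
For reference, the RSS behaviour described above (polling every 10 minutes when the server said the page would not change for 4 hours) is what a client does when it skips a check like the one below. This sketch assumes the server expressed the 4 hours as a Cache-Control max-age directive, which the comment does not specify; the function names are illustrative.

```python
import re


def max_age_seconds(cache_control: str):
    """Extract max-age from a Cache-Control header value, or None."""
    match = re.search(r"(?:^|,)\s*max-age\s*=\s*(\d+)", cache_control, re.IGNORECASE)
    return int(match.group(1)) if match else None


def should_refetch(cache_control: str, seconds_since_fetch: float) -> bool:
    """A polite poller refetches only once the cached copy has gone stale."""
    max_age = max_age_seconds(cache_control)
    if max_age is None:
        return True  # No freshness lifetime given; this sketch just refetches.
    return seconds_since_fetch >= max_age


header = "public, max-age=14400"  # assumed header: fresh for 4 hours
print(should_refetch(header, 10 * 60))   # 10 minutes after the last fetch
print(should_refetch(header, 4 * 3600))  # 4 hours after the last fetch
```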

tyjkot•47m ago

I love when smart people catch liars.

ChrisArchitect•47m ago

Discussion: https://news.ycombinator.com/item?id=48387144

nothrows•47m ago

Cloudflare is junk. Their entire billion dollar service can't distinguish my (DAILY) GET request to mainstream news sites from bot traffic, so nothing they say or do is of any value. I've had the same IP for decades.

yubblegum•43m ago

That could be plausible deniability. I mean, CF is in fact keeping a tab on who is visiting which websites. Between them and Google, these two companies know everything about everyone.

supriyo-biswas•42m ago

It can be worse; they randomly block my uptime monitoring with 4xx and 5xx status codes once in two months or something like that, despite nothing changing.

jimrandomh•37m ago

Have you checked your IP address's reputation with a service such as ipqualityscore.com? If Cloudflare thinks your traffic is bot traffic, it's likely that there is bot traffic you don't know about coming from your IP, either from a compromised device on your network or a sketchy VPN product.

ai_fry_ur_brain•18m ago

Also, it doesn't apply to this person if they've had the same IP for years, but if your ISP rotates IPs frequently (mine does every time I reset the modem), or you're on 5G behind CGNAT, it's almost a guarantee that your IP has been labeled as having been used by a residential proxy network.

So many people have sketchy TV boxes or some other sketchy IoT device that is a front for using your network to sell bandwidth to proxy networks.

However, CF is unnecessarily making people on 5G connections from desktops do turnstiles, as it looks like a scraper using a mobile proxy. This will become more and more of an issue as more laptops have 5G modems in them. Not sure how this WAF IP fingerprinting model survives widespread CGNAT. I guess it will be an excuse to fingerprint us more intensely.

csomar•46m ago

The article is a bit too strong, aggressive I’d even say. Content is loaded only if the bot executes JavaScript and loads all content willingly. These do exist, but they are more expensive to run than a basic curl bot.

It’d make sense, as you might not want your bot to load everything a real human would (e.g., analytics, ads, unrelated files) and only focus on the content.

Also, am I the only one surprised that bot traffic is not the majority already? For my site, it’s 100 bots for every human.

jimrandomh•39m ago

I deal with scrapers that sometimes border on DDoSes for LessWrong. The amount of bot traffic varies greatly between sites; if you have more URLs you get more bot traffic (regardless of whether those URLs represent a deep content catalog, or useless URL parameter permutations). It's bad for LW because of the content-catalog depth.

It's easy to drastically underestimate the amount of bot traffic, because bots make efforts (of varying sophistication) to look human enough to evade blocking. That includes using fake user-agent strings corresponding to real browsers (often but not always with implausibly old version numbers), proxying through residential IPs, and sometimes using full headless browsers. In my own data, traffic from badly behaved browser-impersonation bots exceeds traffic from named scrapers like GPTBot by something like 10x.
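
As an illustration of the "implausibly old version number" signal only, here is a small sketch that pulls the Chrome major version out of a user-agent string and flags it when it lags far behind a reference version. The reference version (130) and the allowed lag (20 releases) are made-up assumptions, and this is not LessWrong's actual filter; real humans on old browsers exist, so on its own this is a weak hint.

```python
import re

CHROME_VERSION = re.compile(r"Chrome/(\d+)\.")


def chrome_major(user_agent: str):
    """Return the Chrome major version claimed by the UA, or None."""
    match = CHROME_VERSION.search(user_agent)
    return int(match.group(1)) if match else None


def implausibly_old(user_agent: str, current_major: int, max_lag: int = 20) -> bool:
    """Flag UAs claiming a Chrome release far behind current_major."""
    major = chrome_major(user_agent)
    return major is not None and current_major - major > max_lag


old_ua = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36"
)
print(implausibly_old(old_ua, current_major=130))
print(implausibly_old("curl/8.5.0", current_major=130))
```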

The measured percentage of bot traffic is higher for HTML than for other content types because many bots will load an HTML page, and then not load the JS/CSS/image/etc resources it references. But these are the least-sophisticated and most-detectable bots.
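
A toy calculation shows how much the choice of filter matters. Every number here is invented for illustration (1000 human page views, 1200 bot page views, 20 sub-resources per page, bots fetching none of them); none of it comes from Cloudflare Radar or from LW's logs.

```python
def bot_share(human_pages, bot_pages, assets_per_page, bot_asset_fraction):
    """Return (html_only_share, all_requests_share) of bot traffic.

    bot_asset_fraction is the share of a page's sub-resources
    (JS/CSS/images) that bots bother to fetch; humans fetch all of them.
    """
    human_requests = human_pages * (1 + assets_per_page)
    bot_requests = bot_pages * (1 + assets_per_page * bot_asset_fraction)
    html_only = bot_pages / (bot_pages + human_pages)
    all_requests = bot_requests / (bot_requests + human_requests)
    return html_only, all_requests


# Made-up inputs: bots request more HTML pages but no sub-resources.
html_only, all_requests = bot_share(
    human_pages=1000, bot_pages=1200, assets_per_page=20, bot_asset_fraction=0.0
)
print(f"HTML only: {html_only:.1%}, all requests: {all_requests:.1%}")
```

With these inputs bots are a majority of HTML requests but a small minority of all requests, which is the gap between the two dashboard views being argued about in this thread.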

arjie•9m ago

Does LW have a downloadable archive? I can only find references to GreaterWrong but no public answer. Would be useful.

eli•30m ago

"Lying" is not supported by the evidence. In the context of bot traffic on the web, looking at only GETs for HTML is a reasonable approach. If you're counting all requests for all assets then a single page view of nytimes.com would count 100x as much as one for HN.

I would assume a lot of people running websites tend to think in pageviews, especially when dealing with bots because images and CSS files tend to be "cheap" static content but HTML requests are often dynamically generated.

It's also a single tweet that links to the data used to "disprove" it. Would be a weird way to lie.

wiredfool•30m ago

I run some moderate-profile gov and NGO open-data sites, and I’d say that bot-like traffic is 99% of the requests we’re seeing on some sites.

Mostly current, valid user agents and lots of IP addresses, but the traffic patterns are not organic. I’m not clear if it’s bad AI scraping or a DoS, but at some level it’s indistinguishable.

reconnecting•24m ago

If you can email me, I'd be happy to volunteer some help looking into this for your org, as we've made an open-source tool to investigate bots.

