Cloudflare CEO Is Lying to You About the Bot Traffic Jump

https://www.flyingpenguin.com/cloudflare-ceo-is-lying-to-you-about-the-bot-traffic-jump/
57•speckx•1h ago

Comments

kordlessagain•1h ago
I concur and have been talking about this for a while.

The fact is, Cloudflare is a man-in-the-middle. That's their focus, that's their purpose.

They will limit your local crawler from accessing pages. They will demand you use their crawler.

They decrypt your traffic as a matter of course, and they will hand it over to state actors who demand it with a warrant.

That's not to say anyone should break the laws, but the issue right now is that intellectual property is incompatible with what is coming with AI.

I don't hate on Cloudflare because it's a bad service. It's actually pretty good, but the fundamental problem is that they've made it their purpose to be a single choke point for all data on the Web.

That's not right. It never was.

gonzalohm•9m ago
Careful. I posted something similar in another Cloudflare thread and people jumped on me like lions.

They don't see anything wrong with one entity controlling most of the internet traffic

taeric•1h ago
I confess to a sad assumption that bot traffic has long been far higher than we admit. Though maybe we would see different stats specifically for social media sites, where bots astroturf like counts? It certainly feels like we have known for a long time that bots made up a larger share of ad views than ad companies wanted to admit.
mikey_p•19m ago
Well, the fun thing is that no one knows how much traffic of what kind they are getting when they use Cloudflare.

You get the numbers that Cloudflare tells you, but who knows if you can trust their stats after their CEO is apparently cherry-picking data to shape their product narrative?

reconnecting•18m ago
I don't understand what difference bots make. For me, a website (the public part) is a storefront. People walk down the street and see what's inside — that's the purpose. If something should not be available immediately, that's the private part of the store.

I've been monitoring bot traffic on digital platforms for over 10 years. Sure, the crawler share is growing, some even with malicious intentions, and those I detect and block.

I don't think this pain is worth making real people spend their lives on verification.

taeric•11m ago
For ad views, the concern is specifically that people pay for clicks and views. The fact that those can be so heavily influenced by bot traffic greatly undermines their value.

The same general idea goes for any of the algorithmically driven platforms. The algorithms are ostensibly intended to surface organically discovered things by watching how people interact with them. That they are so susceptible to distortion through bot farms should be acknowledged a lot more than it is. People trust them far more than they should.

There is also the general cost of running things. Serving bot traffic isn't free.

1vuio0pswjnm7•1h ago
There is an unfortunate incentive created when a "business" (MiTM) depends on "bot traffic", i.e., the continued nuisance of bot traffic, to make money

If the "bot traffic" declines, then the "bot protection business" goes down with it

Cloudflare communications are sometimes careful to refer to traffic _labeled as_ bot traffic versus actual bot traffic

Because the "business" relies on the existence of "bot traffic", there's an incentive to broaden the scope of what is labeled as "bot traffic"

The false positive rate can be high. The public should see those statistics, and in truth it may be infeasible to compile them when there's no verification and the entire system relies on heuristics

"Bot protection" can be used to gather fingerprints for marketing

It can be used to force users to use certain software, e.g., certain browsers, and to enable Javascript subjecting users to data collection, surveillance and ads

Originally the motivation for "bot traffic" was based on behaviour, e.g., exceeding acceptable rates of usage, making too many requests in a given time period, exceeding rate limits

Now it's available to exclude traffic based on criteria such as what browser someone is using

If residential proxies are the problem then why not go after the companies that provide them

The truth is that those companies are not the problem. Their customers are so-called "tech" companies

Perhaps it's these so-called "tech" companies that are the problem

Certainly the problem is not the individual www user who doesn't use an "approved" graphical, Javascript-enabled browser who gets blocked or fingerprinted trying to make a single request

But that's who suffers from "bot protection" so that so-called "tech" companies can profit from data collection, surveillance and ads
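The behaviour-based detection described above ("making too many requests in a given time period, exceeding rate limits") can be illustrated with a sliding-window rate limiter. This is a minimal sketch, not Cloudflare's method: the class name `SlidingWindowLimiter`, keying clients by IP address, and the thresholds are all hypothetical assumptions chosen for the example.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class SlidingWindowLimiter:
    """Flags a client that exceeds max_requests within window_seconds.

    Hypothetical illustration: client_id could be an IP address or any
    other key; the thresholds are arbitrary assumptions, not anyone's
    production values.
    """

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, client_id: str, now: float) -> bool:
        hits = self._hits[client_id]
        # Forget requests that have fallen out of the time window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            # Over the limit: treat as bot-like behaviour.
            # Rejected requests are not recorded in the window.
            return False
        hits.append(now)
        return True


limiter = SlidingWindowLimiter(max_requests=3, window_seconds=10.0)
print([limiter.allow("203.0.113.7", t) for t in (0, 1, 2, 3)])
# prints [True, True, True, False]
```

A check like this judges only what a client does, not which browser it runs, which is the distinction the comment draws.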

Xirdus•3m ago
> Originally "bot traffic" was based on behaviour, e.g., exceeding acceptable rates of usage, making too many requests in a given time period, exceeding rate limits

> Now it's available to exclude traffic based on criteria such as what browser someone is using

I'm pretty sure user-agent-based bot detection predates every request-rate-based method by quite a few years.
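User-agent-based detection, at its simplest, is a substring match on the `User-Agent` request header. The sketch below assumes a small, purely illustrative token list (`KNOWN_BOT_TOKENS` and `looks_like_declared_bot` are hypothetical names); real deployments use much longer lists, and this is not Cloudflare's or anyone else's actual rule set.

```python
from typing import Optional

# Illustrative tokens only (hypothetical list); real bot lists are far
# longer and differ between vendors.
KNOWN_BOT_TOKENS = ("googlebot", "bingbot", "gptbot", "curl/", "wget/", "python-requests")


def looks_like_declared_bot(user_agent: Optional[str]) -> bool:
    """Return True if the User-Agent header self-identifies as a bot or script."""
    if not user_agent:
        # A missing User-Agent is itself a common sign of a hand-rolled script.
        return True
    ua = user_agent.lower()
    return any(token in ua for token in KNOWN_BOT_TOKENS)
```

This only catches clients that announce themselves. As jimrandomh notes further down the thread, badly behaved scrapers send real-browser user-agent strings, which is one reason rate- and behaviour-based checks exist alongside it.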

reconnecting•26m ago
Cloudflare bot detection has taught me a reflex to close the tab every time I see its logo.
giancarlostoro•17m ago
I kind of do the same. Not every time, but if I keep getting it on the same site, I seriously question how accurate their stats are without digging into them more.
mikey_p•21m ago
Do people really expect CEOs to be knowledgeable about any technical details in 2026? My experience is that CEOs are getting increasingly out of touch with what their employees actually do and what their customers want.
simonw•17m ago
"Cloudflare CEO is lying" is a bit of an aggressive take when he linked to the exact data so you can see it for yourself - and that's how this article was able to analyze it: https://radar.cloudflare.com/traffic#bot-vs-human

Update: I see the problem. Here's the full tweet: https://x.com/eastdakota/status/2062212701414187452

"Thought it would be end of 2027, then early 2027, but agentic traffic growing so fast that bots have now passed human traffic online for the first time in the Internet's history."

But the quoted segment in the article was just "…bots passed human traffic online for the first time in the Internet’s history."

It looks to me like the data supports "bots passed human traffic" but does NOT support "agentic traffic", since more of that traffic is from AI crawlers building indexes than from agents that are browsing the web on behalf of their owners.

If that's the point the article is trying to make then the headline is a little more supported, though I'd still say it's too hype-y a headline.

I guess a lot of this rests on what you assume "agentic traffic" to mean.

gonzalohm•11m ago
The graph only goes back to May 3rd 2026. I guess that was the start of humanity
bobjordan•14m ago
It strikes me as quite uncharitable for the article to characterize it as "a lie". I doubt this CEO sat down and deliberately decided to write lies. While it's fine to call out that he's wrong per his own fuller data set, that's quite a different thing from calling the person a "liar" in rage-bait fashion.
DevKoala•14m ago
Not sure if the Cloudflare CEO is lying, but I have a pixel deployed on tens of thousands of sites offering B2B solutions, and bot traffic overtook human traffic this year.
a1o•8m ago
What does "pixel" mean in this context?
supriyo-biswas•3m ago
https://en.wikipedia.org/wiki/Web_beacon, aka "tracking pixel", though these days it probably means a JS-based analytics reporting script.
thm•12m ago
Why did we start treating Cloudflare (a public, for-profit company) as the undisputed authority on anything related to the network layer of the internet in the first place?
autoexec•6m ago
Because they inserted themselves into almost everything we do online and basically managed to take control over it. Cloudflare should never have been allowed to man-in-the-middle the entire internet, but now that they have, they're the only ones on earth with a dataset that size.
Bender•11m ago
I tested this theory not long ago and did not see anything that aligned with the hype around bots. [1] There are indeed more bots than humans, of course, or at least the appearance of more. Bots crawl everything linked from popular sites, whereas humans only click on things that interest them, and even then they do not typically siphon the entire site. There are new bot operators every day due to curiosity and FOMO.

The only thing I saw that could possibly be construed as abusive were some poorly configured RSS bots. Even when my server told the bot that the page would not change for 4 hours the RSS bots would check every 10 minutes. This was entirely harmless, just slightly annoying. The RSS bots are not new. Most of the bots are not even trying to disguise themselves as humans.

I was expecting the bots to mirror a couple git repositories I exposed but they did not go deeper than the README.md. None of them. I think it's the same pattern of catastrophization that exists around AI and I don't know why it is spreading. I guess it must work or people would not do it.

[1] - https://blawg.nochan.net/b/Internet-Crap/20260522-Maybe-AI-B...
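A well-behaved feed reader can derive its polling interval from the server's freshness hint instead of a fixed 10-minute timer. The comment does not say which header the server used; this sketch assumes `Cache-Control: max-age=14400` (14400 s = 4 hours). The function names and the 600-second default (the 10-minute interval mentioned above) are hypothetical.

```python
import re
from typing import Optional


def max_age_seconds(cache_control: Optional[str]) -> Optional[int]:
    """Extract the max-age directive from a Cache-Control header, if present."""
    if not cache_control:
        return None
    m = re.search(r"(?:^|,)\s*max-age\s*=\s*(\d+)", cache_control, re.IGNORECASE)
    return int(m.group(1)) if m else None


def next_poll_delay(cache_control: Optional[str], default_seconds: int = 600) -> int:
    """Seconds a polite poller should wait before re-fetching the feed."""
    age = max_age_seconds(cache_control)
    # Wait at least as long as the server says the content stays fresh,
    # and never less than the poller's own default interval.
    return max(default_seconds, age) if age is not None else default_seconds
```

With `"public, max-age=14400"` this yields 14400 seconds: one fetch every 4 hours instead of 24. Conditional requests (`If-None-Match` / `If-Modified-Since`, answered with `304 Not Modified`) make the remaining polls cheap as well.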

tyjkot•10m ago
I love when smart people catch liars.
ChrisArchitect•10m ago
Discussion: https://news.ycombinator.com/item?id=48387144
nothrows•10m ago
Cloudflare is junk. Their entire billion-dollar service can't distinguish my (DAILY) GET requests to mainstream news sites from bot traffic; nothing they say or do is of any value. I've had the same IP for decades.
yubblegum•6m ago
That could be plausible deniability. I mean, CF is in fact keeping a tab on who is visiting which websites. Between them and Google, these two companies know everything about everyone.
supriyo-biswas•5m ago
It can be worse: they randomly block my uptime monitoring with 4xx and 5xx status codes about once every two months, despite nothing changing.
csomar•9m ago
The article is a bit too strong, aggressive I’d even say. Content is loaded only if the bot executes JavaScript and loads all content willingly. These do exist, but they are more expensive to run than a basic curl bot.

It’d make sense, as you might not want your bot to load everything a real human would (i.e., analytics, ads, unrelated files, etc.) and only focus on the content.

Also, am I the only one surprised that bot traffic is not the majority already? For my site, it’s 100 bots for every human.

jimrandomh•2m ago
I deal with scrapers that sometimes border on DDoSes for LessWrong. The amount of bot traffic varies greatly between sites; if you have more URLs you get more bot traffic (regardless of whether those URLs represent a deep content catalog, or useless URL parameter permutations). It's bad for LW because of the content-catalog depth.

It's easy to drastically underestimate the amount of bot traffic, because bots make efforts (of varying sophistication) to look human enough to evade blocking. That includes using fake user-agent strings corresponding to real browsers (often but not always with implausibly old version numbers), proxying through residential IPs, and sometimes using full headless browsers. In my own data, traffic from badly behaved browser-impersonation bots exceeds traffic from named scrapers like GPTBot by something like 10x.

The measured percentage of bot traffic is higher for HTML than for other content types because many bots will load an HTML page and then not load the JS/CSS/image/etc. resources it references. But these are the least-sophisticated and most-detectable bots.
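The pattern described above (HTML fetched, referenced subresources never fetched) can be turned into a simple log heuristic. This is a sketch under stated assumptions: the `(client_id, content_type)` log format, the function name `html_only_clients`, and the threshold of 3 pages are hypothetical.

```python
from collections import Counter
from typing import Iterable, Set, Tuple


def html_only_clients(
    log: Iterable[Tuple[str, str]], min_html_pages: int = 3
) -> Set[str]:
    """Return clients that fetched several HTML pages but no other resource.

    Each log entry is a hypothetical (client_id, content_type) pair.
    """
    html: Counter = Counter()
    other: Counter = Counter()
    for client_id, content_type in log:
        # Ignore media-type parameters such as "; charset=utf-8".
        media_type = content_type.split(";")[0].strip().lower()
        if media_type == "text/html":
            html[client_id] += 1
        else:
            other[client_id] += 1
    return {c for c, n in html.items() if n >= min_html_pages and other[c] == 0}


sample = [
    ("bot-1", "text/html; charset=utf-8"),
    ("bot-1", "text/html"),
    ("bot-1", "text/html"),
    ("human-1", "text/html"),
    ("human-1", "text/html"),
    ("human-1", "text/html"),
    ("human-1", "text/css"),
    ("human-1", "application/javascript"),
]
print(html_only_clients(sample))  # {'bot-1'}
```

It will also flag some real people, for example users of text-mode browsers or browsers serving CSS and images from cache, which is the kind of false positive raised earlier in the thread.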
