I analyzed 200 e-commerce sites and found 73% of their traffic is fake

https://joindatacops.com/resources/how-73-of-your-e-commerce-visitors-could-be-fake

102•simul007•2h ago

Comments

simul007•2h ago

Hi HN. I run a marketing agency and fell down this rabbit hole after a client's analytics made no sense (50k visitors, 47 sales). I ended up building a simple script to track user behavior and analyzed 200+ small e-commerce sites. The average was 73% bot traffic that standard analytics counts as real.

The bots are getting creepily good at mimicking engagement. I wrote up my findings, including some of the bizarre patterns I saw and the off-the-record conversations I had with ad tech insiders. It seems like a massive, open secret that nobody wants to talk about because the whole system is propped up by it.

I'm curious if other developers, founders, or marketers here have seen similar discrepancies in their own data.

PaulHoule•1h ago

“All clicks are click fraud” isn’t that far from the truth.

yellow_lead•1h ago

This is so common, I find it strange to see a whole article framing it as a shocking phenomenon. Like, have you heard of PostHog?

urbandw311er•1h ago

What do you mean re Posthog?

rightbyte•1h ago

What do you think is the false negative ratio of the remaining 27%? A 0.1% total conversion rate would imply 1/270 visitors buying stuff and I am quite certain I buy stuff in maybe every 10th online store I visit or something.
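To make the arithmetic in that question concrete, here is a rough sketch. It assumes the 50k visitors and 47 sales from the submission and takes the 73% bot share at face value; the function name is made up for illustration.

```python
def human_conversion_rate(visitors: int, sales: int, bot_share: float) -> float:
    """Conversion rate among the visitors assumed to be human."""
    human_visitors = visitors * (1.0 - bot_share)
    return sales / human_visitors


# Figures quoted in the thread; bot_share is the article's claimed average.
visitors, sales, bot_share = 50_000, 47, 0.73

raw_rate = sales / visitors
human_rate = human_conversion_rate(visitors, sales, bot_share)

print(f"raw conversion rate:   {raw_rate:.3%}")
print(f"human conversion rate: {human_rate:.3%}")
print(f"roughly 1 buyer per {round(1 / human_rate)} human visitors")
```

With the unrounded figures this comes out at roughly one buyer per 290 supposedly human visitors, the same ballpark as the 1/270 estimate above, which is why the remaining 27% still looks suspicious.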

criddell•1h ago

Does it really matter if it's all fraud? You track 47 sales over some period. What was the ad spend for that period? Combine that with previous data and that should be enough to figure out if it was a successful campaign or not.

When a company puts up a billboard or an ad on the bus, they don't care if the ad is seen by dashcams and dogs. All that matters is impact on the bottom line.

nemomarx•57m ago

If you could pay for web ads in the same way you could pay for a billboard (flat rate per period of time) yeah that would help. if you pay per impression or view or click you have some other issues

processing•54m ago

yes, the co is likely spending big money on retargeting to bring back a potential customer to sell what the bot clicked on to generate retargeting. it's fraud and costing the company money.

boplicity•46m ago

It makes optimizing your ads significantly harder. Imagine trying to understand traffic flow on a freeway when 99.9% of cars are just projected illusions. Not fun.

Effective advertising depends on iterative testing, which is very hard if the signal to noise ratio is way off.

paulcole•42m ago

> Does it really matter if it's all fraud

Uh, yes.

If you get 47 sales on $10k in ad spend (pay per click) and $9900 of that $10k was fraudulent then you got 47 sales on $100 of ad spend. Imagine if you could stop those fraudulent clicks.
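The same point as a small calculation, as a sketch only. The dollar figures are the hypothetical ones from this comment, and `cost_per_sale` is an illustrative helper, not anything an ad platform provides.

```python
def cost_per_sale(ad_spend: float, sales: int) -> float:
    """Average ad spend needed to produce one sale."""
    if sales <= 0:
        raise ValueError("need at least one sale to compute a cost per sale")
    return ad_spend / sales


total_spend = 10_000.0  # what the advertiser was billed
fraud_spend = 9_900.0   # portion assumed to be fraudulent clicks
sales = 47

nominal = cost_per_sale(total_spend, sales)
effective = cost_per_sale(total_spend - fraud_spend, sales)

print(f"billed cost per sale:      ${nominal:.2f}")
print(f"cost per sale minus fraud: ${effective:.2f}")
```

Under these assumptions the billed cost per sale is about $213, while the spend that actually reached people works out to about $2 per sale.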

dangus•26m ago

The point is, if $10k brings in 47 sales and that’s not enough then you stop buying those ads. It doesn’t really matter why it’s not working. You take your marketing spend elsewhere.

You can’t stop fraudulent clicks just like you can’t stop your SuperBowl ad from playing while your viewers are in the bathroom. How much of ESPN’s viewership happens at bars where nobody is watching?

At some point it’s not reasonable to expect ad networks to be able to stop sophisticated bots or exclude them from your billed impressions.

They should definitely try to minimize it if they want to maintain the value of their impressions but I think there is a good argument that OP just isn’t the right customer for this type of advertisement.

If you’re trying to sell a t-shirt you don’t hire a salesperson to cold call people, maybe OP shouldn’t be using web ads in the first place. If fraud was cut down by half would their situation really be that much better?

whistle650•4m ago

And imagine how much more you’d have to pay for each of those clicks if everyone could stop those fraudulent clicks. In equilibrium it shouldn’t change the total ad spend.

dangus•31m ago

I was about to say something similar to this.

You aren’t paying for conversion rate, you are paying for a link being put on a website when a query is made. You can’t control whether a bot follows that link. You can’t control how sophisticated that bot is. You can’t expect an advertiser to filter out every type of illegitimate traffic (although it sounds like they probably have the capability to filter out more but don’t have any incentive to do so).

I have seen recommendations from across the Internet to not bother with Google ads and other similar paid ad services. It’s basically like paying for a cold lead, you’re attracting one of the least interested types of customers.

The recommendation I’ve always seen is that it’s better to build legitimate interest in your product by producing content. Or perhaps move to an advertising platform where there’s more of a guarantee of reaching human users.

But still, I’ve heard that trying to spend customer acquisition dollars on one-time purchases is a losing battle.

If Tesla was able to start a massive car company without buying ads you can go without AdWords, too.

whistle650•27m ago

This is the key point. Ads and clicks etc are priced in a competitive market. If they don’t deliver the ROI because of bots, then people (including the allegedly hopelessly confused e-commerce retailers) would pay less for the same amount of traffic. It may be annoying (and the cost of dealing with that annoyance would further drive down the price paid for the traffic). But what matters is that an e-commerce site is profitable (enough) after the ad spend, period. If they are not, why do they spend what they spend on the ads?

zurfer•16m ago

Yes it matters because Google ad spend is just one way to market and it's harder to attribute sales if there is a lot of fraud making it more inefficient.

seviu•12m ago

I once worked for the yellow pages in Switzerland. Our paid clients had a dashboard which reported how many users visited their business entry.

We at engineering decided to filter out bots. Figures fell dramatically by more than 50%.

In less than a day, business mandated us to remove the filter.

Bots are real people after all

bodantogat•1h ago

Not really surprised. I spend a ridiculous amount of time banning bots every week.

cantor_S_drug•1h ago

Doesn't Cloudflare block bot traffic out of the box?

curiousObject•1h ago

Is there any incentive for a company to remove fake traffic[0] from its stats and analytics?

I guess there is no incentive in most markets. Facebook, etc make only a token effort to reject non-troublesome bot traffic.

[0] bots and other automated traffic which cannot generate revenue or human ad views

nemomarx•1h ago

In theory if you have a lot of fake traffic then ads on your site will have an even worse conversion rate than normal?

curiousObject•1h ago

> if you have a lot of fake traffic then ads on your site will have an even worse conversion rate than normal?

Yes, if you only count purchase/sale conversions

Maybe no, if you also count clickthrough and view conversions, perhaps even lead conversions sometimes (because fake sign ups are possible).

But you’re right. Purchase conversions are one incentive

kijin•1h ago

Next thing you know, the bots will learn how to purchase products, and we'll have come full circle.

I'd love to have an agent that goes online whenever I'm running low on toilet paper or something, browses all the stores and clicks all the ads, and automatically orders the best deal it can find.

technothrasher•57m ago

I certainly wouldn't want to be caught in that arms race where sites continually attempt to find sneaky ways to convince my agent to buy things I don't actually want.

sanex•57m ago

By clicking on all the ads you're raising prices in the long run and just funneling money to google and meta, forcing the toilet paper companies to make scratchier tp to try and squeeze out more margin. My bhole would thank you for not doing this.

kijin•43m ago

Or maybe if everyone does it, market forces will suppress the cost per click and we'll be back to square zero.

sanex•27m ago

That's interesting. Drive ad value to zero or force them to start blocking crappy traffic. I think the issue is this could take us down the device attestation drm for the web path to discern legitimate traffic which is bad for the open web.

blakesterz•1h ago

I guess the answer is no, and there's a couple quotes in there near the end:

  One rep I had known for years finally admitted the truth off the record. "Dude, we know," he said. "Everyone knows. But if we filtered it all out properly, our revenue would drop 40% overnight, and investors would have a meltdown."

Galanwe•1h ago

For any real audit of website traffic (M&A, large advertising deals, etc.), you typically don't rely on self-reported statistics, but rather on 3rd parties (e.g. SimilarWeb). These run actual spyware on top of Google Analytics plug-ins to separate real traffic from noise.

Propelloni•1h ago

I find the article's topic interesting, but the writing style is just... no. It reads like a True Crime transcript or really bad marketing copy. Which makes a certain kind of sense, I guess.

sixthDot•1h ago

What is really mind-blowing is that, if I understood correctly, bots are being used to check the availability of a product. That sounds like such a "hacky" method, like "seriously, people are doing that in 2025".

viraptor•1h ago

Yeah, their list of recommendations could use another point: expose the public data in a simple, structured way.

I'm working right now on an inventory management system for a clinic which would really benefit from pulling the prices and availability from a very specialised online shop. I wish I could just get a large, fully cached status of all items in a json/CSV/whatever format. But they're extremely not interested, so I'm scraping the html from 50 separate categories instead. They'll get a few daily bot hits and neither of us will be happy about it.

If people are scraping data that you're not selling, they're not going to stop - just make it trivially accessible instead in a way that doesn't waste resources and destroy metrics.

nemomarx•1h ago

I wonder if LLM agents will know to go for apis and data or if they'll keep naively scraping in the future. A lot of traffic could come down to "find me x product online" chats eventually

viraptor•1h ago

https://llmstxt.org/ is there for that purpose.

bdcravens•1h ago

HTML is the only truly universal standard.

Closi•1h ago

The counterpoint is 'Why hand your competitors data on a silver plate'?

Sure, you might be willing to build the bot to scrape it... but some other competitors won't go to this effort, so it still means a bit of information asymmetry and stops some of your competitors poaching customers / employing various marketing tactics to exploit short-term shortages or pricing changes etc.

viraptor•55m ago

I really don't believe we're in a situation where a company can exploit product availability and pricing data, is pushing enough volume to make it worth it, can process that information effectively, yet cannot hire someone on Fiverr to write a scraper in a few hours.

> 'Why hand your competitors data on a silver plate'?

To lessen the issue from the article and free up server resources for actual customers.

mschuster91•15m ago

At that point, why not join forces with other clinics, remove the middleman and purchase directly from vendors?

hamburgererror•1h ago

Can you share the script you made for this analysis?

squeedles•1h ago

Why is this in the least surprising? It's just the natural successor to what everyone used to do with the trade magazines thirty years ago. Back then you filled in a profile questionnaire to get a free subscription, so every basement hacker turned into the manager of a 500-person division with control of a $1m capital budget. The magazine didn't want to check because it would damage the demographic numbers that they pitched to advertisers. The advertisers knew that there was some liar's poker being played but everyone just rolled with it.

Esophagus4•1h ago

Interesting. You didn’t give specifics on what anti-bot measures sites implemented, so I’ll add:

Bot prevention measures can be good, but the more hoops you make your users jump through (CAPTCHA etc.), the more legitimate users will drop off. Those have significant impacts on conversion rates.

I would think fixing this should involve the analytics and attribution side rather than adding friction to your e commerce flow.

Especially as bot tech continues to get better and more indistinguishable from real traffic.

baobun•1h ago

Would like to see the script. From reading it's impossible to tell if the methodology is sound. Would legitimate users with adblockers or with JS disabled get counted as false positives, for example?

That said, 73% doesn't come as a surprise. If anything I expect it to be higher.

I guess this quote sums up the situation

> When I tried to bring this up with a few major ad platforms, the conversation always followed a predictable script. The sales reps were incredibly friendly until I mentioned click fraud or bot traffic. Then, the tone shifted instantly to corporate-speak: "Our AI detection is industry leading" and "We take ad fraud very seriously." It was a polite but firm wall, a clear signal to stop asking questions.

> One rep I had known for years finally admitted the truth off the record. "Dude, we know," he said. "Everyone knows. But if we filtered it all out properly, our revenue would drop 40% overnight, and investors would have a meltdown."

chrismorgan•1h ago

I’m puzzled by this: I thought it was well-understood, at least in the industry, that traffic numbers were at least mostly nonsense, and that ad click metrics especially were suuuper shady, typically more than half fraud; yet OP, in the business of “accurate ad spend analytics”, only just discovered this!?

It just doesn’t ring true. That aspect of the story isn’t novel at all, and someone in that line of work should surely have known all this, right?

Now the section on categorising different bot patterns, that’s more interesting, and I haven’t seen so much said about it.

nemomarx•1h ago

50k to 47 conversions seems like a noticeable increase in fakeness to me at least. that's going from more than half fraud to essentially only fraud with a rounding error amount of real users

lazide•1h ago

This is the ‘gold’ that agentic/generative AI really produces.

For most consumers, it’s entertainment, but for industrial use it’s great for this kind of fraud. And very difficult to detect at scale, since the only cost effective tools for this kind of analysis are also ML/AI, and hence can be fooled more predictably/trained against.

zdragnar•14m ago

The last startup I worked at invested a ton of money- much of their marketing team's time plus a good chunk of development time for a few months- trying to "fix the funnel" in terms of conversions from these so-called leads.

They had plenty of other problems, including an unworkable business plan, but maybe they would have had more time to pivot before selling out for pennies if they'd not been chasing their tail so much.

boringg•11m ago

This has been well understood for well over a decade at this point -- the blogpost is a marketing effort by datacops.

nurettin•1h ago

> scraping 70 million retailer web pages every single day. This is a legitimate and massive source of automated traffic.

Why do they do this? For vital business intelligence. Major retailers like Amazon do not always notify vendors when they run out of stock. So, brands pay for data scraping services to monitor their own products. These "good bots" check inventory levels, see who is winning the "buy box," ensure product descriptions are correct, and track search result rankings. They even scrape from different locations and mobile device profiles to analyze what banner ads are being shown to different audiences.

_---------------_

Guilty as charged. You quickly learn to bypass bot detection measures and create a fully automated system to gather all this information just because amazon doesn't provide it in an accessible manner causing harm to businesses who need this intel and their own internet infra.

Dead_Lemon•1h ago

It makes the argument that the open internet is unable to function without advertising quite hard to prop up. Especially when over 70% of traffic is just people gaming the system, to real users' detriment.

awongh•1h ago

I don't really believe the main thesis of this article- it reads like much of the fake cliff-hanger pseudo-insight endemic to marketing and business influencers.

Mainly, it avoids the main point- 73% of your traffic is "faked" enough to look real.

Who are the players in that scenario that stand to benefit from your traffic being fake?

You pay for Google (search ads) and Facebook ads but the traffic is faked by them (unlikely)

You pay other publishing networks (maybe adsense?) and the website owners profit from sending fake traffic (maybe true? if the article were really trying to make a case for this, just name them?)

Or, you work inside a company and just want to make your department look good?

I'm not sure I know what the point of this article is besides a click bait title.

Just tell me exactly what the mechanism is for this fake traffic- don't hint at some kind of conspiracy.

ryandrake•1h ago

It would take a lot more investigative journalism to track down the bot networks and figure out the why and who, which I agree is the more interesting information. It’s not surprising that nobody seems to be willing to be quoted by name as a source. The whole industry seems fake and shady.

Despite the “LinkedIn influencer” writing style of the article, the results don’t seem that shocking or unexpected.

brazukadev•36m ago

> You pay for Google (search ads) and Facebook ads but the traffic is faked by them (unlikely)

Unlikely?

rubyfan•1h ago

Who is profiting from the ad fraud?

rightbyte•41m ago

Site owners and ad market places?

jt2190•1h ago

> This is the hidden bot economy.

I still have to hear a compelling argument about why I should use computers “by hand” and ignore these powerful tools. Price checking, comparison shopping, buy when released for sale… All of these things point me to using bots.

This feels a lot less like “fraud” and a lot more like “the world has moved on”. Maybe it’s time to route traffic that looks like bots to a bot-optimized shopping experience.

MASNeo•47m ago

So are you saying it needs an agent.txt alongside the robots.txt?

jt2190•42m ago

I’m just the idea guy… I’ll let the market decide on the exact implementation. ;-) (That’s not a bad idea BTW.)

shiftingleft•1h ago

  After we implemented advanced bot traffic detection and filtering, their reported traffic plummeted by 71%. [...]
  But then the sales report came in. Their actual sales went up by 34%.
  Their real conversion rate optimization (CRO) efforts had been working all along, but the results were buried under an avalanche of fake clicks. They were not bad at marketing; they were just spending thousands of dollars advertising to robots programmed never to buy anything. Their marketing ROI went from "terrible" to "excellent" overnight.

I don't understand how detecting bot traffic would directly lead to less ad spend.

Can you just tell e.g. Google Ads that you don't want to pay for certain clicks?

Did they modify their targeting to try to avoid bots?

nemomarx•59m ago

I assume it's the filtering - detect the user is a bot, don't even load the ads, etc?

shiftingleft•56m ago

As I understand it they are placing ads on other sites and are paying for visits to their site.

weird-eye-issue•55m ago

You didn't think through this did you

How would you do that on Google or a third-party site?

V__•58m ago

I could imagine that blocking bot traffic, would improve their retargeting and make sure that the retargeting budget is spent on real people leading to an increase in conversion.

shiftingleft•52m ago

What's the API here for Google Ads? How does their site report to Google Ads whether that was a good/bad user? Is this done through conversion tracking? If so, why would you track anything but a completed purchase in the first place?

morkalork•45m ago

If you are building look-alike or remarketing audiences, having any bot users in there could give the wrong signal to Facebook or other platforms.

>Can you just tell e.g. Google Ads that you don't want to pay for certain clicks?

tagalog•55m ago

We ran into this problem when running ads for an iOS App only to iOS traffic. Somehow 80% of our iOS only targeted traffic clickthrough was Android... Went to UGC and never looked back.

onionisafruit•45m ago

What’s ugc?

anticorporate•32m ago

ugc = user generated content

MASNeo•43m ago

Would be great if more data were made available by OP to peer review some of this. That said, making money from failure starts looking like a business model - highly unethical. Why make customers succeed when you lose money doing so?

emacdona•13m ago

I'm not sure... but... maybe in this one single instance, I'm rooting for the bots.

I mean, it's burning ad dollars and causing advertisers to rethink their strategy. Who knows, maybe that will eventually lead to the realization that web pages that are 20% content and 80% ads are just luring bots and not customers.

On the other hand, the money being burnt is going to Google, Meta, etc... and helping fund massive surveillance infrastructure. To be honest, I'd prefer it if it all just went to shareholders. Heh, maybe that'll be the sign that we've hit peak surveillance infrastructure: Google and Meta dividend payments go up :-)

But I have trouble sympathizing with someone who writes this:

> Mouse Movements: Did the cursor move in natural, human-like arcs, or did it snap between points?

> Scrolling Patterns: Was the scrolling speed variable, with pauses and upward scrolls, or was it a perfectly smooth, mechanical glide?

> Time Between Interactions: How long did a "user" wait between clicking a link, hovering over an image, or adding an item to the cart?

I read that as: "We're tracking every movement, every hesitation... so that we can feed it to our models and determine how best to keep you addicted".

I knew it was happening, and I know I'm editorializing there... but they are getting closer and closer to just coming out and saying it.

edit: Added newlines in quoted part.
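For anyone wondering what the "Time Between Interactions" signal quoted above might look like in code, here is a toy sketch. It is not the article's script (which has not been shared); the function names and the 0.1 threshold are arbitrary assumptions, and real bots can randomize their timing to defeat exactly this kind of check.

```python
from statistics import mean, pstdev


def interval_cv(timestamps_ms: list[float]) -> float:
    """Coefficient of variation of the gaps between consecutive events."""
    gaps = [later - earlier for earlier, later in zip(timestamps_ms, timestamps_ms[1:])]
    if len(gaps) < 2:
        raise ValueError("need at least three events to measure timing variation")
    average_gap = mean(gaps)
    if average_gap <= 0:
        raise ValueError("timestamps must be increasing")
    return pstdev(gaps) / average_gap


def looks_mechanical(timestamps_ms: list[float], cv_threshold: float = 0.1) -> bool:
    """Flag sessions whose event timing is suspiciously regular.

    cv_threshold is an assumed cut-off chosen for illustration only.
    """
    return interval_cv(timestamps_ms) < cv_threshold


# Made-up sessions: one with perfectly even gaps, one with irregular gaps.
scripted_session = [0, 500, 1000, 1500, 2000]
human_like_session = [0, 320, 1900, 2100, 5400]

print(looks_mechanical(scripted_session))
print(looks_mechanical(human_like_session))
```

A single signal like this would produce plenty of false positives and negatives on its own, which is part of why the methodology questions elsewhere in the thread matter.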

vmaurin•12m ago

I worked in the ad tech industry for almost 15 years, and big corps like Google/FB scam their users:

- they don't allow double tracking, so you have to trust their numbers

- if you look at the IPs behind their "clicks", you often see a FB/Google datacenter IP range

- and for most of the traffic they might send you, they just used clever algorithms and heavy profiling to steal your organic traffic. So they get this "amazing" performance by claiming people who would have bought on your site anyway

I have seen and worked in companies trying to do the impact metrics well, but these are outliers

- websites showing ads are annoying their users and get no benefit from it

- stores/brands/people that want to advertise pay a big chunk of money for nothing - only the middlemen are getting benefits
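The datacenter-IP observation in the list above is easy to check against your own click logs. A minimal sketch using Python's standard `ipaddress` module follows; the CIDR blocks are reserved documentation ranges standing in for real datacenter ranges (which you would have to source yourself), and the function names are illustrative.

```python
import ipaddress

# Placeholder ranges (RFC 5737 / RFC 3849 documentation blocks), NOT real
# datacenter ranges. Swap in ranges you have verified yourself.
PLACEHOLDER_DATACENTER_RANGES = [
    ipaddress.ip_network(cidr)
    for cidr in ("192.0.2.0/24", "198.51.100.0/24", "2001:db8::/32")
]


def is_datacenter_ip(ip: str, networks=PLACEHOLDER_DATACENTER_RANGES) -> bool:
    """True if the address falls inside any of the given networks."""
    address = ipaddress.ip_address(ip)
    return any(address in network for network in networks)


def datacenter_share(click_ips: list[str]) -> float:
    """Fraction of clicks that came from the listed ranges."""
    if not click_ips:
        return 0.0
    hits = sum(is_datacenter_ip(ip) for ip in click_ips)
    return hits / len(click_ips)


# Made-up click log for illustration.
clicks = ["192.0.2.10", "203.0.113.5", "198.51.100.77", "2001:db8::1"]
print(f"share of clicks from listed ranges: {datacenter_share(clicks):.0%}")
```

A high share would not prove fraud by itself, since crawlers and link previews also come from datacenters, but it is a cheap first filter before trusting a platform's click counts.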

ramesh31•7m ago

After a decade in adtech, I can confidently say that 73% of all traffic is fake.
