Personally I'm surprised they didn't have a Samsung option.
They have an interest in securing their devices so they can sell proxy service themselves.
But my main point is that the whole business is "on the up and up" vs. some dark botnet.
> While operators of residential proxies often extol the privacy and freedom of expression benefits of residential proxies, Google Threat Intelligence Group’s (GTIG) research shows that these proxies are overwhelmingly misused by bad actors
Honeygain is a platform where people sell their residential internet connection and bandwidth to these companies for money.
For comparison Honeygain pays someone 10 cents per GB, and Oxylabs sells it for $8/GB.
Saying you don't know something in the comments of an article that explains that thing is a bold strategy.
[1] Using the website mentioned by user Rasbora https://news.ycombinator.com/item?id=46837806
This is very commonly true but sadly not 100%. I am suffering from a shared /64 that my VPS sits on, from which other folks have sent out spam - so no more SMTP for me.
As someone who wants the internet to maintain as much anarchy as possible I think it would be nice to see a large ISP that actively rotated its customer IPv6 assignments on a tight schedule.
I've had enough of companies saying "you're connecting from an AWS IP address, therefore you aren't allowed in, or must buy enterprise licensing". Reddit, for example, totally blocks all access from non-residential IPs.
I want exactly the same content visible no matter who you are or where you are connecting from, and a robust network of residential proxies is a stepping stone to achieving that.
(What else?)
Devices on Apple’s Find My aren’t broadcasting anything like packets that get forwarded to a destination of their choosing. I would think that would be a necessity to call it “proxying”.
They’re just broadcasting basic information about themselves into the void. The phones report back what they’ve picked up.
That doesn’t fit the definition to me.
I absolutely don’t mind the fact that my phone is doing that. The amount of data is ridiculously minuscule. And it’s sort of a tit for tat thing. Yeah my phone does it, but so does theirs. So just like I may be helping you locate your AirTag, you would be helping me locate mine. Or any other device I own that shows up on Find My.
It’s a very close to a classic public good, with the only restriction being that you own a relevant device.
The protocol ensures the data only goes back to the owner's device or an Apple server.
What will you be proxying? Nobody knows! I haven't had the police at my house yet.
Seems a great way to say "fuck you" to companies that block IP addresses.
You may see a few more CAPTCHAs. If you have a dynamic IP address, not many.
Doesn't the ISP detect them?
And why would they?
Or residential proxies get so widespread that almost every house has a proxy in, and it becomes the new way the internet works - "for privacy, your data has been routed through someone else's connection at random".
Is this a re-invention of Tor, or maybe I2P?
In a way, yes. The weakness of Tor is realistically its lack of widespread adoption. Tor traffic is identifiable and blockable due to the relatively small number of exit nodes (which also makes it dangerous to run exit nodes, as you become "liable").
Ingraining the ideas of Tor into regular users' internet usage is what would prevent the internet from being controlled and blocked by any actor (except perhaps draconian government overreach, which, while possible, is harder in the West).
> While many residential proxy providers state that they source their IP addresses ethically, our analysis shows these claims are often incorrect or overstated. Many of the malicious applications we analyzed in our investigation did not disclose that they enrolled devices into the IPIDEA proxy network. Researchers have previously found uncertified and off-brand Android Open Source Project devices, such as television set top boxes, with hidden residential proxy payloads.
The reason those IP addresses get blocked is not because of "who" is connecting, but "what".
Traffic from datacenter address ranges to sites like Reddit is almost entirely bots and scrapers. They can put a tremendous load on your site because many will try to run their queries as fast as they can with as many IPs as they can get.
Blocking these IP addresses catches a few false positives, but it's an easy step to make botting and scraping a little more expensive. Residential proxies aren't all that expensive, but now there's a little line item bill that comes with their request volume that makes them think twice.
> We need more residential proxies, not less
Great, you can always volunteer your home IP address as a start. There are services that will pay you a nominal amount for it, even.
Because if the deterrent here is a line item so small it shows up as 'miscellaneous vibes' on a balance sheet, that's not a barrier. That's a tip jar.
I run a honeypot and the amount of bot traffic coming from AWS is insane. It's like 80% before filtering, and it's 100% illegitimate.
Based on what?
I think perhaps you merely meant to say that more than 99% of it is illegitimate?
I find it funny that companies like Reddit, who make their money entirely from content produced by users for free (which is also often sourced from other parts of the internet without permission), are so against their site being scraped that they have to objectively ruin the site for everyone using it. See the API changes and killing off of third party apps.
Obviously, it's mostly for advertising purposes, but they love to talk about the load scraping puts on their site, even suing AI companies and SerpApi for it. If it's truly that bad, just offer a free API for the scrapers to use - or even an API that works out just slightly cheaper than using proxies...
My ideal internet would look something like that, all content free and accessible to everyone.
Third party app users were a very small but vocal minority. The API changes didn't drop their traffic at all. In fact, it's only gone up since then.
The datacenter IP address blocks aren't just for scrapers; they're an anti-bot measure across the board. I don't spend much time on Reddit, but even the few subreddits I visited were starting to be infiltrated by obvious bot accounts running weird karma-farming operations.
Even HN routinely gets AI posting bots. It's a common technique to generate upvote rings - Make the accounts post comments so they look real enough, have the bots randomly upvote things to hide activity, and then when someone buys upvotes you have a selection of the puppet accounts upvote the targeted story. Having a lot of IP addresses and generating fake activity is key to making this work, so there's a lot of incentive to do it.
Really? Because I live in the UK and I've never been asked for my ID for anything.
They just stole this and get on their high horse to tell people how to use the internet? You can eff right off, Google.
Proxies actually help with that by facilitating mass account registration and scraping of the content without wasting a human's time "engaging" with ads.
There was also a botnet, Kimwolf, that apparently leveraged an exploit to use the residential proxy service, so it may be related to IPIDEA not shutting them down.
The answer is to stop all the bad actors, not "well, Jimmy does it!"
Some sites don't want you scraping, but it's their content, their rules. We don't really care, but we have to due to the number and quality of the bots we're seeing. This is in my mind a 100% self-imposed problem from the scrapers.
Yes, proxies are good. Ones which you pay for and which are running legitimately, with the knowledge (and compensation) of those who run them.
Malware in random apps running on your device without your knowledge is bad.
It's just nasty stuff. Intent matters, and if you're selling a service that's used only by the bad guys, you're a bad guy too. This is not some dual-use, maybe-we-should-accept-the-risks deal that you have with Tor.
Excluding known "good" crawlers, well over 99% of the traffic trying to hit the site has been attempting to maliciously scrape. Most of this traffic looks genuine, but has random genuine-looking user agents and comes from random residential proxies in various countries, usually the US.
For the traffic that does make it all the way to a browser challenge, the success rate is a measly 0.48%. Put another way, over 50% of traffic is already blocked by that point, and of the under 50% that makes it to a browser challenge, more than 99.5% fails that challenge.
It's been virtually no disruption to users either, since I configured successful challenges to be remembered for a long period of time. The legitimate traffic is a gentle trickle, while the WAF is holding back garbage traffic that's orders of magnitude above and beyond normal levels. The scale of it is truly insane.
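To put the funnel in concrete terms, here's a tiny sketch with a made-up request total; only the two rates come from the numbers above:

    # Hypothetical funnel: 1M requests, using the rates quoted above.
    total = 1_000_000
    blocked_early = int(total * 0.50)    # >50% never reach a challenge
    challenged = total - blocked_early   # the rest get a browser challenge
    passed = round(challenged * 0.0048)  # 0.48% challenge success rate

    print(blocked_early, challenged, passed)  # 500000 500000 2400

So out of a million requests, only a couple of thousand would make it through.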
It directly affects Google and you; I don't see why they shouldn't do this.
I don't see any spam in Kagi, so clearly there is a way to detect and filter it out. Google is simply not doing so because it would cut into their profits.
They can probably get away with a lot of stupid rules that would backfire if anybody tried to cater to them specifically.
But let's play devil's advocate and say you are right and spammers are successfully outsmarting Google - well, Kagi does use Google results via SerpAPI by their own admission, meaning they too should have those spam results. Yet they somehow manage to filter them out with a fraction of the resources available to Google itself with no negative impact on search quality.
And ones that have all the indicators of compromise of Russia, Iran, DPRK, PRC, etc.
And when Google say
"IPIDEA’s proxy infrastructure is a little-known component of the digital ecosystem leveraged by a wide array of bad actors."
What they really mean is " ... leveraged by actors indiscriminately scraping the web and ignoring copyright - that are not us."
I can't help but feel this is just Google trying to pull the ladder up behind them and make it more difficult for other companies to collect training data.
Appeal to authority by way of invoking the megacorp-branded "threat intelligence" capability (targeted PR exercise).
I can very easily see this as being Google's reasoning for these actions, but let's not pretend that clandestine residential proxies aren't used for nefarious things. The vast majority of social media networks will ban - or more generally and insidiously - shadow ban accounts/IPs that use known proxy IPs. This means that they are gating access to their platforms behind residential IPs (on top of their other various black boxes and heuristics like fingerprinting). Operators of bot networks thus rely on residential proxy services to engage in their work, which ranges from mundane things like engagement farming to outright dangerous things like political astroturfing, sentiment manipulation, and propaganda dissemination.
LLMs and generative image and video models have made the creation of biased and convincing content trivial and cheap, if not free. The days of "troll farms" are over, and now the greatest expense for a bad actor wishing to influence the world with fake engagement and biased opinions is their access to platforms, which means accounts and internet connections that aren't blacklisted or shadow banned. Account maturity and reputation farming is also getting a massive boost from these tools, but as an independent market it similarly requires internet connections that aren't blacklisted or shadow banned. Residential proxies are the bottleneck for the vast majority of bad actors.
Social media will ban proxy IPs, yet gleefully force you to provide your ID if you happen to connect from the wrong patch of land. I find it difficult not to support any and all attempts to bypass such measures.
The fact is that there's now a perfectly legitimate use for residential proxies, and the demand is just going to keep growing as more websites decide to "protect their content", and more governments decide to pass tyrannical laws that force people to mask their IPs. And with demand, comes supply, so don't expect them to go away any time soon.
This really just sounds like a rehash of the argument against encryption. "Bad people use it, so it should go away" - never mind that there are completely legitimate uses for it. Never mind that using a residential proxy might be the only way to get any privacy at all in a future where everyone blocks VPNs and Tor, a future where you may not even be able to post online without an ID depending on where you live, a future which we're swiftly approaching.
It's already here, in fact. Imgur blocks UK users, but it also blocks VPNs and Tor. The only way somebody living in the UK can access Imgur is through a residential proxy.
And very little of value was lost.
> This really just sounds like a rehash of the argument against encryption. "Bad people use it, so it should go away" - never mind that there are completely legitimate uses for it.
Except that almost everything that uses encryption has some legitimate use. There are pretty much no legitimate uses for residential proxies, and their use in flooding the Internet with crap greatly outweighs that.
If I plumbed a 30cm sewage line straight into your living room would you be happy with it? Okay, well, tell you what, let's make it totally legit - I'll drop a tasty ripe strawberry into the stream of effluent every so often, how about that?
Maybe. But until I dropped all traffic from pretty much every mobile network provider in Russia and Israel, I'd get up every morning to a couple of thousand new users of whom a couple of hundred had consistently within a few hundred milliseconds created an account, clicked on the activation link, and then posted a bunch of messages in every forum category spreading hate speech.
Sounds like they're targeting networks even when the users are OK with participating, which is precisely the model you're saying is fine.
As for malware enrolling people into the network, it depends on whether the operator is doing it or whether the malware is from 3rd parties trying to get a portion of the cash flow. In the latter case the network would be the victim, doubly victimized by Google also attacking them.
These residential proxies are pretty much universally shady. I doubt most of the users understand what they are consenting to.
I've been running nodes on two providers since 2017 with zero issues.
?
> These SDKs, which are offered to developers across multiple mobile and desktop platforms.
> other actors then surreptitiously enroll user devices into the IPIDEA network using these frameworks.
I'm not saying Google did the wrong thing, but it is one private entity essentially handing out a death sentence on its own. The only mitigating thing is that the technical disruptions were either a) on their own infra or b) legal judgments they then enforced with cooperation from others like Cloudflare. But it's not clear what the legal proceedings were actually like.
The problem is, it is by default unethical to have residential users be exit nodes for VPNs - unless these users are lawyers or technical experts.
No matter what you do as a "residential proxy" company - you cannot prevent your service from being used by CSAM peddlers, and thus you cannot prevent your exit nodes from being the ones whose IP addresses show up when the FBI comes knocking.
Source: went through that process, ended up going a different route. The rep was refreshingly transparent about where they get the data and why they have the KYC process (aside from regulatory compliance).
Ended up going with a different provider who has been cheaper and very reliable, so no complaints.
It's not like I'm using some bigco email address or have given them any other reason to skip KYC either.
It might just be because my account is very old?
I don’t use Luminati for anything illegal though, so it’s possible they just have some super amazing abuse detection algorithms that know this.
When the Chinese do this? Very bad.
I do think the disparity in attention is fascinating. These new Chinese players have been getting nonstop press while everyone ignores the established giant.
What they did not comment on directly is how many apps/games they actually removed from the Play Store along with the SDKs, which would be the actually interesting data.
Hard to imagine any serious anti-abuse efforts by Luminati if they don't monitor what their users are doing, but this is probably a deliberate effort to avoid potential liability arising from knowing what their users are doing.
If you crawl at 1 Hz per crawled IP, no reasonable server would suffer from this. It's the few bad apples (impatient people who don't rate limit) who ruin the internet for users and hosts alike. And then there's Google.
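To be concrete, that 1 Hz discipline is a single sleep; a minimal sketch (hypothetical URLs; assumes the requests library):

    import time
    import requests

    # Hypothetical page list on a single host; example.com is a placeholder.
    urls = [f"https://example.com/page/{i}" for i in range(100)]

    session = requests.Session()
    session.headers["User-Agent"] = "mybot/0.1 (+https://example.com/bot-info)"

    for url in urls:
        resp = session.get(url, timeout=10)
        print(url, resp.status_code)
        time.sleep(1.0)  # 1 Hz: at most one request per second per host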
Rules For Thee but Not for Me
Because Google (and the couple of other search engines) provide enough value to offset the crawler's resource consumption.
implying "robots.txt explicitly says I can't scrape their site; well, I want that data, so I'm directing my bot to take it anyway."
Do you think this is such a horrible thing to scrape? I can't do it manually since there are a few hundred locations. I could write some Python script which uses Playwright to scrape things using my desktop browser in order to avoid Cloudflare. Or, which I am much more familiar with, I could write a Python script that uses BeautifulSoup to extract all the relevant locations for me in one pass. I would have been perfectly happy fetching 1 page/sec or even 1 page per 2 seconds and would still be done within 20 minutes if only there were no anti-scraping protection.
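For illustration, the entire job is something like this sketch (the URL and CSS selector are made up; assumes requests and BeautifulSoup):

    import time
    import requests
    from bs4 import BeautifulSoup

    # Hypothetical site layout: one page per location, a few hundred pages.
    BASE = "https://example.org/locations?page={}"

    locations = []
    for page in range(1, 301):
        html = requests.get(BASE.format(page), timeout=10).text
        soup = BeautifulSoup(html, "html.parser")
        for node in soup.select(".location-name"):  # assumed CSS class
            locations.append(node.get_text(strip=True))
        time.sleep(2.0)  # 1 page every 2 seconds: done in ~10 minutes

    print(len(locations), "locations")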
Scraping is a perfectly legal activity, after all. Except, thanks to overly eager scraping bots and the clueless/malicious people who run them, there's very little chance for anyone trying to compete with Google, or even to do small-scale scraping to make their own life and the lives of local art enthusiasts easier. Google owns search. Google IS search, and no competition is allowed, it seems.
Why is hammering the everloving fuck out of their website okay?
They made the data available on the website already, there's no reason to contact them when you can just load it from their website.
1 Hz is 86400 hits per day, or 600k hits per week. That's just one crawler.
Just checked my access log... 958k hits in a week from 622k unique addresses.
95% of it is fetching random links from the u-boot repository that I host. I blocked all of the GCP/AWS/Alibaba and of course Azure cloud IP ranges.
It's almost all now just coming from "residential" and "mobile" IP address space in completely random places all around the world. I'm pretty sure my u-boot fork is not that popular. :-D
Every request is a new IP address, and available IP space of the crawler(s) is millions of addresses.
I don't host a popular repo. I host a bot attraction.
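For anyone wanting to reproduce that hits/unique-addresses tally, a few lines of Python over a combined-format access log does it (assumes the client IP is the first field of each line):

    from collections import Counter

    # Assumes common/combined log format where field 1 is the client IP.
    ips = Counter()
    with open("access.log") as log:
        for line in log:
            ips[line.split(" ", 1)[0]] += 1

    print(f"{sum(ips.values())} hits from {len(ips)} unique addresses")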
I used Anubis and a cookie redirect to cut the load on my Forgejo server by around 3 orders of magnitude: https://honeypot.net/2025/12/22/i-read-yann-espositos-blog.h...
I guess the bots are all spoofing consumer browser UAs and just the slightest friction outside of well-known tooling will deter them completely.
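The cookie-redirect part is simple enough to sketch. A minimal Flask version (all names made up; this shows only the cookie trick, not how Anubis itself works):

    from flask import Flask, request, redirect, make_response

    app = Flask(__name__)

    @app.route("/")
    def index():
        # First visit: set a cookie and redirect back to the same URL.
        # Real browsers retry with the cookie; tooling that doesn't persist
        # cookies loops until its redirect limit and gives up.
        if request.cookies.get("seen") != "1":
            resp = make_response(redirect("/"))
            resp.set_cookie("seen", "1", max_age=30 * 24 * 3600)
            return resp
        return "the actual content"

    if __name__ == "__main__":
        app.run()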
A whitelist would be needed for sites where getting all the pages makes sense. And in addition to the 1 Hz limit, an additional cap of 1k requests/day would probably be needed.
I can see now why Google doesn't have much solid competition (Yandex/Baidu arguably don't compete due to network segmentation).
Scraping reliably is hard, and the chance of kicking Google off their throne may be even further reduced due to AI crawler abuse.
PS: 958k hits is a lot! Even if your pages were a tiny 7.8k each (HN front page minus assets), that would be about 7G of data (about 4.6 Bee Movies in 720p h265).
Residential proxies are not needed if you behave. My take is that you want to scrape stuff that site owners do not want to give you, and you don't want to be told no or perhaps have to pay for a license. That is the only case where I can see you needing residential proxies.
I'm starting to think that some users on hackernews do not "behave", or at least think they do not "behave", and provide an alibi for those that do not "behave".
That the "hacker" in hackernews attracts not just hackers as in "hacking together features" but also hackers as in "illegitimately gaining access to servers/data".
As far as I can tell, as a hacker who hacks features together, resi proxies are something the enemy uses. Whenever I boot up a server and get 1000 login requests per second, plus requests for commonly exploited files, from Russian and Chinese IPs, those come from resi IPs no doubt. There are 2 sides to this match, no more.
That said, I support Google working to shut these networks down, since they are almost universally bad.
It's just a shame that there's nowhere to go for legitimate crawling activities.
Think about why that might be. I'm sorry, but if you legitimately need to crawl the net and do so from a cloud provider, your industry screwed you over with bad behaviour. Go get hosting with a company that cares about who their customers are; you're hanging out with a bad crowd.
This differs obviously, but having an ASN in our case means that we can deal with you, contact you, and assume that you're better than random bot number 817.
There are lots of healthy / productive businesses in the cloud and lots of scumbags, just like any enterprise.
I still have no idea about your point, by the way.
Hiding behind a residential proxy and using random user agents? Gross. Learn what consent is.
You're thinking about the case of big AI companies crawling your blog. I'm talking about a small startup trying to do traditional indexing and needing to run from a residential proxy to make it work.
I actually do let quite a few known, "good" scrapers scrape my stuff. They identify themselves, they make it clear what they do, and they respect conventions like robots.txt.
These residential proxies have been abused by scrapers that use random legit-looking user agents and absolutely hammer websites. What is it with these scrapers just not understanding consent? It's gross.
If someone is abusing my site, and I block them in an attempt to stop that abuse, do we think they are correct to tell me it doesn't matter what I think and to use any methods they want to keep abusing it?
That seems wrong to me.
The largest companies in this space that do similar things (Oxylabs, Bright Data, etc.) have similar tactics but are based in a different location.
Sounds like "malicious activity" == "scraping activities that don't come from Google"
Looking only at the following:
- a8d3b9e1f5c7024d6e0b7a2c9f1d83e5.com
- af4760df2c08896a9638e26e7dd20aae.com
- cfe47df26c8eaf0a7c136b50c703e173.com
Looks like a standard MD5 hash domain pattern of which currently there are:
user@host:/data/domains/2026/01/30$ zgrep -iE '^[a-f0-9]{32}\.com$' com.txt_domains.gz | wc -l
3005
If you look at some of the others (not listed in Google's IOCs), they tend to have a pattern with their SSL certs, e.g. 0e6f931862947ad58bf3d1a0c5a6f91f.com:
X509v3 Subject Alternative Name:
DNS:0e6f931862947ad58bf3d1a0c5a6f91f.com, DNS:effc538138d9342c547c5df42b03d81e.com, DNS:gulfclouds.site, DNS:xinchaobccgba.net
- 17e4435ad10c15887d1faea64ee7eac4.com
X509v3 Subject Alternative Name:
DNS:0dcbdf154c39288c91feb076795715e1.com, DNS:0e8843e8f10f20eeef59f0076e4feb83.shop, DNS:1014a1fb60e1b91404682e572ede6b4f.com, DNS:178281a79266d2faa3e578f23c8a361e.com, DNS:17e4435ad10c15887d1faea64ee7eac4.com, DNS:19f75b2642320e0606f5e38ce9fbcf17.com, DNS:1vxe.com, DNS:292893d0b31941e1c0d8eb01235be4eb.com, DNS:2b1e642f3a60130d1b2cf244891bef0d.info, DNS:354542342b7d2ddb66c97240d0c770dc.com, DNS:37d993ba8c9284bedad2a3177dfc44a6.info, DNS:3857036aaeedf670bbcca926945b50dd.com, DNS:3961f3fa3a6bacc5c4f28e81c60f4169.com, DNS:3eb4b3a3f8722b60d6ba2de7dd5f2523.org, DNS:42a17c71c0d6f2a6d7e135f8e869ab3f.com, DNS:4edd3793da3080640431430a4da57a86.org, DNS:4f5667d51451a2060067a97bcddf077f.info, DNS:5006cc38aff1ebc7d1232037fd592c60.net, DNS:54c35ec930f5b52fd9505778bb9c3f00.com, DNS:60255ec5427c2ba9a80b9c7648dd62e9.com, DNS:638d0e352728a04bb56ca102e54b8c9b.xyz, DNS:69234f9b18c0b4d572dc553dbfdb8f52.com, DNS:6934addf679d79a79f0bfc2ff090b104.com, DNS:694b64c9b41c17a229d92156d14a4ffd4.com, DNS:6eba8c4def89561e1cee02bb3c9b373d.info, DNS:7050f8c6563ff47465932e3838dc06fd.com, DNS:72ad0de0a556f763e0629c64c694df4c.com, DNS:86f7020358afaf71baeee5782b6264e4.xyz, DNS:88f2f20d26dcabeafd2f9d24e7ea4e50.com, DNS:911f4bf053ee3dadae1ca6bfdf40a817.com
Would there be any reason any of these would be legitimate?

You can check if your network is infected here: https://layer3intel.com/is-my-network-a-residential-proxy
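(If you want to reproduce the SAN lookups above, the Python standard library is enough. A sketch, assuming the domain still resolves and serves a certificate that validates; getpeercert() returns an empty dict when verification is disabled:)

    import socket
    import ssl

    def san_list(hostname, port=443):
        # Connect, complete the TLS handshake, and read the peer
        # certificate's DNS subjectAltName entries.
        ctx = ssl.create_default_context()
        with socket.create_connection((hostname, port), timeout=10) as sock:
            with ctx.wrap_socket(sock, server_hostname=hostname) as tls:
                cert = tls.getpeercert()
        return [v for kind, v in cert.get("subjectAltName", ()) if kind == "DNS"]

    print(san_list("0e6f931862947ad58bf3d1a0c5a6f91f.com"))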
Note that even after the disruption, I'm still able to route millions of requests/day through IP IDEA's network
xyzzy_plugh•1w ago
Nice to see Google Play Protect actually serving a purpose for once.
trollbridge•1w ago
Only Google is allowed to scrape the web.
misir•1w ago
Proxies, in comparison, can give new players a fighting chance. That said, I doubt any legitimate & ethical business would use proxies.
1vuio0pswjnm7•1w ago
If I'm not mistaken, the plaintiffs in the US v Google antitrust litigation in the DC Circuit tried to argue that website operators are biased toward allowing Google to crawl and against allowing other search engines to do the same.
The Court rejected this argument because the plaintiffs did not present any evidence to support it.
For someone who does not follow the web's history, how would one produce direct evidence that the bias exists?
SkiFire13•6d ago
Take a bunch of websites, fetch their robots.txt file and check how many allow GoogleBot but not others?
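A sketch of that check, using only the standard library (the site list is a placeholder; in practice you'd use a large list of popular sites):

    from urllib.robotparser import RobotFileParser

    sites = ["https://example.com", "https://example.org"]  # placeholder

    for site in sites:
        rp = RobotFileParser(site + "/robots.txt")
        try:
            rp.read()
        except OSError:
            continue  # unreachable robots.txt; skip this site
        if rp.can_fetch("Googlebot", site + "/") and \
           not rp.can_fetch("SomeOtherBot", site + "/"):
            print(site, "allows Googlebot but not others")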
miki123211•1w ago
This does nothing against your ability to scrape the web the Google way, AKA from your own assigned IP range, obeying robots.txt, and with a user agent that explicitly says what you're doing and gives website owners a way to opt out.
What Google doesn't want (and I don't think that's a bad thing) is competitors scraping the web in bad faith, without disclosing what they're doing to site owners and without giving them the ability to opt out.
If Google doesn't stop these proxies, unscrupulous parties will have a competitive advantage over Google, it's that simple. Then Google will have to decide between just giving up (unlikely) or becoming unscrupulous themselves.
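For what it's worth, that "Google way" boils down to something like this (bot name and info URL are made up; assumes the requests library):

    from urllib.parse import urlsplit
    from urllib.robotparser import RobotFileParser
    import requests

    BOT = "MyCrawler"  # hypothetical bot name, declared openly
    UA = f"{BOT}/1.0 (+https://example.net/crawler-info)"  # made-up info page

    def polite_fetch(url):
        # Honor robots.txt for our named bot before fetching anything.
        parts = urlsplit(url)
        rp = RobotFileParser(f"{parts.scheme}://{parts.netloc}/robots.txt")
        rp.read()
        if not rp.can_fetch(BOT, url):
            return None  # the site opted out; respect it
        return requests.get(url, headers={"User-Agent": UA}, timeout=10).text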
ryanjshaw•6d ago
I thought that Google has access to significant portions of the internet that non-Google bots won’t have access to?
tgsovlerkhgsel•1w ago
AFAIK it also left SmartTube (an alternative YouTube client) alone until the developer got pwned and the app trojanized with this kind of SDK, and the clean versions are AFAIK again being left alone. No guarantee that it won't change in the future, of course, but so far they seem to not be abusing it.
tgsovlerkhgsel•6d ago
As for "intrusive advertising is malicious", see the second part of the first sentence.
direwolf20•4d ago
Proxying traffic is not malware, since it doesn't affect me in any way.