Personally, I think the "network sharing" software bundled with apps should fall into the category of potentially unwanted applications, along with adware and spyware. All of the above "tag along" with something the user DID want to install, and quietly misuse the user's resources. Proxies like this definitely have an impact on metered/slow connections - I'm tempted to start Wireshark'ing my devices now to look for suspicious activity.
There should be a public repository of apps known to have these shady behaviours. Having done some light web scraping for archival/automation myself, it's a pity that such scraping will become collateral damage in the anti-AI-botfarm fight.
Is the premise that users should not be allowed to use vpns in order to participate in ecommerce?
[1] https://reports.exodus-privacy.eu.org/en/trackers/ [2] https://f-droid.org/packages/com.aurora.store/
I wouldn't mind reading a comprehensive report on SOTA with regard to bot-blocking.
Sure, there's Anubis (although someone elsethread called it a half-measure, and I'd like to know why), there are CAPTCHAs, there's relying on a monopoly (Cloudflare, etc.) that probably also wants to run its own bots at some point, but what else is there?
If the app isn't a web browser, none are legit?
I suspect that this goes for many different SDKs. Personally, I am really, really sick of hearing "That's a solved problem!", whenever I mention that I tend to "roll my own," as opposed to including some dependency, recommended by some jargon-addled dependency addict.
Bad actors love the dependency addiction of modern developers, and have learned to set some pretty clever traps.
The "network sharing" behavior in these SDKs is the sole purpose of the SDK. It isn't being included as a surprise along with some other desirable behavior. What needs to stop is developers including these SDKs as a secondary revenue source in free or ad-supported apps.
Doubt it. This is just one of many carrots used to entice developers to include dodgy software in their apps.
The problem is a lot bigger than these libraries. It's an endemic cultural issue. Much more difficult to quantify or fix.
Is that greed?
I can find many reasons to be critical of that developer, things like creating a product for a market segment that is saturated, and likely doing so because it is low-hanging fruit (both conceptually and in terms of complexity). I can be critical of their moral judgement for how they decided to generate income from their poor business judgment. But I don't think it's right to automatically label them as greedy. They may be greedy, but they may also be trying to generate income from their work.
Umm, yes? You are not owed anything in this life, certainly not income for your choice to spend your time on building a software product no one asked for. Not making money on it is a perfectly fine outcome. If you desperately need guaranteed money, don't build an app expecting it to sell; get a job.
Technically true, but a bit of perspective might help. The consumer market is distorted by free (as in beer) apps that do a bunch of shitty things that should in many cases be illegal or require much more informed consent than today, like tracking everything they can. Then you have VC-funded “free” as well, where the end game is to raise prices slowly to boil the frog. Then you have loss leaders from megacorps, and a general anti-competitive business culture.
Plus, this is not just in the Wild West shady places, like the old Pirate Bay ads. The top result for “timer” on the App Store (for me) is indeed a timer app, but with an IAP for an $800/year subscription… facilitated by Apple Inc, who gets 15-30% of the bounty.
Look, the point is it’s almost impossible to break into consumer markets because everyone else is a predator. It’s a race to the bottom, ripping off clueless customers. Everyone would benefit from a fairer market. Especially honest developers.
That’s got to be money laundering or something else illicit? No one is actually paying that for a timer app?
We could have people ask for software in a more convenient way.
Not making money could be an indication the software isn't useful, but what if it is? What can the collective do in that zone?
I imagine one could ask and pay for unwritten software then get a refund if it doesn't materialize before your deadline.
Why is discovery (of many creations) willingly handed over to a handful of megacorps?? They seem to think I want to watch and read about Trump and Elon every day.
Promoting something because it is good is a great example of a good thing that shouldn't pay.
Brings a new meaning to dependency injection.
My personal beef is that most of the time it acts like hidden global dependencies, and the configuration of those dependencies, along with their lifetimes, becomes harder to understand by not being traceable in the source code.
To me it’s rather anti-functional. Normally, when you instantiate a class, the resulting object’s behavior only depends on the constructor arguments you pass it (= the behavior is purely a function of the arguments). With dependency injection, the object’s behavior may depend on some hidden configuration, and not even inspecting the class’ source code will be able to tell you the source of that behavior, because there’s only an @Inject annotation without any further information.
Conversely, when you modify the configuration of which implementation gets injected for which interface type, you potentially modify the behavior of many places in the code (including, potentially, the behavior of dependencies your project may have), without having passed that code any arguments to that effect. A function executing that code suddenly behaves differently, without any indication of that difference at the call site, or traceable from the call site. That’s the opposite of the functional paradigm.
It sounds like you have a gripe with a particular DI framework and not the idea of Dependency Injection. Because
> Normally, when you instantiate a class, the resulting object’s behavior only depends on the constructor arguments you pass it (= the behavior is purely a function of the arguments)
With Dependency Injection this is generally still true, even more so than normal, because you're making the constructor's dependencies explicit in the arguments. If you have a class CriticalErrorLogger(), you can't directly tell where it logs to: is it using a flat file, stdout, or a network logger? If you instead have a class CriticalErrorLogger(logger *io.writer), then when you create it you know exactly what it's using to log, because you had to instantiate it and pass it in.
Or like Kortilla said, instead of passing in a class or struct you can pass in a function, so using the same example, something like CriticalErrorLogger(fn write)
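To make that concrete, here's a minimal Go sketch of constructor injection; CriticalErrorLogger comes from the example above, everything else is illustrative:

```go
package main

import (
	"fmt"
	"io"
	"os"
)

// CriticalErrorLogger doesn't know or care where its output goes;
// the destination is injected through the constructor.
type CriticalErrorLogger struct {
	out io.Writer
}

func NewCriticalErrorLogger(out io.Writer) *CriticalErrorLogger {
	return &CriticalErrorLogger{out: out}
}

func (l *CriticalErrorLogger) Log(msg string) {
	fmt.Fprintf(l.out, "CRITICAL: %s\n", msg)
}

func main() {
	// The call site decides the destination: stdout here, but a file,
	// a network writer, or a test buffer would work exactly the same way.
	logger := NewCriticalErrorLogger(os.Stdout)
	logger.Log("disk is full")
}
```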
My issue with that is this: From the point of view of the code accessing the injected value (and from the point of view of that code's callers), the value appears like out of thin air. There is no way to trace back from that code where the value came from. Similarly, when defining which value will be injected, it can be difficult to trace all the places where it will be injected.
In addition, there are often lifetime issues involved, when the injected value is itself a stateful object, or may indirectly depend on mutable, cached, or lazy-initialized, possibly external state. The time when the value's internal state is initialized or modified, or whether or not it is shared between separate injection points, is something that can't be deduced from the source code containing the injection points, but is often relevant for behavior, error handling, and general reasoning about the code.
All of this makes it more difficult to reason about the injected values, and about the code whose behavior will depend on those values, from looking at the source code.
I agree with your definition except for this part: you don't need any framework to do dependency injection. It's simply the idea that instead of having an abstract base class CriticalErrorLogger, with concrete implementations StdOutCriticalErrorLogger, FileCriticalErrorLogger, and AwsCloudwatchCriticalErrorLogger which bake their dependency into the class design, you instead have a concrete class CriticalErrorLogger(dep *dependency) and create dependency objects externally that implement identical interfaces in different ways. You do text formatting, generate a traceback, etc., and then call dep.write(myFormattedLogString), and the dependency handles whatever that means.
I agree with you that most DI frameworks are too clever and hide too much, and some forms of DI like setter injection and reflection based injection are instant spaghetti code generators. But things like Constructor Injection or Method Injection are so simple they often feel obvious and not like Dependency Injection even though they are. I love DI, but I hate DI frameworks; I've never seen a benefit except for retrofitting legacy code with DI.
And yeah, it does add the issue of lifetime management. That's an easy place to F things up in your code using DI and requires careful thought in some circumstances. I can't argue against that.
But DI doesn't need frameworks or magic methods or attributes to work. And there's a lot of situations where DI reduces code duplication, makes refactoring and testing easier, and actually makes code feel less magical than using internal dependencies.
The basic principle is much simpler than most DI frameworks make it seem. Instead of initializing a dependency internally, receive the dependency in some way. It can be through overly abstracted layers or magic methods, but it can also be as simple as adding an argument to the constructor or a given method that takes a reference to the dependency and uses that.
edit: made some examples less ambiguous
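For completeness, here's a tiny sketch of the other simple form mentioned above, method injection, where the dependency is passed per call instead of being stored on the object (names are made up for illustration):

```go
package main

import (
	"fmt"
	"io"
	"os"
	"strings"
)

// Report knows how to format itself, but the output destination is
// injected per call rather than stored on the struct.
type Report struct {
	Title string
	Rows  []string
}

// WriteTo receives its dependency (an io.Writer) as a method argument.
func (r Report) WriteTo(w io.Writer) error {
	_, err := fmt.Fprintf(w, "%s\n%s\n", r.Title, strings.Join(r.Rows, "\n"))
	return err
}

func main() {
	r := Report{Title: "Nightly errors", Rows: []string{"disk full", "timeout"}}
	// Same object, different destinations, no framework involved.
	_ = r.WriteTo(os.Stdout)
	var buf strings.Builder
	_ = r.WriteTo(&buf) // e.g. capturing output in a test
}
```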
The term Dependency Injection was coined by Martin Fowler with this article: https://martinfowler.com/articles/injection.html. See how it presents the examples in terms of wiring up components from a configuration, and how it concludes with stressing the importance of "the principle of separating service configuration from the use of services within an application". The article also presents constructor injection as only one of several forms of dependency injection.
That is how everyone understood dependency injection when it became popular 10-20 years ago: A way to customize behavior at the top application/deployment level by configuration, without having to pass arguments around throughout half the code base to the final object that uses them.
Apparently there has been a divergence of how the term is being understood.
[0] https://en.wikipedia.org/wiki/Strategy_pattern
[1] The fact that Car is abstract in the example is immaterial to the pattern, and a bit unfortunate in the Wikipedia article, from a didactic point of view.
DI in Java is almost completely disconnected from what the Strategy pattern is, so it doesn't make sense to use one to refer to the other there.
It's equivalent to partial application.
An uninstantiated class that follows the dependency injection pattern is equivalent to a family of functions with N+Mk arguments, where N is the number of constructor arguments and Mk is the number of parameters of method k.
Upon instantiation by passing constructor arguments, you've created a family of functions, each with a distinct set of Mk parameters and N arguments in common.
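A small illustrative Go sketch of that equivalence (hypothetical names): the constructor fixes the N shared arguments, and partial application does the same thing for the plain-function form.

```go
package main

import "fmt"

// "Class" form: the constructor fixes N = 2 shared arguments (host, timeout)...
type Client struct {
	host    string
	timeout int
}

func NewClient(host string, timeout int) Client { return Client{host, timeout} }

// ...and each method adds its own Mk arguments (here Mk = 1).
func (c Client) Get(path string) string {
	return fmt.Sprintf("GET %s%s (%ds)", c.host, path, c.timeout)
}

// Equivalent function form: all N+Mk arguments at once...
func get(host string, timeout int, path string) string {
	return fmt.Sprintf("GET %s%s (%ds)", host, path, timeout)
}

// ...and partial application recovers the "instantiated object".
func partialGet(host string, timeout int) func(string) string {
	return func(path string) string { return get(host, timeout, path) }
}

func main() {
	c := NewClient("example.com", 5)
	g := partialGet("example.com", 5)
	fmt.Println(c.Get("/a")) // GET example.com/a (5s)
	fmt.Println(g("/a"))     // same output, built by partial application
}
```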
That's the best way to think of it fundamentally. But the main implication is that at some point something has to know how to resolve those dependencies - i.e. they can't just be constructed and then injected from magic land. So global cradles/resolvers/containers/injectors/providers (depending on your language and framework) are also typically part and parcel of DI, and that can have some big implications on the structure of your code that some people don't like. Also, you can inject functions and methods, not just constructors.
This is all well and good, but you also need a bunch of code that handles resolving those dependencies, which oftentimes ends up being complex and hard to debug and will also cause runtime errors instead of compile time errors, which I find to be more or less unacceptable.
Edit: to elaborate on this, I’ve seen DI frameworks not be used in “enterprise” projects a grand total of zero times. I’ve done DI directly in personal projects and it was fine, but in most cases you don’t get to make that choice.
Just last week, when working on a Java project that’s been around for a decade or so, there were issues after migrating it from Spring to Spring Boot - when compiled through the IDE and with the configuration to allow lazy dependency resolution it would work (too many circular dependencies to change the code instead), but when built within a container by Maven that same exact code and configuration would no longer work and injection would fail.
I’m hoping it’s not one of those weird JDK platform bugs but rather an issue with how the codebase is compiled during the container image build, but the issue is mind boggling. More fun, if you take the .jar that’s built in the IDE and put it in the container, then everything works, otherwise it doesn’t. No compilation warnings, most of the startup is fine, but if you build it in the container, you get a DI runtime error about no lazy resolution being enabled even if you hardcode the setting to be on in Java code: https://docs.spring.io/spring-boot/api/kotlin/spring-boot-pr...
I’ve also seen similar issues before containers, where locally it would run on Jetty and use Tomcat on server environments, leading to everything compiling and working locally but throwing injection errors on the server.
What’s more, it’s not like you can (easily) put a breakpoint on whatever is trying to inject the dependencies - after years of Java and Spring I grow more and more convinced that anything that doesn’t generate code that you can inspect directly (e.g. how you can look at a generated MapStruct mapper implementation) is somewhat user hostile and will complicate things. At least modern Spring Boot is good in that more of the configuration is just code, because otherwise good luck debugging why some XML configuration is acting weird.
In other words, DI can make things more messy due to a bunch of technical factors around how it’s implemented (also good luck reading those stack traces), albeit even in the case of Java something like Dagger feels more sane https://dagger.dev/ despite never really catching on.
Of course, one could say that circular dependencies or configuration issues are project specific, but given enough time and projects you will almost inevitably get those sorts of headaches. So while the theory of DI is nice, you can’t just have the theory without practice.
Hidden dependencies are things like an untyped context variable or a global "service registry". Those are hidden because the only way to find out which dependencies a given module has is to carefully read its code and the code of all the functions it calls.
I'm talking more specifically about Aspect Oriented Programming though and DI containers in OOP, which seemed pretty clever in theory, but have a lot of issues in reality.
I take no issues with currying in functional programming.
There are other good uses of it, but it absolutely can get out of control, especially if implemented by someone who's just discovered it and wants to use it for everything.
But nobody seems to do this diligence. It’s just “we are in a rush. we need X. dependency does X. let’s use X.” and that’s it!
Wrong question. “Are you paid to audit this code?” And “if you fail to audit this code, whose problem is it?”
Have you ever worked anywhere that said "go ahead and slow down on delivering product features that drive business value so you can audit the code of your dependencies, that's fine, we'll wait"?
I haven't.
When was the last time producer of an app was held legally accountable for negligence, had to pay compensation and damages, etc?
AI is making this worse than ever though, I am constantly having to tell devs that their work is failing to meet requirements, because AI is just as bad as a junior dev when it comes to reaching for a dependency. It’s like we need training wheels for the prompts juniors are allowed to write.
I imagine that e.g. Youtube would be happy to agree with this. Not that it would turn them against AI generally.
[Cloudflare](https://developers.cloudflare.com/cache/troubleshooting/alwa...) tags the internet archive as operating from 207.241.224.0/20 and 208.70.24.0/21 so disabling the bot-prevention framework on connections from there should be enough.
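For example, a minimal Go sketch of that kind of allowlist check, using the two ranges quoted above; how it hooks into your bot-prevention layer is left open:

```go
package main

import (
	"fmt"
	"net"
)

// Ranges Cloudflare attributes to the Internet Archive, per the comment above.
var archiveRanges = []string{"207.241.224.0/20", "208.70.24.0/21"}

// isInternetArchive reports whether an IP falls inside those ranges.
func isInternetArchive(ipStr string) bool {
	ip := net.ParseIP(ipStr)
	if ip == nil {
		return false
	}
	for _, cidr := range archiveRanges {
		_, block, err := net.ParseCIDR(cidr)
		if err == nil && block.Contains(ip) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(isInternetArchive("207.241.229.10")) // true
	fmt.Println(isInternetArchive("8.8.8.8"))        // false
}
```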
New actors have the right to emerge.
There's no rule that you have to let anyone in who claims to be a web crawler.
The truth is that I sympathize with the people trying to use mobile connections to bypass such a cartel.
What Cloudflare is doing now is worse than the web crawlers themselves and the legality of blocking crawlers with a monopoly is dubious at best.
Kagi is welcome to scrape from their IP addresses. Other bots that behave are fine too (Huawei and various other Chinese bots don't and I've had to put an IP block on those).
On a separate note, I believe open web scraping has been a massive benefit to the internet on net, and almost entirely positive pre-2021. Web scraping & crawling enables search engines, services like Internet Archive, walled-garden-busting (like Invidious, yt-dlp, and Nitter), mashups (Spotube, IFTT, and Plaid would have been impossible to bootstrap without web scraping), and all kinds of interesting data science projects (e.g. scraping COVID-19 stats from local health departments to patch together a picture of viral spread for epidemiologists).
Or do you want a central authority that decides who can do new search engines?
Why are Anubis-type mitigations a half-measure?
It's a half-measure because:
1. You're slowing down scrapers, not blocking them. They will still scrape your site content in violation of robots.txt.
2. Scrapers with more compute than IP proxies will not be significantly bottlenecked by this.
3. This may lead to an arms race where AI companies respond by beefing up their scraping infrastructure, necessitating more difficult PoW challenges, and so on. The end result of this hypothetical would be a more inconvenient and inefficient internet for everyone, including human users.
To be clear: I think Anubis is a great tool for website operators, and one of the best self-hostable options available today. However, it's a workaround for the core problem that we can't reliably distinguish traffic from badly behaving AI scrapers from legitimate user traffic.
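For reference, here's a rough sketch of the proof-of-work idea these tools are built on (illustrative parameters, not Anubis's actual scheme): the client searches for a nonce whose hash clears a difficulty bar, and the server verifies it with a single hash.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
	"math/bits"
)

// leadingZeroBits counts the leading zero bits of a SHA-256 digest.
func leadingZeroBits(sum [32]byte) int {
	n := 0
	for _, b := range sum {
		if b == 0 {
			n += 8
			continue
		}
		n += bits.LeadingZeros8(b)
		break
	}
	return n
}

// solve finds a nonce such that SHA-256(challenge || nonce) has at least
// `difficulty` leading zero bits. The client pays this cost.
func solve(challenge string, difficulty int) uint64 {
	var nonce uint64
	for {
		buf := make([]byte, 8)
		binary.BigEndian.PutUint64(buf, nonce)
		sum := sha256.Sum256(append([]byte(challenge), buf...))
		if leadingZeroBits(sum) >= difficulty {
			return nonce
		}
		nonce++
	}
}

// verify is a single hash: the server pays almost nothing.
func verify(challenge string, nonce uint64, difficulty int) bool {
	buf := make([]byte, 8)
	binary.BigEndian.PutUint64(buf, nonce)
	sum := sha256.Sum256(append([]byte(challenge), buf...))
	return leadingZeroBits(sum) >= difficulty
}

func main() {
	const difficulty = 20 // roughly a million hashes on average
	nonce := solve("per-session-random-challenge", difficulty)
	fmt.Println("solved:", nonce, "valid:", verify("per-session-random-challenge", nonce, difficulty))
}
```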
What good is all the app vetting and sandbox protection in iOS (dunno about Android) if it doesn't really protect me from those crappy apps...
If you treat platforms like they are all-powerful, then that's what they are likely to become...
Network access settings should really be more granular for apps that have a legitimate need.
App store disclosure labels should also add network usage disclosure.
Maybe it's less convenient and more expensive and onerous. Do good things require hard work? Or did we expect everyone to ignore incentives forever while the trillion-dollar hyperscalers fought for an open and noble internet and then wrapped it in affordable consumer products to our delight?
It reminds me of the post here a few weeks ago about how Netflix used to be good and "maybe I want a faster horse" - we want things to be built for us, easily, cheaply, conveniently, by companies, and we want those companies not to succumb to enshittification - but somehow when the companies just follow the game theory and turn everything into a TikToky neural-networks-maximizing-engagement-infinite-scroll-experience, it's their fault, and not ours for going with the easy path while hoping the corporations would not take the easy path.
We are working on an open‑source fraud prevention platform [1], and detecting fake users coming from residential proxies is one of its use cases.
Trying to understand your product, where is it intended to sit in a network? Is it a standalone tool that you use to identify these IPs and feed into something else for blockage or is it intended to be integrated into your existing site or is it supposed to proxy all your web traffic? The reason I ask is it has fairly heavyweight install requirements and Apache and PHP are kind of old school at this point, especially for new projects and companies. It's not what they would commonly be using for their site.
Thank you for your question. tirreno is a standalone app that needs to receive API events from your main web application. It can work perfectly well with 512MB of RAM for Postgres, or even less; however, in most cases we're talking about millions of events, and that is what demands resources.
It's much easier to write a stable application without dependencies based on mature technologies. tirreno is fairly 'boring software'.
Finally, as mentioned earlier, there is no silver bullet that works for every type of online fraudster. For example, in some applications, a TOR connection might be considered a red flag. However, if we are talking about hn visitors, many of them use TOR on a daily basis.
I’ve found Tor browsing to be okay, but logins via Tor mostly to be a great alternative to snowshoeing for credential stuffing.
[1] https://jan.wildeboer.net/2025/02/Blocking-Stealthy-Botnets/
Regarding the first post, it's rare to see both datacenter network IPs and mobile proxy IP addresses used simultaneously. This suggests the involvement of more than one botnet. The main idea is to avoid using IP addresses as the sole risk factor. Instead, they should be considered as just one part of the broader picture of user behavior.
Both are pretty easy to mitigate with a geoip database and some smart routing. One "residential proxy" vendor even has session tokens so your source IP doesn't randomly jump between each request.
Why jump to that conclusion?
If a scraper clearly advertises itself, follows robots.txt, and has reasonable backoff, it's not abusive. You can easily block such a scraper, but then you're encouraging stealth scrapers because they're still getting your data.
I'd block the scrapers that try to hide and waste compute, but deliberately allow those that don't. And maybe provide a sitemap and API (which besides being easier to scrape, can be faster to handle).
Not sure how this could work for browsers, but the other 99% of apps I have on my phone should work fine with just a single permitted domain.
It should also do something similar for apps making chatty background requests to domains not specified at app review time. The legitimate use cases for that behaviour are few.
The system may have some such functions built in, and asking permission might be a reasonable thing to include by default.
I've used all of them, and it's a deluge: it is too much information to reasonably react to.
Broadly, your choice is either deny or accept, but there's no sane way to reliably know what you should do.
This is not and cannot be an individual problem: the easy part is building high fidelity access control, the hard part is making useful policy for it.
> it is too much information to reasonably react to.
Even if it asks, that does not necessarily mean it has to ask every time, if the user lets it keep the answer (either for the current session or until the user deliberately deletes this data). Also, if it asks too much because it tries to access too many remote servers, then it might be spyware, malware, etc. anyway, and is worth investigating in case that is what it is.
> the hard part is making useful policy for it.
What the default settings should be is a significant issue. However, changing the policies in individual cases for different uses is also something that a user might do, since the default settings will not always be suitable.
If whoever manages the package repository, app store, etc. is able to check for malware, then this is a good thing to do (although it should not prohibit the user from installing their own software or modifying the existing software), but security on the computer is also helpful, and neither of these is a substitute for the other; they work together.
I am waiting for Apple to enable /etc/hosts or something similar on iOS devices.
And, AFAIK, you already need special permission for anything other than HTTPS to specific domains on the public Internet. That's why apps ping you about permissions to access "local devices".
They should need special permission for that too.
That's how it works with other permissions most applications should not have access to, like accessing user locations. (And private entitlements third party applications can't have are one way Apple makes sure nobody can compete with their apps, but that's a separate issue.)
You mean, good bye using my bandwidth without my permission? That's good. And if I install a bittorrent client on my phone, I'll know to give it permission.
> such as companion apps for watches and other peripherals
That's just apple abusing their market position in phones to push their watch. What does it have to do with p2p?
What are you talking about?
> What does it have to do with p2p?
It’s an example of how, when you design sandboxes/firewalls, it’s very easy to assume all apps are one big homogeneous blob doing REST calls and everything else is malicious or suspicious. You often need strange permissions to do interesting things. Apple gives themselves these perms all the time.
> What are you talking about?
That’s the main use case for p2p in an application, isn’t it? Reducing the vendor’s bandwidth bill…
The equivalent would be to say that running local workloads or compute is there to reduce the vendor’s bill. It’s a very centralized view of the internet.
There are many reasons to do p2p, such as improving bandwidth and latency, circumventing censorship, improving resilience, and more. WebRTC is a good example of p2p used by small and large companies alike. None of this is any more ”without permission” than a standard app phoning home and tracking your fingerprint and IP.
Great respect for the user's resources.
I just brought it up as a technology that at the very least is both legitimate and common.
Except the platform providers hold the trump card. Fuck around, if they figure it out you'll be finding out.
TINFOIL: I've sometimes wondered if Azure or AWS used bots to push site traffic hits to generate money... they know you are hosted with them.. They have your info.. Send out bots to drive micro-accumulation. Slow boil..
GCE is rare in my experience. Most bots I see are on AWS. The DDOS-adjacent hyper aggressive bots that try random URLs and scan for exploits tend to be on Azure or use VPNs.
AWS is bad when you report malicious traffic. Azure has been completely unresponsive and didn't react, even for C&C servers.
People are jumping to conclusions a bit fast over here. Yes, technically it's possible, but this kind of behavior would be relatively easy to spot because the app would have to make direct connections to the website it wants to scrape.
Your calculator app for instance connecting to CNN.com ...
iOS has an App Privacy Report where one can check what connections are made by an app, how often, the last one, etc.
Android by Google doesn't have such a useful feature, of course, but you can run a third-party firewall like PCAPdroid, which I highly recommend.
macOS (Little Snitch).
Windows (Fort Firewall).
Not everyone runs these apps, obviously, only the most nerdy like myself, but we're also the kind of people who would report an app using our device to create what is, in fact, a zombie or bot network.
I'm not saying it's necessarily false but imo it remains a theory until proven otherwise.
How often is the average calculator app user checking their Privacy Report? My guess: not often!
That happens from time to time; the last one was not more than two weeks ago, when it was shown that many apps were able to read the list of all other apps installed on an Android device and that Google refused to fix that.
Do you really believe that an app used to make your device part of a bot network wouldn't be posted over here ?
^ edit: my mistake, the server logs I mentioned were from the authors prior blog post on this topic, linked to at the top of TFA: https://jan.wildeboer.net/2025/02/Blocking-Stealthy-Botnets/
Privacy reports do not include that information. They include broad areas of information the app claims to gather. There is zero connection between those claimed areas and what the app actually does unless app review notices something that doesn't match up. But none of that information is updated dynamically, and it has never actually included the domains the app connects to. You may be confusing it with the old domain declarations for less secure HTTP connections. Once the connections met the system standards you no longer needed to declare it.
Go to a data conference like Neudata and you will see. You can have scraped data from user devices, real-time locations, credit card, Google analytics, etc.
AKA "why do Cloudflare and Google make me fill out these CAPTCHAs all day"
I don't know why Play Protect/MS Defender/whatever Apple has for antivirus don't classify apps that embed such malware as such. It's ridiculous that this is allowed to go on when detection is so easy. I don't know a more obvious example of a trojan than an SDK library making a user's device part of a botnet.
This is even worse with CG-NAT if you don't have IPv6 to solve the CG-NAT problem.
I don't think the data they collect is used to train anything these days. Cloudflare is using AI generated images for CAPTCHAs and Google's actual CAPTCHAs are easier for bots than humans at this point (it's the passive monitoring that makes it still work a little bit).
Do nothing, win.
They are the primary beneficiaries buying this data, since they are the largest AI players.
Google could do this. I'm sure Apple could as well. Third parties could for a small set of apps
If instead we had a content addressed model, we could drop the uniqueness constraint. Then these AI scrapers could be gossiping the data to one another (and incidentally serving it to the rest of us) without placing any burden on the original source.
Having other parties interested in your data should make your life easier (because other parties will host it for you), not harder (because now you need to work extra hard to host it for them).
See https://arxiv.org/abs/1905.11880 [Hydras and IPFS: A Decentralised Playground for Malware]
That's not to say that it is a ready replacement for the web as we know it. If you have hash-linked everything then you wind up with problems trying to link things together, for instance. Once two pages exist, you can't after-the-fact create a link between them because if you update them to contain that link then their hashes change so now you have to propagate the new hash to people. This makes it difficult to do things like have a comments section at the bottom of a blog post. So you've got to handle metadata like that in some kind of extra layer--a layer which isn't hash linked and which might be susceptible to all the same problems that our current web is--and then the browser can build the page from immutable pieces, but the assembly itself ends up being dynamic (and likely sensitive to the users preference, e.g. dark mode as a browser thing not a page thing).
But I still think you could move maybe 95% of the data into an immutable hash-linked world (think of these as nodes in a graph), the remaining 5% just being tuples of hashes and public keys indicating which pages are trusted by which users, which ought to be linked to which others, which are known to be the inputs and output of various functions, and you know... structure stuff (these are our graph's edges).
The edges, being smaller, might be subject to different constraints than the web as we know it. I wouldn't propose that we go all the way to a blockchain where every device caches every edge, but it might be feasible for my devices to store all of the edges for the 5% of the web I care about, and your devices to store the edges for the 5% that you care about... the nodes only being summoned when we actually want to view them. The edges can be updated when our devices contact other devices (based on trust, like you know that device's owner personally) and ask "hey, what's new?"
I've sort of been freestyling on this idea in isolation, probably there's already some projects that scratch this itch. A while back I made a note to check out https://ceramic.network/ in this capacity, but I haven't gotten down to trying it out yet.
AI scrapers aren't trying to find things they already know exist, they're trying to discover what they didn't know existed.
"Content-addressable" has a broader meaning than what you seem to be thinking of -- roughly speaking, it applies if any function of the data is used as the "address". E.g., git commits are content-addressable by their SHA1 hashes.
It's a legit limitation on what content addressing can do, but it's one we can overcome by just not having everything be content addressed. The web we have now is like if you did a `git pull` every time you opened a file.
The web I'm proposing is like how we actually use git--periodically pulling new hashes as a separate action, but spending most of our time browsing content that we already have hashes for.
But there's a lot of middle ground to explore here. Loading a modern web page involves making dozens of requests to a variety of different servers, evaluating some JavaScript, and then doing it again a few times, potentially moving several MB of data. The part people want, the thing you don't already know exists, is hidden behind that rather heavy door. It doesn't have to be that way.
If you already know about one thing (by its cryptographic hash, say) and you want to find out which other hashes it's now associated with--associations that might not have existed yesterday--that's much easier than we've made it. It can be done:
- by moving kB, not MB; we're just talking about a tuple of hashes here, maybe a public key and a signature
- without placing additional burden on whoever authored the first thing, they don't even have to be the ones who published the pair of hashes that your scraper is interested in
Once you have the second hash, you can then reenter immutable-space to get whatever it references. I'm not sure if there's already a protocol for such things, but if not then we can surely make one that's more efficient and durable than what we're doing now.
It is entirely possible to serve a fully cached response that says "you already have this". The problem is...people don't implement this well.
If content were handled independently of server names, anyone who cares to distribute metadata for content they care about can do so. One doesn't need write access, or even to be on the same network partition. You could just publish a link between content A and content B because you know their hashes. Assembling all of this can happen in the browser, subject to the user's configs re: who they trust.
I know, as far as possible it's a good idea to have content-immutable URLs. But at some point, I need to make www.myexamplebusiness.com show new content. How would that work?
But as for updating, you just format your URLs like so: {my-public-key}/foo/bar
And then you alter the protocol so that the {my-public-key} part resolves to the merkle-root of whatever you most recently published. So people who are interested in your latest content end up with a whole new set of hashes whenever you make an update. In this way, it's not 100% immutable, but the mutable payload stays small (it's just a bunch of hashes) and since it can be verified (presumably there's a signature somewhere) it can be gossiped around and remain available even if your device is not.
You can soft-delete something just by updating whatever pointed to it to not point to it anymore. Eventually most nodes will forget it. But you can't really prevent a node from hanging on to an old copy if they want to. But then again, could you ever do that? Deleting something on the web has always been a bit of a fiction.
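A rough Go sketch of the two halves described above, assuming content addressed by SHA-256 plus a small Ed25519-signed pointer from a public key to the latest root (all names are illustrative):

```go
package main

import (
	"crypto/ed25519"
	"crypto/rand"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"time"
)

// Immutable side: content is addressed purely by its hash.
func address(content []byte) string {
	sum := sha256.Sum256(content)
	return hex.EncodeToString(sum[:])
}

// Mutable side: a tiny signed record mapping a public key to the latest
// root hash. This is the only thing that changes on an update, and anyone
// can verify and gossip it without contacting the author.
type Pointer struct {
	Root      string
	Timestamp int64
	Sig       []byte
}

func publish(priv ed25519.PrivateKey, root string) Pointer {
	ts := time.Now().Unix()
	msg := fmt.Sprintf("%s|%d", root, ts)
	return Pointer{Root: root, Timestamp: ts, Sig: ed25519.Sign(priv, []byte(msg))}
}

func verify(pub ed25519.PublicKey, p Pointer) bool {
	msg := fmt.Sprintf("%s|%d", p.Root, p.Timestamp)
	return ed25519.Verify(pub, []byte(msg), p.Sig)
}

func main() {
	pub, priv, _ := ed25519.GenerateKey(rand.Reader)
	page := []byte("<html>v2 of my page</html>")
	ptr := publish(priv, address(page))
	fmt.Println("root:", ptr.Root[:16], "valid:", verify(pub, ptr))
}
```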
True in the absolute sense, but the effect size is much worse under the kind of content-addressable model you're proposing. Currently, if I download something from you and you later delete that thing, I can still keep my downloaded copy; under your model, if anyone ever downloads that thing from you and you later delete that thing, with high probability I can still acquire it at any later point.
As you say, this is by design, and there are cases where this design makes sense. I think it mostly doesn't for what we currently use the web for.
It's the same functionality you get with permalinks and sites like archive.org--forgotten unless explicitly remembered by anybody, dynamic unless explicitly a permalink. It's just built into the protocol rather than a feature to be inconsistently implemented over and over by many separate parties.
Unless complicit, tech leaders (Apple Google Microsoft) have a duty to respond swiftly and decisively. This has been going on far too long.
That's not good.
It is still a pretty good lay-of-the-land.
https://www.trendmicro.com/vinfo/us/security/news/vulnerabil...
But it just doesn't scale to internet size so I'm fucked if I know how we should fix it. We all have that cousin or dude in our highschool class who would do anything for a bit of money and introducing his 'friend' Paul who is in fact a bot whose owner paid for the lie. And not like enough money to make it a moral dilemma, just drinking money or enough for a new video game. So once you get past about 10,000 people you're pretty much back where we are right now.
Binary "X trusts Y" statements, plus transitive closure, can lead to long trust paths that we probably shouldn't actually trust the endpoints of. Could we not instead assign probabilities like "X trusts Y 95%", multiply probabilities along paths starting from our own identity, and take the max at each vertex? We could then decide whether to finally trust some Z if its percentage is more than some threshold T%. (Other ways of combining in-edges may be more suitable than max(); it's just a simple and conservative choice.)
Perhaps a variant of backprop could be used to automatically update either (a) all or (b) just our own weights, given new information ("V has been discovered to be fraudulent").
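A small Go sketch of the max-product propagation part (illustrative graph, the backprop idea left out): since every direct-trust value is at most 1, a Dijkstra-style search that always expands the currently most-trusted node finds the best path product for every reachable identity.

```go
package main

import "fmt"

// trust[x][y] = how much x directly trusts y, in [0, 1].
type Graph map[string]map[string]float64

// propagate returns, for each node, the maximum product of direct-trust
// values along any path from `self`.
func propagate(g Graph, self string) map[string]float64 {
	best := map[string]float64{self: 1.0}
	done := map[string]bool{}
	for {
		// pick the not-yet-finalized node with the highest score
		cur, curScore := "", -1.0
		for n, s := range best {
			if !done[n] && s > curScore {
				cur, curScore = n, s
			}
		}
		if cur == "" {
			return best
		}
		done[cur] = true
		for next, t := range g[cur] {
			if s := curScore * t; s > best[next] {
				best[next] = s
			}
		}
	}
}

func main() {
	g := Graph{
		"me":    {"alice": 0.95, "bob": 0.6},
		"alice": {"carol": 0.9},
		"bob":   {"carol": 0.99},
	}
	scores := propagate(g, "me")
	const threshold = 0.8
	// carol scores 0.95*0.9 = 0.855 via alice, which beats 0.6*0.99 via bob.
	fmt.Printf("carol: %.3f trusted: %v\n", scores["carol"], scores["carol"] >= threshold)
}
```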
How about restricting them to everyone-knows-everyone sized groups, of like a couple hundred people?
One can be a member of multiple groups so you're not actually limited. But the groups will be small enough to self regulate.
If you want to chat with a Dunbar number of people, get yourself a private Discord or Slack channel.
I'm hypothesising that any such large scale structure will be perverted by commercial interests, while having multiple Dunbar sized such structures will have a chance to be useful.
At least, that's the way I've always imagined it working. Maybe I need to read up.
So what are potential solutions? We're somehow still stuck with CAPTCHAs, a 25-year-old concept that wastes millions of human hours and billions in infra costs [0].
How can we enable beneficial automation while protecting against abusive AI crawlers?
Did you mean "against"?
I say we ask Google Analytics to count an AI crawler as a real view. Let’s see who’s most popular.
Not to mention that it's unknown if these are actually from AI companies, or from people pretending to be AI companies. You can set anything as your user agent.
It's more appropriate to mention the specific issue one has with the crawlers, like "they request things too quickly" or "they're overloading my server". Then from there, it is easier to come to a solution than just "I hate AI". For example, one would realize that things like Anubis have existed forever; they are just called DDoS protection, specifically those using proof-of-work schemes (e.g. https://github.com/RuiSiang/PoW-Shield).
This also shifts the discussion away from something that adds to the discrimination against scraping in general, and more towards what is actually the issue: overloading servers, or in other words, DDoS.
It's just like how not all DDoSes are actually hackers or bots. Sometimes a server just can't take the traffic of a large site flooding in. But the result is the same until something is investigated.
Running SHA hash calculations for a second or so once every week is not bad for users, but with scrapers constantly starting new sessions, they end up spending most of their time running useless JavaScript, slowing them down significantly.
The most effective alternative to proof of work calculations seems to be remote attestation. The downside is that you're getting captchas if you're one of the 0.1% who disable secure boot and run Linux, but the vast majority of web users will live a captcha free life. This same mechanism could in theory also be used to authenticate welcome scrapers rather than relying on pure IP whitelists.
It won't fully solve the problem, but with the problem relatively identified, you must then ask why people are engaging in this behavior. Answer: money, for the most part. Therefore, follow the money and identify the financial incentives driving this behavior. This leads you pretty quickly to a solution most people would reject out-of-hand: turn off the financial incentive that is driving the enshittification of the web. Which is to say, kill the ad-economy.
Or at least better regulate it while also levying punitive damages that are significant enough to both dissuade bad actors and encourage entities to view data breaches (or the potential therein) and "leakage"[0] as something that should actually be effectively secured against. After all, there are some upsides to the ad-economy that, without it, would present some hard challenges (e.g., how many people are willing to pay for search? what happens to the vibrant sphere of creators of all stripes that are incentivized by the ad-economy? etc.).
Personally, I can't imagine this would actually happen. Pushback from monied interests aside, most people have given up on the idea of data privacy or personal ownership of their data, if they ever even cared in the first place. So, in the absence of willingness to do something about the incentive for this maligned behavior, we're left with few good options.
0: https://news.ycombinator.com/item?id=43716704 (see comments on all the various ways people's data is being leaked/leached/tracked/etc)
The broad idea is to use zero knowledge proofs with certification. It sort of flips the public key certification system and adds some privacy.
To get into place, the powers in charge need to sway.
As for letting well-behaved crawlers in, I've had an idea for something like DKIM for crawlers. It should be possible to set up a fairly cheap cryptographic solution that gives crawlers a persistent identity that can't be forged.
Basically, put a header containing first a string including today's date, the crawler's IP, and a domain name, then a cryptographic signature of the string. The domain has a TXT record with a public key for verifying the identity. It's cheap because you really only need to verify the string once on the server side, and the crawler only needs to regenerate it once per day.
With that in place, crawlers can crawl with their reputation at stake. The big problem with these rogue scrapers are that they're basically impossible to identify or block, which means they don't have any incentives to behave well.
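A rough sketch of the signing side of such a scheme, assuming an Ed25519 key whose public half would sit in the domain's TXT record; the header format here is made up for illustration:

```go
package main

import (
	"crypto/ed25519"
	"crypto/rand"
	"encoding/base64"
	"fmt"
	"strings"
	"time"
)

// crawlerHeader builds the value described above: date, crawler IP and
// domain, plus a signature over that claim. A verifier would fetch the
// domain's TXT record, decode the public key, and check the signature
// once per day per IP, caching the result.
func crawlerHeader(priv ed25519.PrivateKey, ip, domain string) string {
	claim := fmt.Sprintf("%s|%s|%s", time.Now().UTC().Format("2006-01-02"), ip, domain)
	sig := ed25519.Sign(priv, []byte(claim))
	return claim + "|" + base64.StdEncoding.EncodeToString(sig)
}

func verifyHeader(pub ed25519.PublicKey, header string) bool {
	i := strings.LastIndex(header, "|")
	if i < 0 {
		return false
	}
	sig, err := base64.StdEncoding.DecodeString(header[i+1:])
	if err != nil {
		return false
	}
	// Everything before the last "|" is the signed claim: date|ip|domain.
	return ed25519.Verify(pub, []byte(header[:i]), sig)
}

func main() {
	pub, priv, _ := ed25519.GenerateKey(rand.Reader)
	h := crawlerHeader(priv, "203.0.113.7", "crawler.example.com")
	fmt.Println(h)
	fmt.Println("verified:", verifyHeader(pub, h))
}
```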
It wouldn't work to prevent the type of behavior shown in the title story.
That may well be true. But how many of those people are specifically against AI companies scraping the web? That’s not really an argument—it’s an assumption based on personal perception.
> Ask the average person if they want more or less of their lives recorded and stored.
What exactly is the "average person"? Also, I’ll admit my earlier claim was a bit exaggerated. But let’s be clear: this isn’t about recording personal data—it’s about collecting and structuring knowledge.
And beyond that: companies have been scraping the web for years. They still are. And they’re gathering far more personal data for online marketing, tracking, profiling—whatever the reason—and the so-called "average person" hasn’t raised much of a finger. People remain glued to platforms, willingly sharing their personal lives. And what do they get in return? Doomscrolling and five-second video clips.
Who said that?
There are basically two extremes:
1. We want access to all of human knowledge, now and forever, in order to monetise it and make more money for us, and us alone.
and
2. We don't want our freely available knowledge sold back to us, with no credits to the original authors.
2. You’re not paying just to have your own knowledge echoed back at you. You’re paying so that someone (or something) can read what you provide and, ideally, return improved knowledge or fresh insights. As I said above, you’re paying for the technology and its capabilities—not the knowledge itself. That’s how I see it.
You appear to be under the impression that there is only one hypocritical group.
The companies selling us computers that supposedly know everything should pay for their database, or they should give away the knowledge they gained for free. Right now, the scraping and copying is free and the knowledge is behind a subscription to access a proprietary model that forms the basis of their business.
Humanity doesn't benefit, the snake oil salesmen do.
I do agree with you on the point that we need to find better ways to compensate the people creating content—especially considering that parts of this "AI service," as we might call it, are subscription-based.
But in the long run, I’m quite sure that if everyone shared this opinion, it wouldn't move us forward technologically.
Also, a couple of other points:
Google and others have been scraping the internet for years, and no one complained then.
You're not paying the AI company for the knowledge itself—you're paying for the technology behind it, for the ability to access and use it effectively.
https://krebsonsecurity.com/?s=infatica
https://krebsonsecurity.com/tag/residential-proxies/
https://bright-sdk.com/ <- way bigger than infatica
This is yet another reason why we need to be wary of popular apps, add-ons, extensions, and so forth changing hands, by legitimate sale or more nefarious methods. Initially innocent utilities can be quickly coopted into being parts of this sort of scheme.
The first involved requests from 300,000 unique IPs in a span of a few hours. I analyzed them and found that ~250,000 were from Brazil. I'm used to using ASNs to block network ranges sending this kind of traffic, but in this case they were spread thinly over 6,000+ ASNs! I ended up blocking all of Brazil (sorry).
A few days later this same web server was on fire again. I performed the same analysis on IPs and found a similar number of unique addresses, but spread across Turkey, Russia, Argentina, Algeria and many more countries. What is going on?! Eventually I think I found a pattern to identify the requests, in that they were using ancient Chrome user agents. Chrome 40, 50, 60 and up to 90, all released 5 to 15 years ago. Then, just before I could implement a block based on these user agents, the traffic stopped.
In both cases the traffic from datacenter networks was limited because I already rate limit a few dozen of the larger ones.
Sysadmin life...
It's a reverse proxy that presents a proof-of-work challenge to every new visitor. It shifts the initial cost of accessing your server's resources back onto the client. Assuming your uplink can handle 300k clients requesting a single 70kB web page, it should solve most of your problems.
For science, can you estimate your peak QPS?
Crazy how I remember the HN post where Anubis's blog post was first made. Though, I always thought it was a bit funny with the anime, and that it was made out of frustration with (I think AWS?) AI scrapers who wouldn't follow general rules and were constantly hammering his git server with requests, actually taking it down, I guess?? I didn't expect it to blow up to ... the UN.
It was frustration at AWS' Alexa team and their abuse of the commons. Amusingly if they had replied to my email before I wrote my shitpost of an implementation this all could have turned out vastly differently.
Also didn't expect you to respond to my comment xD
I went through the slow realization, while reading this comment, that you are the creator of Anubis, and I had such a smile when I realized that you had replied to me.
Also, this project is really nice, but I actually want to ask: I haven't read the docs of Anubis, but could it be that the proof of work isn't wasted / could be used for something? (I know I might get downvoted because I am going to mention cryptocurrency, but Nano currency has a proof of work required for each transaction, so if Anubis actually did the proof of work to Nano's standards, then theoretically that proof of work could at least be somewhat useful.)
Looking forward to your comment!
The only way I see anything like that incorporated is a folding@home kind of thing that could help humanity as a whole.
Of course, if someone makes it work like you suggested, and it catches on, I will personally haunt your dreams forever. Don't give them any ideas.
It's a pretty effective attack because you get large numbers of individual browsers to contribute. Hosters don't care, so unless the site owners are technical enough, they can stay online quite a bit.
If they work with Referrer Policy, they should be able to mask themselves fairly well - the ones I saw back then did not.
Digging it up: https://www.washingtonpost.com/news/the-switch/wp/2015/04/10...
Very similar indeed. The attacks I witnessed were easy to block once you identified the patterns (the referrer was visible and they used predictable ?_=... query parameters to try and bypass caches), but very effective otherwise.
I suppose in the event of a hot war, the Internet will be cut quickly to defend against things like the "Great Cannon".
It's not a crime if we do it with an app
https://pluralistic.net/2025/01/25/potatotrac/#carbo-loading
If you are being bombarded by suspicious IP addresses, please consider using our free service and blocking IP addresses by ASN or Country. I think ASN is a common parameter for malicious IP addresses. If you do not have time to explore our services/tools (it is mostly just our CLI: https://github.com/ipinfo/cli), simply paste the IP addresses (or logs) in plain text, send it to me and I will let you know the ASNs and corresponding ranges to block.
In cybersecurity, decisions must be guided by objective data, not assumptions or biases. When you’re facing abuse, you analyze the IPs involved and enrich them with context — ASN, country, city, whether it’s VPN, hosting, residential, etc. That gives you the information you need to make calculated decisions: Should you block a subnet? Rate-limit it? CAPTCHA-challenge it?
Here’s a small snapshot from my own SSH honeypot:
Summary of 1,413 attempts
- Hosting IPs: 981 (69%)
- VPNs: 35
- Top ASNs:
- AS204428 (SS-Net): 152
- AS136052 (PT Cloud Hosting Indonesia): 83
- AS14061 (DigitalOcean): 76
- Top Countries:
- Romania: 238 (16.8%)
- United States: 150 (10.6%)
- China: 134 (9.5%)
- Indonesia: 115 (8.1%)
One single /24 from Romania accounts for over 10% of the attacks. That’s not about nationality or ethnicity — it's about IP space abuse from a specific network. If a network or country consistently shows high levels of hostile traffic and your risk tolerance justifies it, blocking or throttling it may be entirely reasonable. Security teams don’t block based on "where people come from" — they block based on where the attacks are coming from.
We even offer tools to help people explore and understand these patterns better. But if someone doesn’t have the time or resources to do that, I'm more than happy to assist by analyzing logs and suggesting reasonable mitigations.
I hope nobody does cybersecurity in 2025 by analysing and enriching IP addresses. Not on a market where a single residential proxy provider (which you fail to identify) offers 150M+ exit nodes. Even JA3 fingerprinting could be more useful than looking at IP addresses. I bet you the Romanian IPs were not operated by Romanians, yet you're banning all Romanians?
Cybersecurity is a probabilistic game. You build a threat model based on your business, audience, and tolerance for risk. Blocking combinations of metadata — such as ASN, country, usage type, and VPN/proxy status — is one way to make informed short-term mitigations while preserving long-term accessibility. For example:
If an ASN is a niche hosting provider in Indonesia, ask: “Do I expect real users from here?”
If a /24 from a single provider accounts for 10% of your attacks, ask: “Do I throttle it or add a CAPTCHA?”
The point isn’t to permanently ban regions or people. It’s to reduce noise and protect services while staying responsive to legitimate usage patterns.
As for IP enrichment — yes, it's still extremely relevant in 2025. Just like JA3, TLS fingerprinting, or behavioral patterns — it's one more layer of insight. But unlike opaque “fraud scores” or black-box models, our approach is fully transparent: we give you raw data, and you build your own model.
We intentionally don’t offer fraud scoring or IP quality scores. Why? Because we believe it reduces agency and transparency. It also risks penalizing privacy-conscious users just for using VPNs. Instead, we let you decide what “risky” means in your own context.
We’re deeply committed to accuracy and evidence-based data. Most IP geolocation providers historically relied on third-party geofeeds or manual submissions — essentially repackaging what networks told them. We took a different route: building a globally distributed network of nearly 1,000 probe servers to generate independent, verifiable measurements for latency-based geolocation. That’s a level of infrastructure investment most providers haven’t attempted, but we believe it's necessary for reliability and precision.
Regarding residential proxies: we’ve built our own residential proxy detection system (https://ipinfo.io/products/residential-proxy) from scratch, and it’s maturing fast. One provider may claim 150M+ exit nodes, but across a 90-day rolling window, we’ve already observed 40,631,473 unique residential proxy IPs — and counting. The space is noisy, but we’re investing heavily in research-first approaches to bring clarity to it.
IP addresses aren’t perfect but nothing is! But with the right context, they’re still one of the most powerful tools available for defending services at the network layer. We provide the context and you build the solution.
Anything incorporating anything like this is malware.
In most cases they are used for conducting real financial crimes, but the police investigators are also aware that there is a very low chance that sophisticated fraud is committed directly from a residential IP address.