I realize Anubis was probably never tested on a true single-core machine. They are actually somewhat difficult to find these days outside of microcontrollers.
Also, they still might not have (though they've probably learned by now). In this article they imply that the number of cores in each type of CPU core (what they call a "tier" in the article) will still be a power of two, and one of them just happened to be 2^0. I'm not sure they were around when the AMD Athlon II X3 was hot.
>>> Today I learned this was possible.

This was a total "today I learned" moment for me too. I didn't actually think that hardware vendors shipped processors with an odd number of cores, but if you look at the core geometry of the Pixel 8 Pro, it has three tiers of processor cores. I guess every assumption developers have about CPU design is probably wrong.
Another joke from the same era: having a 2-core processor means you can now, e.g., watch a film at the same time. At the same time as what? At the same time as running Windows Vista!
2^0 = 1
So the logic might make sense in people's heads if they've never encountered the 6- or 12-core CPUs that are common these days.
I have Chrome on mobile configured such that JS and cookies are disabled by default, and I then enable them per site based on my judgement. You might be surprised to learn that this usually works fine, and sites are often better for it: they stop nagging, and they load faster. This makes some sense in retrospect, as this is what lets search engine crawlers do their thing and keep that SEO score going.
Anubis (and Cloudflare, for that matter) forces me to temporarily enable JS and cookies at least once anyway, completely defeating the purpose of my paranoid settings. I basically never bother to, but I do admit it's annoying. It's up there with sites that have no content at all unless JS is on (high-profile example: AWS docs). At least Cloudflare only spoils the fun every now and then; with Anubis, it's always.
It's definitely my fault, but at the same time, I don't feel this is right. Simple static pages now require allowing arbitrary code execution and statefulness. (Although I do recognize that SVGs and fonts also kind of do so anyhow, much to my further annoyance).
Making you pay time, power, bandwidth, or money to access content does not significantly impede your browsing, so long as the cost is appropriately small. For the user above reporting thirty seconds of maxed-out CPU, that's excessive for the median person (though we hackers are not that).
If giving up your unique burned-in crypto-attested device ID is acceptable, there's an entire standard for that, and when your device is found to misbehave, it can be banned. Nintendo, Sony, and Xbox call this a "console ban"; it's quite effective because eating one is stunningly expensive.
If submitting proof of citizenship through some anonymous-attestation protocol is palatable, then Anubis could simply add the digital ID web standard and let users skip the proof of work in exchange for affirming that they have a valid digital ID. But this only works if the identity can be banned; otherwise AI crawlers will just send a valid anonymized digital ID header.
This problem repeats in every suggested solution: either you make it more difficult for users to access a site, or you require users to waste energy to access a site, or you require identifiable information signed by a trusted third-party authority to be presented, such that a ban based on it is possible. IP addresses don't satisfy this; Apple IDs, immutable HSM-protected device identifiers, and digital passports do.
If you have a solution that only presents barriers to excessive use and allows abusive traffic to be revoked without depending on IP address, browser fingerprint, or paid/state credentials, then you can make billions of dollars in twelve months.
Ideas welcome! This has been a problem since bots started scraping RSS feeds and republishing them as SEO blogs, and we still don’t have a solution besides Cloudflare and/or CPU-burning interstitials.
I'm not sure what generation it is, but I bought it around a decade ago I think.
Javascripters, perhaps. Those who work on schedulers, or kernels in general, would find this completely normal.
ranger_danger•1h ago
Why?
What would the alternative have been?
tux3•1h ago
The first effect is great, because it's a lot more annoying to bring up a full browser environment in your scraper than just run a curl command.
But the actual proof of work only takes about 10ms on a server in native code, while it can take multiple seconds on a low-end phone. Given that the companies in question are building entire data centers to house all their GPUs, an extra 10ms per web page is not a problem for them. They're going to spend orders of magnitude more compute training on the content they scraped than on solving the challenge.
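To make that asymmetry concrete, here's a minimal sketch of the kind of hash-prefix proof of work these challenges boil down to (not Anubis's actual code; the challenge string and the 16-bit difficulty are made-up stand-ins). Expected work is only ~2^16 hashes, which native code on a server burns through in milliseconds, while a low-end phone grinding it through a JS engine takes far longer:

    import hashlib
    import itertools

    def solve(challenge: str, difficulty_bits: int = 16) -> int:
        """Find a nonce so that sha256(challenge + nonce) starts with
        `difficulty_bits` zero bits. Illustrative sketch only."""
        target = 1 << (256 - difficulty_bits)  # hashes below this value pass
        for nonce in itertools.count():
            digest = hashlib.sha256(f"{challenge}{nonce}".encode()).digest()
            if int.from_bytes(digest, "big") < target:
                return nonce

    # Expected ~2**difficulty_bits attempts: trivial for a data center,
    # noticeable on a budget phone's JS engine.
    print(solve("made-up-challenge-string"))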
It was mostly the inconvenience of adapting to Anubis's JS requirements that held them back for a while; the PoW difficulty itself mainly slowed down real users.
alright2565•1h ago
MBCook•1h ago
https://github.com/TecharoHQ/anubis/pull/1038
Could someone explain how this would help stop scrapers? If you're just running the page JS, wouldn't this run too and let you through?
fluoridation•1h ago
MBCook•40m ago
fluoridation•30m ago
ranger_danger•1h ago
> how this would help stop scrapers
I think Anubis is based on some flawed assumptions:
- that most scrapers aren't headless browsers
- that they don't have access to millions of different IPs across the world from big/shady proxy companies
- that this can help with a real network-level DDoS
- that scrapers will give up if the requests become 'too expensive'
- that they aren't contributing to warming the planet
I'm sure some older bots exist that are not smart and don't use headless browsers, but especially with newer tech, AI crawlers, etc., I don't think this is a realistic majority assumption anymore.
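For what it's worth, running the page JS from a scraper is only a few lines with a headless browser. A hedged sketch (assuming Playwright for Python is installed; the URL and wait condition are placeholders, and whether a given interstitial actually clears depends on its specific checks):

    # Sketch of a scraper that simply executes the page's JS, challenge and all.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto("https://example.org/some-protected-page")  # placeholder URL
        # The interstitial's JS runs as it would for a real visitor; wait for
        # the network to settle, then grab whatever content is now rendered.
        page.wait_for_load_state("networkidle")
        html = page.content()
        browser.close()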
zetanor•1h ago
jsnell•1h ago
In an adversarial engineering domain, neither the problems nor the solutions are static. If by some miracle you have a perfect solution at one point in time, the adversaries will quickly adapt, and your solution stops being perfect.
So you’ll mostly be playing the game in this shifting gray area of maybe legit, maybe abusive cases. Since you can’t perfectly classify them (if you could, they wouldn’t be in the gray area), the options are basically to either block all of them, allow all of them, or issue them a challenge that the user must pass to be allowed. The first two options tend to be unacceptable in the gray area, so issuing a challenge that the client must pass is usually the preferred option.
A good counter-abuse challenge is something that has at least one of the following properties:
1. It costs more to pass than the economic value that the adversary can extract from the service, but not so much that the legitimate users won’t be willing to pay it.
2. It proves control of a scarce resource without necessarily having to spend that resource, but at least in such a way that the same scarce resource can’t be used to pass unlimited challenges.
3. It produces additional signals that can be used to meaningfully improve the precision/recall tradeoff.
And proof of work does none of those. The last two fail by construction, since compute is about the most fungible resource in the world. The first doesn't work because it's impossible to balance the difficulty factor such that it imposes a cost the attacker would notice while still being acceptable to legitimate users.
If you add 10s of latency for your worst-case real users (already too long), it'll cost the attacker about $0.01/1k solves. That's not a deterrent to any kind of abuse.
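A back-of-envelope sketch with my own assumed inputs (the ~10 ms native solve time quoted upthread and a rough vCPU rental price); depending on whether the attacker solves in native code or a full headless browser, and on compute prices, the figure moves by an order of magnitude or two, but it stays down in the noise either way:

    # All inputs are assumptions, not measured values.
    native_solve_seconds = 0.010   # ~10 ms per solve in native code (upthread)
    vcpu_dollars_per_hour = 0.04   # rough rented-compute price

    cost_per_solve = native_solve_seconds / 3600 * vcpu_dollars_per_hour
    print(f"~${1000 * cost_per_solve:.4f} per 1k solves")  # fractions of a cent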
So proof of work is just a really bad fit for this specific use case. Its only advantage is that it's easy to implement, but that's a very short-term benefit.