This appears to be a proof-of-work scheme, like Anubis. Real captchas collect much more fingerprinting data to ensure that only users with the latest version of Chrome, the latest version of Windows, and an Nvidia graphics card can use the site.
On topic though, how does this improve on hCaptcha?
Cloud vs self-hosted, click annoying things challenge vs automatic proof of work. Or are there other hCaptcha versions and I just never realized it?
I've been bewildered for some time as well, honestly; it took me a while to figure out the first one I ran into.
And trying one now, fully knowing that I'd have to solve one, I was dumbfounded by the puzzle I got; it took me a few seconds to understand it.
Cloudflare's ones are horrible and a plague (although they might have slightly improved recently), but I'm not certain I'd prefer hCaptchas over them.
No idea how I’d compare to others on my network; that’d only be my wife, and as a Linux user she’d probably get more of them than I do on Windows ;)
Example: if the system identifies the client as a likely bot, it serves a more expensive PoW challenge, i.e. one that takes longer to solve.
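Something along those lines, presumably; here's a hypothetical sketch (the score and thresholds are made up, not ALTCHA's actual behavior):

    // Hypothetical: scale the expected PoW cost with how bot-like the client looks.
    // botScore is assumed to come from some upstream heuristic, 0 (human-like) to 1 (bot-like).
    function expectedHashes(botScore: number): number {
      if (botScore > 0.8) return 2_000_000; // suspected bot: seconds of hashing
      if (botScore > 0.5) return 200_000;   // unclear: noticeably slower
      return 20_000;                        // likely human: barely noticeable
    }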
I think somebody might have flagged your comment, but what you said is in fact true.
This is one of the reasons people say Cloudflare owns the majority of the internet, but I think I'm okay with that since Cloudflare is pretty chill and provides the best services. Still, it just shows that the internet isn't that decentralized.
But Google's captcha is literally tracking you, IIRC. I would personally prefer hCaptcha if you want a centralized solution, or Anubis if you want to self-host (I prefer Anubis, I guess).
Or sometimes everything that's not just Chromium[2].
[1] - https://www.theregister.com/2025/03/04/cloudflare_blocking_n...
[2] - https://www.techradar.com/pro/cloudflare-admits-security-too...
Downvotes. Comments with negative scores are shown with lower contrast. The more negative the score, the less contrast they get.
Why would someone renting dirt-cheap botnet time care if the requests to your site take a few seconds longer?
Plus, the requests are still getting through after waiting a few seconds, so it does nothing for the website operator and just burns battery for legit users.
Usually, if you're going to go through the trouble of integrating a captcha, you want to protect against targeted attacks, like a forum spammer, where you don't want to let the abusive requests through at all, not just let them through after 5000 ms.
Even if the bot owner doesn't watch (or care about) their crawling metrics, at least the botnet is not DDoSing the site in the meantime.
This is essentially a client-side tarpit, which is actually pretty effective against all forms of bot traffic while not impacting legitimate users very much, if at all.
This is something you throw everyone through: both your abusive clients (running on stolen or datacenter hardware) and your real clients (running on battery-powered laptops and phones). More like a tar-checkpoint.
So the crazy decentralized mystery botnet(s) that are affecting many of us don't seem to be that worried about cost. They are making millions of duplicate requests for duplicate useless content; it's pretty wild.
On the other hand, they ALSO don't seem to be running user agents that execute JavaScript.
This is among the findings of a group of my colleagues at peer non-profits who have been sharing notes to try to understand what's going on.
So the fact that they don't run JS at present means that PoW would stop them -- but so would something much simpler and cheaper relying on JS.
If this becomes popular, could they afford to run JS and calculate the PoW?
It's really unclear. The behavior of these things doesn't make enough sense to me to have much of a theory about their costs/benefits or budgets; it's all a mystery to me.
Definitely hoping someone manages to figure out who's really behind this, and why, at some point. (I am definitely not assuming it's a single entity, either.)
Basically you need session-token generators, which are usually automated headless browsers.
Another point that isn't exactly valid is the botnet assumption: you don't need one. You can scrape at scale with one machine using proxies. Proxies are dirt cheap.
So basically you generate a session for a proxy IP and scrape as long as the token is valid. No botnets, no magic, nada. Just business.
I might at any rate set my PoW to be relatively cheap, which would take care of anyone not executing JS.
There are two problems some website operators encounter:
A) How do I ensure no one DDoSes me (deliberately or inadvertently)?
B) How can I ensure this client is actually a human, not a robot?
Things like reCAPTCHA aimed to solve B, not A. But the submitted solution seems to be more for A, as a PoW can be (in fact, probably must be) calculated by a machine, not a human, while reCAPTCHA is supposed to be the opposite: solvable only by a human.
AI bots can't solve proof-of-work challenges because the browsers they use for scraping don't support the features needed to solve them. This is highlighted by the existence of other proof-of-work solutions designed specifically to filter out AI bots, like go-away[1] or Anubis[2].
And yes, they work: once GNOME deployed one of these proof-of-work challenges on their GitLab instance, traffic on it fell by 97%[3].
[1] - https://git.gammaspectra.live/git/go-away
[2] - https://github.com/TecharoHQ/anubis
[3] - https://thelibre.news/foss-infrastructure-is-under-attack-by...: "According to Bart Piotrowski, in around two hours and a half they received 81k total requests, and out of those only 3% passed Anubi's proof of work, hinting at 97% of the traffic being bots."
At least sometimes. I do not know about AI scraping, but there are plenty of scraping solutions that do run JS.
It also puts off some genuine users like me who prefer to keep JS off.
The 97% is only accurate if you assume a zero false positive rate.
Non-JavaScript challenges are also available[1].
> "The 97% is only accurate if you assume a zero false positive rate."
GNOME's GitLab instance is not something people visit daily like Wikipedia, so the number of false positives is negligible.
[1] - https://git.gammaspectra.live/git/go-away/wiki/Challenges#no...
Did not know that. Good news
> "GNOME's GitLab instance is not something people visit daily like Wikipedia, so the number of false positives is negligible."
As an absolute number, yes, but as a proportion?
Huh, they definitely can?
go-away and Anubis reduce the load on your servers, as bot operators cannot just scrape N pages per second without any drawbacks. Instead, it gets really expensive to make thousands of requests, as they're all really slow.
But for a user who uses their own AI agent to browse the web, things like Anubis and go-away aren't meant to (and don't) stop them from accessing websites at all; it'll just be a tiny bit slower.
Those tools are meant to stop site-wide scraping, not individual automatic user-agents.
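To put rough numbers on that (all figures below are illustrative assumptions, not measurements):

    // Back-of-the-envelope cost of a site-wide crawl behind a PoW challenge.
    const hashesPerSecond = 200_000; // assumed JS hash rate on a typical client
    const expectedHashes = 100_000;  // assumed average hashes needed per challenge
    const pages = 100_000;           // assume one fresh challenge per page crawled

    const secondsPerPage = expectedHashes / hashesPerSecond; // 0.5 s
    const totalCpuHours = (pages * secondsPerPage) / 3600;   // ~13.9 h
    console.log({ secondsPerPage, totalCpuHours });

Half a second once is nothing for a person reading a few pages; roughly fourteen CPU-hours for a full crawl is a real, if not prohibitive, cost for a scraper.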
Well, maybe. As far as I can see, the overt ones are using pretty reasonable rate limits, even though they're scraping in useless ways (every combination of git hash and file path on Gitea). Rather, it seems like the anonymous ones are the problem, and since they're anonymous, we have zero reason to believe they're AI companies. Some of them are running on Huawei Cloud. I doubt OpenAI is using Huawei Cloud.
By this point, it’s obvious that that has failed, and even that no general solution is possible any more.
ALTCHA… telling Computers and Humans Apart? No, this is proof of work, meaning it’s just about making things expensive—abuse control, not actually distinguishing between computers and humans.
In fact, in https://altcha.org/captcha/ one of the headings is Inclusive to Robots! This is so far the opposite of traditional CAPTCHA, on the technical side, that it’s mildly hilarious. (Socially, they largely amount to the same thing—people never did actually care about computers, just abusive bots.)
Then the question is: what is the proof of work mechanism? How robust are things going to be, and can you ensure attacking will remain expensive, without burdening users too much?
https://altcha.org/docs/proof-of-work/ indicates it’s SHA hashing, not something like scrypt. Uh oh. The best specialised hardware is several million times as good as good laptops¹, let alone cheap phones. If this were to become popular, bots would switch to such hardware, probably making the cost of attacking practically negligible. https://altcha.org/docs/complexity/ shows they’ve thought about these things, but I feel that although it will work for a while, it’s ultimately a doomed game. And in the mean time, you can normally go waaaay simpler and less intrusive: most bots are extremely dumb.
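For reference, the scheme described there is roughly of this shape; a minimal sketch assuming the usual salt-plus-number construction (names and parameters here are illustrative, not ALTCHA's actual API):

    import { createHash, randomBytes, randomInt } from "node:crypto";

    const sha256 = (s: string) => createHash("sha256").update(s).digest("hex");

    // Server: pick a salt and a secret number, publish the salt plus hash(salt + number).
    function makeChallenge(maxNumber = 100_000) {
      const salt = randomBytes(12).toString("hex");
      const secret = randomInt(0, maxNumber);
      return { salt, maxNumber, challenge: sha256(salt + secret) };
    }

    // Client: brute-force the number. Expected cost: maxNumber / 2 hashes.
    function solve({ salt, maxNumber, challenge }: ReturnType<typeof makeChallenge>) {
      for (let n = 0; n <= maxNumber; n++) {
        if (sha256(salt + n) === challenge) return n;
      }
      return null;
    }

    console.log(solve(makeChallenge())); // finds the number after ~50k hashes on average

The point of the SHA-vs-scrypt complaint is that dedicated hardware runs exactly this loop millions of times faster than a browser can.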
Is “captcha” heading in the direction of meaning “bad rate limiting”?
Because really that’s what this stuff is: rate limiting that trusts that clients don’t have lots of compute power conveniently available, but will get vaporised by powerful and intentional adversaries.
—⁂—
¹ On the https://altcha.org/docs/complexity/ test, a comparatively ideal browser on my 5800HS laptop might reach 500,000 SHA-256 hashes per second at a cost of at least 25W. (Chromium gets half this with ~50% CPU usage; Firefox one tenth, altogether failing to load the cores for some reason.) The most energy-efficient commercial Bitcoin miners seem to be doing around 80 billion of these hashes per watt-second. That’s four million times as good. You cannot bridge such a divide.
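(Sanity-checking that ratio with the footnote's own rough numbers:)

    // Energy efficiency: browser JS on a laptop vs. a SHA-256 ASIC, per the estimates above.
    const laptopHashesPerJoule = 500_000 / 25; // 500k hashes/s at ~25 W ≈ 20,000 hashes/J
    const asicHashesPerJoule = 80e9;           // ~80 billion hashes per watt-second
    console.log(asicHashesPerJoule / laptopHashesPerJoule); // 4,000,000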
In fact, I used to fake my user agent all the time because Microsoft 365 is so broken. With the Firefox/Linux user agent a lot of features don't work; when it pretends to be MS Edge it works fine. Clearly trying to force people to use the 'invented here' browser :(
But since I was getting captchas, I moved to using it only for the MS365 sites and nowhere else. That seems to have reduced the captchas somewhat, especially the ones that never end (keep looping). But I still get a ton of "Your browser is suspicious, here's an extra check" nonsense, from Cloudflare in particular.
Lots of big businesses use reCAPTCHA, quite often unnecessarily. If I need to log in with 2FA to use a service, does it really need reCAPTCHA?
Similarly, Cloudflare sends you emails telling you how many bots and attacks it has stopped, but you do not know how many false positives there were.
I would guess that simple rate limiting would do the trick for the rest.
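For what it's worth, "simple rate limiting" really can be simple; a per-client token bucket is a few lines (a sketch with made-up limits, not any particular product's implementation):

    // Minimal in-memory token bucket keyed by client (e.g. IP). Limits are illustrative.
    type Bucket = { tokens: number; last: number };
    const buckets = new Map<string, Bucket>();
    const RATE = 5;   // tokens refilled per second (assumed)
    const BURST = 20; // bucket capacity (assumed)

    function allow(key: string, now = Date.now()): boolean {
      const b = buckets.get(key) ?? { tokens: BURST, last: now };
      b.tokens = Math.min(BURST, b.tokens + ((now - b.last) / 1000) * RATE);
      b.last = now;
      const ok = b.tokens >= 1;
      if (ok) b.tokens -= 1;
      buckets.set(key, b);
      return ok;
    }

    // allow("203.0.113.7") stays true until the burst is spent, then ~5 requests/second.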
As far as I can tell, most startups resolve their technical debt by failing, and the majority of the rest resolve their debt by being acquired by a company which replaces the original service entirely in 1-3 years because it's too hard to integrate as-is.
At least with captchas, it's somewhat understandable with the arms-race aspect. The third party does the work of engaging in the arms race, so you don't have to, but the tradeoff is what you describe.
Maybe it's only used on individual form submit (like the classic captcha use-case), and not on a page load, and it does have to be recalculated on every form submit?
> To prevent the vulnerability of “replay attacks,” where a client resubmits the same solution multiple times, the server should implement measures that invalidate previously solved challenges.
> The server should maintain a registry of solved challenges and reject any submissions that attempt to reuse a challenge that has already been successfully solved.
This doesn't seem very scalable? Or am I missing something?
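It can stay reasonably cheap if every challenge carries an expiry and the server only remembers solved challenges until they expire; a minimal sketch of that idea (an assumed design, not necessarily what ALTCHA's server libraries actually do):

    // Remember solved challenge IDs only until their expiry, so the registry stays bounded.
    const solved = new Map<string, number>(); // challengeId -> expiry timestamp (ms)

    function acceptSolution(challengeId: string, expiresAt: number, now = Date.now()): boolean {
      if (now > expiresAt) return false;         // the challenge itself has expired
      if (solved.has(challengeId)) return false; // replay: already redeemed once
      solved.set(challengeId, expiresAt);
      return true;
    }

    // Periodically drop entries that could no longer be replayed anyway.
    setInterval(() => {
      const now = Date.now();
      for (const [id, exp] of solved) if (exp < now) solved.delete(id);
    }, 60_000);

Memory is then bounded by however many challenges get solved within one expiry window.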
This is only trying to tell human browsers and bot browsers apart. Not even that; it seems all it does is slow all browsers down equally.
Like whether there's a checkbox you have to click, and whether it spins for a while when you click it. That's a CAPTCHA now. And working is when your butt is in the chair. And investing is when you give someone money and they promise to give more back later. And food is things that fit in your mouth and don't kill you. And free speech is when you get turned away at the border for disliking the president on social media. And top-of-the-line CPUs are ones that die within 24 months. Meanwhile the totalitarian dictatorship across the pond actually does all these things better somehow (except the politics). https://en.wikipedia.org/wiki/HyperNormalisation#Etymology