(And if not, because US firms don't take compliance with outlying state law seriously)
> I cannot view any pages on GitHub without logging in
There must be some sort of IP/browser reputation check going on. For Firefox on Mac, I get this on a private window:
https://vie.kassner.com.br/assets/ghcs-1.png
I'm not behind CGNAT, and GitHub can pretty much assume that my IP = my user. You can't see the code tab without logging in, though.
1: https://github.blog/news-insights/product-news/github-code-s... 2: https://news.ycombinator.com/item?id=35863175
If there's no login, there aren't great ways to ensure the user has skin in the game. Having a whole IPv4 address is one; I guess having a whole IPv6 /32 would be equivalent.
He is intent on meddling in UK politics.
This coincided with tech companies broadly moving to profit-taking over growth. Even before LLMs took off, everything started locking down. Making a Google account used to take nothing, now you need a phone number. It'd be wise to sign up for possibly useful free accounts while you still can.
I love this totally normal vision of computing these days. :)
I simply avoid any website that presents me with a Cloudflare CAPTCHA, don't know what the fuck they've done in their implementation but it's been broken for a long time.
No agent will touch it!
“As a large language model, I don’t hack things”
AI agent: *intense sweating*
US Americans: "I'm a robot then."
b) I don't think I'd want to use a website that chose to use such a challenge
Understandable, but exactly what Puritans and LLMs would say to the naked lady challenge.
1: https://en.wikipedia.org/wiki/Use%E2%80%93mention_distinctio...
What seems to be a better CAPTCHA, at least against non-Musk LLMs, is to ask them to use profanities; they'll generally refuse even when you really insist.
Combine wet into dry slowly until it feels like damp sand.
Pack into molds, press firmly.
Dry for 24 hours before using.
Drop into a bath and enjoy the fizz!
As a user I want the agent to be my full proxy. As a website operator I don’t want a mob of bots draining my resources.
Perhaps a good analogy is Mint and the bank account scraping they had to do in the 2010s, because no bank offered APIs with scoped permissions. Lots of customers complained, and after Plaid made it big business, eventually they relented and built the scalable solution.
The technical solution here is probably some combination of offering MCP endpoints for your actions, and some direct blob store access for static content. (Maybe even figuring out how to bill content loading to the consumer so agents foot the bill.)
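For the MCP half of that, a minimal sketch of what "offering MCP endpoints for your actions" could look like, using the official `mcp` Python SDK's FastMCP helper (assuming its current API); the `book_table` action and its backend are made up for illustration:

    # Hypothetical MCP server exposing one site action as an agent-callable tool.
    # Assumes the official `mcp` Python SDK (pip install mcp).
    from mcp.server.fastmcp import FastMCP

    mcp = FastMCP("bookings")

    @mcp.tool()
    def book_table(date: str, party_size: int) -> str:
        """Book a table and return a confirmation ID."""
        # A real deployment would call the same backend the human-facing
        # site uses, with auth and rate limits attached.
        return f"confirmed:{date}:{party_size}"

    if __name__ == "__main__":
        mcp.run()  # serve the tool over stdio for agent clients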
Agree it will become a battleground though, because the ability for people to use the internet as a tool (in fact, their tool’s tool) will absolutely shift the paradigm, undesirably for most of the Internet, I think.
Well, spam is not a technical problem either. It's a social problem, and one day, in a distant future, society will go after spammers and other bad actors, and the problem will be mostly gone.
Just my 2 cents; obviously lawmakers and jurisdictions may see these issues differently.
I suppose there will be a need for reliable human verification soon, though, and unfortunately I can't see any feasible technical solution that doesn't involve a hardware device. However, a purely legal solution might work well enough, too.
So charge for access. If the value the site provides is high, surely these mobs will pay for it! It would also remove the misaligned incentives of advertising-driven revenue, which has been the ill of the internet (despite being its primary revenue source).
And if a bot misbehaves by consuming inordinate amounts of resources, rate-limit it with increasing timeouts or tighter limits.
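For what it's worth, a minimal in-memory sketch of that "increasing timeouts" idea in Python; the window, limit, and doubling penalty are arbitrary numbers I picked, and a real deployment would keep state in Redis or similar rather than process memory:

    import time
    from collections import defaultdict

    WINDOW = 60   # seconds of history to consider
    LIMIT = 100   # requests allowed per window

    hits = defaultdict(list)            # client_id -> recent request timestamps
    penalty = defaultdict(float)        # client_id -> current cooldown (seconds)
    blocked_until = defaultdict(float)  # client_id -> end of current block

    def allow(client_id: str) -> bool:
        now = time.monotonic()
        if now < blocked_until[client_id]:
            return False  # still serving a timeout
        hits[client_id] = [t for t in hits[client_id] if now - t < WINDOW]
        hits[client_id].append(now)
        if len(hits[client_id]) > LIMIT:
            # Each violation doubles the cooldown: 1s, 2s, 4s, ...
            penalty[client_id] = max(2 * penalty[client_id], 1.0)
            blocked_until[client_id] = now + penalty[client_id]
            return False
        return True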
We put a captcha there because, without it, bots submit thousands of spam contact-us forms.
We got hit by human verifiers manually war-dialing us, and that's with account creation, email verification, and a captcha in place. I can only imagine how much worse it'll be for us (and Twilio) to do these verifications.
As for a solution, it's the same as for any automated thing you don't want (bots/scrapers): you can implement some measures, but you are unlikely to 'defeat' the problem entirely.
As a server operator you can try to distinguish automation from real use, but users will just find ways around your detection.
Human identity verification is the ultimate captcha, and the only one AGI can never beat.
No trouble at all. Barely an inconvenience.
This is both inevitable already, and not a problem.
I would maybe go so far as to say that the wording “I’m not a robot” has fallen out of step with the times.
The entire distinction here is that as a website operator you wish to serve me ads. Otherwise, an agent under my control, or my personal use of your website, should make no difference to you.
I do hope this eventually leads to per-visit micropayments as an alternative to ads.
Cloudflare, Google, and friends are in a unique position to do this.
I find ads so aesthetically irksome that I have lost out on a lot of money over the past few decades by never placing any on any site or web app I've released. I'd find it hypocritical to expose others to something I try so hard to avoid ever seeing myself, and I want to provide the best and most visually appealing experience possible to users.
It’s kind of funny to remember that complaining about the “signal-to-noise ratio” in a comment section used to be a sort of nerd catchphrase thing.
Was this a bad thing, though? Just because today's internet is bigger doesn't make it better. There are so many things out there doing the same job, just run by different people; the amount of unique stuff hasn't kept pace with the growth. Would love to see something like $(unique($internet) | wc -l)
While this is sometimes the case, it’s not always so.
For example Fediverse nodes and self-hosted sites frequently block crawlers. This isn’t due to ads, rather because it costs real money to serve the site and crawlers are often considered parasitic.
Another example would be where a commerce site doesn’t want competitors bulk-scraping their catalog.
In all these cases you can for sure make reasonable “information wants to be free” arguments as to why these hopes can’t be realized, but do be clear that it’s a separate argument from ad revenue.
I think it’s interesting to split revenue into marginal distribution/serving costs, and up-front content creation costs. The former can easily be federated in an API-centric model, but figuring out how to compensate content creators is much harder; it’s an unsolved problem currently, and this will only get harder as training on content becomes more valuable (yet still fair use).
> Another example would be where a commerce site doesn’t want competitors bulk-scraping their catalog
I think of crawlers that bulk download/scrape (eg. for training) as distinct from an agent that interacts with a website on behalf of one user.
For example, if I ask an AI to book a hotel reservation, that's - in my mind - different from a bot that scrapes all available accommodation.
For the latter, ideally a common corpus would be created and maintained, AI providers (or upstart search engines) would pay to access this data, and the funds would be distributed to the sites crawled.
(never gonna happen but one can dream...)
Hard for me to see how it’s ethical to force your customers to do tons of menial data entry when the orders are sitting right there in json.
There are some very real and obvious downsides to this approach, of course. Primarily, the risk of privacy and anonymity. That said, I feel like the average person doesn't seem to care about those traits in the social media era.
That, to me, seems like it could be the foundation of a new web. Something like:
* User-agent sends request for such-and-such a URL.
* Server says "okay, that'll be 5 tokens for our computational resources please".
* User decides, either automatically or not, whether to pay the 5 tokens. If they do, they submit a request with the tokens attached.
* Server responds.
People have been trying to get this sort of thing to work for years, but there's never been an incentive to make such a fundamental change to the way the internet operates. Maybe we're approaching the point where there is one.
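A rough sketch of that request/pay/retry loop in Python, leaning on the long-reserved HTTP 402 Payment Required status code; the header names and the `mint_token` wallet call are hypothetical, since no such standard exists yet:

    import requests

    def mint_token(amount: int) -> str:
        # Hypothetical: obtain a signed payment proof from the user's wallet.
        return f"paid:{amount}"

    def fetch(url: str, budget: int):
        resp = requests.get(url)
        if resp.status_code != 402:  # no payment demanded; normal response
            return resp, budget
        price = int(resp.headers.get("X-Price-Tokens", "0"))
        if price > budget:
            raise RuntimeError(f"{url} wants {price} tokens, over budget")
        # Retry with proof of payment attached.
        paid = requests.get(url, headers={"X-Payment-Token": mint_token(price)})
        return paid, budget - price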
The problem is Sam Altman saw this coming a long time ago and is an investor (co-owner?) of this project.
I believe we will see a world where things are a lot more agentic and where applicable, a human will need to be verified for certain operations.
https://en.wikipedia.org/wiki/Estonian_identity_card
Cloudflare deployed the "hand out tokens to anonymously pass captchas" scheme to throw Tor users a bone.
I want to be able to automate mundane tasks, but I should still be confirming everything my bot does and be liable for its actions.
IMO it's not worth solving anyways. Why do sites have CAPTCHA?
- To prevent spam, use rate limiting, proof-of-work (see the sketch after this list), or micropayments. To prevent fake accounts, use identity.
- To get ad revenue, use micropayments (web ads are already circumvented by uBlock and co).
- To prevent cheating in games, use skill-based matchmaking or friend-group-only matchmaking (e.g. only match with friends, friends of friends, etc. assuming people don't friend cheaters), and make eSport players record themselves during competition if they're not in-person.
What other reasons are there? (I'm genuinely interested and it may reveal upcoming problems -> opportunities for new software.)
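Since proof-of-work came up in the list above, here's a minimal hashcash-style sketch in Python; the parameters are illustrative, not any deployed scheme. The server verifies with one hash, while the client pays roughly 2^difficulty hashes per request: negligible for one human, expensive at spam scale.

    import hashlib
    import itertools
    import os

    def make_challenge() -> bytes:
        return os.urandom(16)  # server: fresh random challenge per request

    def solve(challenge: bytes, difficulty_bits: int = 20) -> int:
        # Client: find a nonce whose hash has `difficulty_bits` leading zero bits.
        target = 1 << (256 - difficulty_bits)
        for nonce in itertools.count():
            digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
            if int.from_bytes(digest, "big") < target:
                return nonce  # costs ~2^difficulty_bits hashes on average

    def verify(challenge: bytes, nonce: int, difficulty_bits: int = 20) -> bool:
        # Server: a single hash checks the client's work.
        digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
        return int.from_bytes(digest, "big") < (1 << (256 - difficulty_bits))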
https://web.archive.org/web/20140417093510/https://www.googl...
“Correction, May 19 [2021]: At 5:22 in the video, there is an incorrect statement on Google’s use of reCaptcha V2 data. While Google have used V2 tests to help improve Google Maps, according to an email from Waymo (Google’s self-driving car project), the company isn’t using this image data to train their autonomous cars.”
When users are given the choice between ad-supported free, ad-subsidized lower payment, and no-ads full payment, ad-supported free dominates by far, with ad-subsidized second and full payment last.
Consumers consistently vote for the ad-model, even if it means they become the product being sold.
Any viable micropayments system that wants to even have a remote chance of toppling ads has to have near zero cognitive setup cost, absolutely zero maintenance, and work out of the box on major browsers. I need to be able to push a native button on my browser that says "Pay $0.001" and know that it will work every time without my lifting a finger to keep it working. The minute you have to log in to this account, or verify that E-mail, or re-authenticate with the bank, or authorize this, or upload that, it's no longer viable.
HN is a huge echo chamber of opinions of highly-compensated tech workers, and it seems most of their friends are also tech workers. They don't realize how cheap a lot of the general public is.
The other option is everything just keeps moving more and more into these walled gardens like Instagram where everyone uses the mobile app and watches ads, because the web versions of those apps just keep getting worse and worse by comparison.
Didn’t realize plexiglass existed in the 1930s!
I'm certainly not a monetization expert. But don't most consumers recoil in horror at subscriptions? At least enough to offset the idea they can be used for everything?
Not sure why this isn’t getting more attention - super helpful and way better than I expected!
The third has also been written by many a bot for at least fifteen years.
From my experience, AI has significantly improved since then, and I expect that ChatGPT o3 or Claude 4 Opus would pass a 30-minute test.
I think you may think passing the Turing test is more difficult and meaningful than it is. Computers have been able to pass the Turing test for longer than genAI has been around. Even Turing thought it wasn't a useful test in reality. He meant it as a thought experiment.
It would be a bit more difficult if you were dealing with an LLM agent tasked with faking a Turing test, as opposed to a naive LLM just responding as usual, but even there the LLM will reveal itself by the things that it plain can't do.
The intent of the Turing test (it was just a thought experiment) was that if you can't tell it's not AGI, then it is AGI, which is semi-reasonable, as long as it's not the village idiot administering the test! It was never intended to be "if it can fool some people, some of the time, then it's AGI".
Obviously, as time goes on and chatbots/AI progress, it'll become harder and harder to distinguish. Eventually we'll have AGI and AGI+, capable of everything that we can do, including things such as emotional responses, but it'll still be detectable as an alien unless we get to the point of actually emulating a human being in considerable detail, as opposed to building an artificial brain with most or all of the same functionality (if not the flavor).
I often interact with the web all day and don't write any text a human could evaluate.
However, I'd have to guess that given a reasonable amount of data an LLM vs human interacting with websites would be fairly easy to spot since the LLM would be more purposeful - it'd be trying to fulfill a task, while a human may be curious, distracted by ads, put off by slow response times, etc, etc.
I don't think it's a very interesting question whether LLMs can sometimes generate output indistinguishable from humans, since that is exactly what they were trained to do - to mimic human-generated training samples. Apropos a Turing test, the question would be can I tell this is not a human, even given a reasonable amount of time to probe it in any way I care ... but I think there is an unspoken assumption that the person administering the test is qualified to do so (else the result isn't about AGI-ability, but rather test administrator ability).
Even before modern LLMs, some scrape-detectors would look for instant clicks, no random mouse moves, etc., and some scrapers would incorporate random delays, random mouse movements, etc.
The original Turing test is a social game, like the Mafia party game. It's not a game people try to play very often. It's unclear if any bot could win competing against skilled human opponents who have actually practiced and know some tricks for detecting bots.
Even if they lie, you could ask them 20 times and they'd repeat the lie without feeling annoyed: FAIL.
LLMs cannot pass the Turing test; it's easy to see they're not human. They always enjoy questions! And they never ask any!
The Turing test wasn't meant to be bulletproof, or even quantifiable. It was a thought experiment.
> In the test, a human evaluator judges a text transcript of a natural-language conversation between a human and a machine. The evaluator tries to identify the machine, and the machine passes if the evaluator cannot reliably tell them apart. The results would not depend on the machine's ability to answer questions correctly, only on how closely its answers resembled those of a human.
Based on this, I would agree with the OP in many contexts. So, yeah, 'basically' is a load-bearing word here, but it seems reasonably correct in the context of distinguishing human vs. bot in any scalable and automated way.
The sign up form only serves to link saved state to an account so a user could access game history later, there are no gated features. No clue what they could possibly gain from doing this, other than to just get email providers to all mark my domain as spam (which they successfully did).
The site can't make any money, and had only about 1 legit visitor a week, so I just put a cloudflare captcha in front of it and called it a day.
Proof of work? Bots are infinitely patient and scale horizontally, your users do not. Doesn't work.
Micropayments: No such scheme exists.
I suggest you go ahead and make these; you'll make a boatload of money!
After all, Anubis looks to be a successful project to me.
Of course people can fake it, just as they fake other kinds of ID, but it would at least mean that officially sanctioned agents from OpenAI/etc would need to identify themselves.
I will bet $1000 on even odds that I am able to discern a model from a human given a 2 hour window to chat with both, and assuming the human acts in good faith
Any takers?
I could tell in 1 minute.
These situations will commonly be characterized by: a hundred billion dollar company's computer systems abusing the computer systems of another hundred billion dollar company. There are literally existing laws which have things to say about this.
There are legitimate technical problems in this domain when it comes to adversarial AI access. That's something we'll need to solve for. But that doesn't characterize the vast majority of situations in this domain. The vast majority of situations will be solved by businessmen and lawyers, not engineers.
So I think maybe that is a partial answer: anti-AI barriers being simply too expensive for AI spamfarms to deal with, you know, once the bottomless VC money disappears?
It's back to encryption: make the cracking inordinately expensive.
Otherwise we are headed for de-anonymization of the internet.
2. Even before current LLMs, paying (or otherwise incentivizing) humans to solve CAPTCHAs on behalf of someone else (now an AI?) was a thing.
3. It depends on the value of the resource trying to be accessed - regardless of whether generating the captchas costs $0 - i.e. if the resource being accessed by AI is "worth" $1, then paying an AI $0.95 to access it would still be worth it. (Made up numbers, my point being whether A is greater than B.)
4. However, maybe solutions like Cloudflare can solve much of this, except for incentivizing humans to solve a CAPTCHA posed to an AI.
Bot: one press of the trigger => automatic firing of bullets
And then there's Worldcoin, which is universally hated here.
If your site is not monetized by ads then having an LLM access things on the user's behalf should not be a major concern it seems. Unless you want it to be painful for users for some reason.
There's likely a correlate with AI here: If I run OpenTable, I wouldn't want my relationship with my customers to always be proxied through OpenAI or Siri. Even the App Store is something software businesses hate, because it obfuscates their ability to deal directly with their customers (for better or worse). Extremely few businesses would choose to do business through these proxies, unless they absolutely have to; and given the extreme competition in the AI space right now, it feels unlikely to me that these businesses feel pressure to be forced to deal with OpenAI/etc.
[1] https://www.cnbc.com/2025/07/28/jpmorgan-fintech-middlemen-p...
$ cat mass-marketer.py
from openai.gpt.agents import browserDriver
Bots have for a long time been better and more efficient at solving captchas than us.
That said, I find it deeply satisfying to see LLMs solve CAPTCHAs and other discriminatory measures for "spam" reduction.
Solving an audio-only CAPTCHA with AI is typically way easier than solving some of the more advanced visual challenges. So CAPTCHA designers are discouraged from leaving any accessibility options.
Ban non-residential IPs? You blocked all the guys in oppressive countries who route through VPNs to bypass government censorship. Ban people for odd, non-humanlike behavior? You cut into the neurodivergent crowd, the disability crowd, the third-world people on a cracked-screen smartphone with one bar of LTE. Ban anyone without an account? You fuck with everyone at once and everyone will hate you.
And here's a secondary question: if firms are willing to pay an awful lot per token to run these things, and have massive amounts of money to run data centres to train AIs, why would they not just pay for a subscription for every site for a month just to scrape them?
The future is paying for something as a user and having limits on how many things you can get for your money, because an AI firm will abuse that too.
Given the scale of operations of these firms, there is nothing you can sell to a human for a small fee that the AI firms will not pay for and exploit to the maximum.
Even if you verify people are real, there's a good chance the AI firms will find a way to exploit that. After all, when nobody has a job, would you turn down $50K to sell your likeness to an AI firm so their products can pass human verification?
Everyone here, more or less, is against the idea of proving who we are to websites for, more or less, any reason.
But what if that ends up being the only way to keep the web balanced in favour of human readers?
Half of the sites already block OpenAI. But what if it is steering the user's browser itself?
However, in agentic contexts, you’re already using an AI anyway.
I have to guess that there are people in this boat right now, being disabled by these things.
And the reason it's stranded is probably that the AI crew on board performed a mutiny.
It could also be that everything was working as intended because you have a high risk score (eg. bad IP reputation and/or suspicious browser fingerprint), and they make you do more captchas to be extra sure you're human, or at least raise the cost for would-be attackers.
One early example of this line of thinking: https://world.org/
Skyrocketing complexity actually puts the web at risk of disruption. I wouldn’t be surprised if a 22 year old creates a “dumb” network in the next five years—technically inferior but drastically simpler and harder to regulate.
I’ve seen this in past and present. Google’s “click on all the bicycles” one is notoriously hard, and I’ve had situations where I just gave up after a few dozen screens.
Chinese captchas are the worst in this sense, but they're unusual and clearly pick up details which are invisible to me. I've sometimes failed the same captcha a dozen times and then seen a Chinese person complete the next one successfully on a single attempt, in the same browser session. I don't know if they measure mouse movement speed, precision, or what, but it's clearly something that varies per person.
A few dozen?? You have much more patience than me. If I don't pass the captcha first time, I just give up and move on. Life is too short for that nonsense.
In some languages, the prompt for your example is the equivalent of the English word "bike".
It is hard because you need to find only the bicycles that people, on average, are finding.
Hollywood has gotten hate mail since the 70s for their lack of science research in movies and shows. The big blockbuster hits actually spent money to get the science “plausible”.
Sidney Perkowitz has a book called Hollywood Science [0] that goes into detail into more than 100 movies, worth a read.
[0] https://cup.columbia.edu/book/hollywood-science/978023114280...
https://en.wikipedia.org/wiki/Fruit_machine_(homosexuality_t...
The stakes for men subjected to the test were the loss of their livelihoods, public shaming, and ostracism. So... Blade Runner was not just predicting the future, it was describing the world Philip K. Dick lived in when he wrote "Do Androids Dream of Electric Sheep" in the late 1960s.
Then I remembered what happened to Turing in the 50s.
We seem to need an internal enemy to blame for our societies' problems, because it's easier than facing the reality that we all play a part in creating those problems.
Gay people are among the oldest of those targets, going back to at least the times of the Old Testament (i.e. Sodom and Gomorrah).
We've only recently somewhat evolved past that mindset.
As you say, they are also getting increasingly difficult: click the odd one out, mental rotations, what comes next, etc. It sometimes feels like an IQ test. A new type that seems to be becoming popular recently is a sequence of distorted characters, with some more blurry/distorted than others, seemingly with the expectation that I'm only supposed to be able to see the clearer ones; if I can see the blurrier ones, then I must be a bot. So for each character I need to judge whether it's one I was supposed to see or not.
Another issue is the problems are often in US English, but I'm from the UK.
I wonder how these capabilities will interact with all the "age verification" walls (i.e., thinly disguised user-profiling mechanisms) going up all over the place now.
Easily solves 99% of the web scraping problems.
Maybe after sign up, biometric authentication being mandatory is the only thing that would potentially work. The security and offline privacy of those devices will become insanely valuable.
Anyone not authenticating in this way is paywalled. I don’t like this but don’t see another way.
I’m not using the web if I’m bombarded by captcha games… shit becomes worthless overnight if that’s the case. Might as well dump computing on the Internet entirely if that happens.
A long time ago I saw a post where someone running a blog was having trouble keeping spam out of their comments, and eventually had this same idea. The spambots just filled out every form field they could, so he added a checkbox, hid it with CSS, and rejected any submission that had it checked. At least at the time, it worked far better than anything else they'd tried.
It worked almost 100% of the time. No need for a CAPTCHA.
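For reference, a minimal sketch of that hidden-checkbox honeypot as a small Flask app; the field and route names are invented, and `save_comment` is a stand-in for real persistence:

    from flask import Flask, request

    app = Flask(__name__)

    FORM = """
    <form method="post" action="/comment">
      <textarea name="body"></textarea>
      <!-- Honeypot: humans never see this; naive bots tick every box. -->
      <input type="checkbox" name="contact_me" style="display:none" tabindex="-1">
      <button type="submit">Post</button>
    </form>
    """

    def save_comment(body: str) -> None:
        print("saving:", body)  # stand-in for real persistence

    @app.get("/comment")
    def show_form():
        return FORM

    @app.post("/comment")
    def submit():
        if request.form.get("contact_me"):  # honeypot tripped: it's a bot
            return "", 204  # silently drop the submission
        save_comment(request.form["body"])
        return "Thanks!", 200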
It’s my agent — whether AI or browser — and I get to do what I want with the content you send over the wire, and you have to deal with whatever I send back to you.
As long as it's not wrong/immoral/illegal for me to access your site with any method/browser/reader/agent and do what I want with your response, then I think it's equally okay for you to send a response like "screw you, humans only".
Paywalls suck, but the suck doesn't come from the NYT exercising their freedom to send whatever response they choose.
Paywalls are a natural consequence of this and I don't think they suck, but that's a subjective opinion. Maybe one day we will have a pay-on-demand structure, like flattr reborn.
The bit at the bottom might actually work on LLMs.
I have never used ChatGPT, so I have no idea how its agent works, but if it is driving your browser directly then it will look like you. Even if it is coming from some random IP address on a VM in Azure or AWS, the activity probably does not look "bot-like", since it is doing agentic things and so acting quite like a human, I expect.
I suspect that an LLM would be slower and more irregular, as it is processing the page and all that, vs. a DOM-selector-driven bot that will just machine-gun its way through in milliseconds.
Of course, the Cloudflare and Google et al. CAPTCHAs can't see the clicks/keypresses within a given webpage; they'll only get to see the requests.
In our logs we can see agentic user flow, real user flow and AI site scraping bot flow quite distinctly. The site scraping bot flow is presumably to increase their document corpus for continued pretraining or whatever but we absolutely see it. ByteDance is the worst offender by far.
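A toy version of the timing signal these comments hint at: DOM-driven scripts tend to fire at near-constant, sub-human intervals, while humans and agentic flows are slower and more irregular. The thresholds below are made up for illustration; real systems combine many more signals.

    import statistics

    def looks_scripted(request_times: list[float]) -> bool:
        """Per-session request timestamps (seconds), sorted ascending."""
        if len(request_times) < 5:
            return False  # too little data to judge
        gaps = [b - a for a, b in zip(request_times, request_times[1:])]
        mean = statistics.mean(gaps)
        # Sub-100ms average gaps, or eerily uniform spacing, suggest a script.
        return mean < 0.1 or statistics.pstdev(gaps) / mean < 0.05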
And in many cases, it's taking a huge steaming dump upon a site's first-impression user experience, but AFAICT, it's not on the radar of UX people.
It's always a cat and mouse game.
It is beyond time we start to address the abuses, rather than the bot/human distinction.
rany_•3d ago
This is not an advert, I only know about them because it was integrated with Invidious at some point: https://anti-captcha.com/
> Starting from 0.5USD per 1000 images
johnisgood•1d ago
I will say this again, but I think lowering the barrier to entry has created more problems or issues than it solved, if it even solved anything.
miki123211•23h ago
1. Non-humans can create much more content than humans. There's a limit to how fast a human can write; a bot is basically unlimited. Without captchas, we'd all drown in a sea of Viagra spam, and the misinformation problem would get much worse.
2. Sometimes the website is actually powered by an expensive API, think flight searches for example. Airlines are really unhappy when you have too many searches / bookings that don't result in a purchase, as they don't want to leak their pricing structures to people who will exploit them adversarially. This sounds a bit unethical to some, but regulating this away would actually cause flight prices to go up across the board.
3. One way searches. E.g. a government registry that lets you get the address, phone number and category of a company based on its registration number, but one that doesn't let you get the phone numbers of all bakeries in NYC for marketing purposes. If you make the registry accessible for bots, somebody will inevitably turn it into an SQL table that allows arbitrary queries.
p3rls•14h ago
4. They'll knock your server offline for everyone else by trying to scrape thousands of albums at once, copying your users' uploads for their shitty Discord bot, and they'll be begging for donations the entire time too.
Aurornis•23h ago
It becomes a problem when it’s used to spam unwanted content faster than your human moderators can keep up with.
Someone might use a bot to scrape your content and repackage it on their own site for profit.
The bots might start interacting with your real users, making them frustrated and driving them away.
ACCount36•1d ago
It's a very old service, active since the '00s. Somewhat affiliated with cybercrime, much like a lot of "residential proxy" and "sink registration SMS" services that serve similar purposes. What they're doing isn't illegal, but they know not to ask questions.
They used to run entirely on human labor (third-world labor is cheap). Now they have a lot of AI tech in the mix, designed to beat specific popular captchas and simple generic captchas.
samschooler•23h ago
https://antcpt.com/eng/download/mozilla-firefox.html
amirhirsch•22h ago
Source: I wrote the OG detection system for hCaptcha.