> This is bot detection at the application layer, not the browser layer.
I kind of just assumed that all sophisticated bot-detectors and adblock-detectors do this? Is there something revealing about the finding that ChatGPT/CloudFlare's bot detector triggers on "javascript didn't execute"?
Coincidentally about an hour ago, I wanted to look something up in ChatGPT and I happened to be in a browser window I don’t normally use, with no logged in accounts. I assumed it wouldn’t work, but to my surprise with no account, no cookies of any kind it took my query and gave me an answer.
They allowed anonymous requests for months now, maybe even a year.
I noticed the ChatGPT app also checks Play Integrity on Android (because GrapheneOS snitches on apps when they do this), probably for the same reason. Claude's app doesn't, by the way, but it also requires a login.
You can probably run 50 of those simultaneously if you use memory page deduplication, and with a decent CPU+GPU you ought to be able to render 50 pages a second. That's 1 cent per thousand page loads on AWS. Damn cheap.
More info here:
https://web.archive.org/web/20231107182321/https://mu0.cc/20...
if you browse them you will see that bot writers are very annoyed if they can't scrape a site with a headless browser.
you can do what you suggested, but with Linux VMs/containers. windows is too heavy, each VM will cost you 4 GB of RAM
Honestly it is a very healthy competitive market with reasonably low switching costs which drives prices down. These circumstances make rolling your own a tough sell.
Is it TSA's "fault" that non-terrorists are subject to screening?
The "quality" of TSA's screening seems be pretty bad too given how many people have to go through secondary screening vs how many terrorist they catch (0?)
(It's a different question whether zero friction is actually desired, or whether some security theater is actually part of the service being provided, but that's a different question.)
Imagine an apartment building with a flimsy front door lock that breaks all the time, and the landlord only telling you that that can't be helped because of all the burglars.
I'm building Safebox and Safecloud, where this won't be the case anymore. Not only will you have a decentralized hosting network that can sideload resources (e.g. via a browser extension that looks at your "integrity" attribute on websites) but also the websites will require you to be logged in with a HMAC-signed session ID (which means they don't need to do any I/O to reject your requests, and can do so quickly)... so the whole thing comes down to having a logged in account.
https://github.com/Safebots/Safecloud
As far as server-to-server requests, they'll be coming from a growing network of cryptographically attested TPMs (Nitro in AWS, also available in GCP, IBM, Azure, Oracle etc.) so they'll just reject based on attestations also.
In short... the cryptographically attested web of trust will mean you won't need cloudflare. What you will need, however, to prevent sybil attacks, is age verification of accounts (e.g. Telegram ID is a proxy for that if you use Telegram for authentication).
"No s̶o̶u̶p̶ internet for you!"
Good luck!
I'm running firefox and seeing the normal amount.
I can't tell whether you're serious but in case you are, this theory immediately falls apart when you realize waymo operates at night but there aren't any night photos.
I'm certain the User-Agent is part of it. I know that for certain because a very reliable way I can trigger the CF stuff is this plugin with the wrong browser selected [1].
[1] https://addons.mozilla.org/en-US/firefox/addon/uaswitcher/
Maybe I just have to disable all ad blockers and Safari tracking prevention? Or I guess I could send a link to a scan of my photo ID in a custom request header like X-Please-Cloudflare-May-I-Use-Your-Open-Web?
That's... exactly expected? It's a cat and mouse game. People running botnets or AI scrapers aren't diligently setting the evil bit on their packets.
But hey, at least some bots are also not making it past Cloudflare!
I have yet to see a use case for VPNs for the casual internet audience, and for a tech savvy user, their better off renting through some datacenter or something, which at that point is hardly a VPN and more home IP obfuscation. All the same downsides, and at least you get real privacy.
Source? I haven't seen any evidence that the major paid VPN providers engage in any of those things. At best it's vague implications something shady is happening because one of the key people was previously at [shady organization].
MullvadVPN is also another great one.
I have heard some good things about AirVPN, but I can absolutely attest for mullvad and to a degree ProtonVPN (Just with Proton, depending upon your threat model, do make the necessary precautions like buying with monero for example)
There are others, but mostly its the 2-3 that I trust.
Mullvad.
It has been proven in a court of law that when Mullvad says "no logging", they mean it.
They also regularly have security audits and publish the results[2][3]
[1]https://mullvad.net/en/blog/mullvad-vpn-was-subject-to-a-sea... [2]https://mullvad.net/en/blog/new-security-audit-of-account-an... [3]https://mullvad.net/en/blog/successful-security-assessment-o...
I don't do free work. I'm not going to label 50 images of crosswalks and motorcycles for free.
Curious how do you know this?
It’s worth remembering how replaceable basically all websites and software are and how little value they add. There’s no reason to reward or tolerate those that noticeably degrade performance with couldflare et al
At least outside the US, there's 3DS as an (admittedly often high friction) high quality cardholder verification method, but in the US, that's of course considered much too consumer-hostile, so "select 87 overpasses" it is.
is it Linux (or similar)?
is it Firefox?
If yes, to one or both, you're blocked! Clearly millions of dollars of engineering talent and petabytes of data collection should be able to come up with something more nuanced than this.
I'm getting it on iCloud Private Relay all the time. It honestly makes it kind of useless.
Maybe that's the point? But then again, doesn't Cloudflare run part of it!? And wasn't there some "privacy-preserving captcha replacement" that iOS devices should already be opting me in to? So many questions, nobody there to answer them, because they can get away with it.
> The part that’s really annoying is its trivial for bots to bypass.
Not the ethical bots, though! My GPT-backed Openclaw staunchly refuses to go anywhere near a "I'm not a robot" button.
Of course then you got sites like gnu.org too that block you because your slightly outdated user agent.
My best guess is -- ChatGPT is running something in your browser to try to determine the best things to send down to the model API –- when it should have been running quantized models on its own server.
Really really bad user experience, wondering about when they will leave this approach.
Also, you can have it spotcheck colors: light orange on light background is unreadable, ask it to find the L*[1] of colors and dark/lighten as necessary if gap < 40 (that's minimum gap for yuge header text on background, 50 for text on background, these have gap of 25)
I haven't tried this yet, but, maybe have it count word count-per-header too. It's got 11 headers for 1000 words currently, makes reading feel really stacatto and you gotta evaluate "is this a real transition or vibetransition"
[1] L* as in L*a*b*, not L in Oklab
Every provider seems to have been plauged by these freeloaders to such an extent that they've had to develop extreme and onerous countermeasures just to avoid losing their shirts.
What's the word? Schadenfreude?
A big reason we invest in this is because we want to keep free and logged-out access available for more users. My team’s goal is to help make sure the limited GPU resources are going to real users.
We also keep a very close eye on the user impact. We monitor things like page load time, time to first token and payload size, with a focus on reducing the overhead of these protections. For the majority of people, the impact is negligible, and only a very small percentage may see a slight delay from extra checks. We also continuously evaluate precision so we can minimize false positives while still making abuse meaningfully harder.
Is this to be expected? I would presume that if I'm authenticated and paying, VPN use wouldn't be a worry. It would be nice to be able to use the tool whether or not I'm on a VPN.
But don't you run these checks on logged-in users too?
Meanwhile, the rest of us (well, not me, because I don’t use your garbage product, but lots of others do) have to suffer and have our compute resources used up in the name of “protection.”
Also if you could pass this over: it takes 5 taps to change thinking effort on ios and none (as in completely hidden) on macos.
If I were to guess it seems that you were trying to lower the token usage :-). Why the effort is only nicely available on web and windows is beyond me
This would be fucking HILARIOUS if it wasn't so tragic.
Nick, I understand the practical realities regarding why you'd need to try to tamp down on some bot traffic, but do you see a world where users are not forced to choose between privacy and functionality?
It's also possible to make Firefox route each container through a different proxy which could be running locally even which then can connect to multiple different VPN's. I haven't tried doing that but its certainly possible.
It's sort of possible to run different browsers with completely new identities and sometimes IP within the convenience of one. It's really underrated. I don't use the IP part of this that I have mentioned but I use multi containers quite a lot on zen and they are kind of core part of how I browse the web and there are many cool things which can be done/have been done with them.
Yes, even their "humanifesto" is LLM output, and is written almost exclusively in the "it's not X <emdash> it's Y" style.
[0]: https://github.com/magicseth/keywitness/graphs/contributors
What are you talking about? It works fine with firefox with RFP and VPN enabled, which is already more paranoid than the average configuration. There are definitely sites where this configuration would get blocked, but chatgpt isn't one of them, so you're barking up the wrong tree here.
Typing the chat box is slow, rendering lags and sometimes gets stuck altogether.
I have a research chat that I have to think twice before messaging because the performance is so bad.
Running on iPhone 16 safari, and MacBook Pro m3 chrome.
It's perfectly possible to write fast or slow web applications in React, same as any other framework.
Linear is one of the snappiest web applications I've ever used, and it is written in React.
They did it because a lot of devices running Netflix (TVs, DVD players, etc) were underpowered and Netflix was not keen on writing separate applications. They did, however, invest into a browser engine that would have HW acceleration not just for video playback but also for moving DOM elements. Basically, sprites.
The lost art of writing efficient code...
Can you share these mitigations so we can mitigate against you?
This has to be a joke, right?
"Prove your humanity/age/other properties" with this mechanism quickly goes places you do not want it to go.
You've been to the doctor recently, right? Given them your SSN? Every identity system ever built was going to be scoped || voluntary. None of them stayed that way.
Once you have the identity mechanism, "Oh it's zero knowledge! So let's use it for your age! Have you ever been convicted?" which leads to "mandated by employers" which leads to...
We've seen this goddamn movie before. Let's just skip it this time? Please?
There should be nothing submitted, except maybe anonymous telemetry metrics at that point.
I ask because I have seen huge variations in load time. Sometimes I had to wait seconds until being able to type. Nowadays it seems better though.
I assumed it was maybe some tokenization going on client side, but now I realize maybe it's some proof of work related to prompt length?
Sick.
this is meaningless btw. A browser headless or not does execute javascript.
I read it to mean: "A browser that doesn't execute the JavaScript bundle won't have [the rendered React elements]." Which is true.
beering•2h ago
superkuh•1h ago
voxic11•1h ago
beering•1h ago
superkuh•1h ago
And as for "but chatgpt isn't paid" (another commenter), well, then yes, that's even closer to free by removing this spying on your computer setup. But they spy on the paid users too.
gruez•54m ago
There's no way this is worth it unless the models are absolutely tiny, in which case any benefits from offloading to the client is marginal and probably isn't worth the engineering effort.