I am part of an informal group that actively archives websites, and the ones behind Cloudflare CAPTCHAs are barely archivable. I presumed Cloudflare had a deal with Archive.org, but I guess it went nowhere? https://blog.cloudflare.com/cloudflares-always-online-and-th...
charcircuit•1h ago
Are you using iOS or macOS to have access to private access tokens?
Given that these tokens are intentionally designed to distinguish human from bot traffic, I'd be surprised if they were (easily) available to archival tooling.
charcircuit•1h ago
The URLSession API supports private access tokens (it's handled for you automatically) while your app is foregrounded.
Oh, interesting! But I'd still expect these to be heavily rate limited, etc. Otherwise, the people that captcha-protected sites are hoping to keep out could just use these, right?
charcircuit•41m ago
At what rate are archivers solving Cloudflare challenges, though? Probably not often enough to hit any kind of rate limit. The token is only used for the initial challenge and not for every request.
qingcharles•46m ago
This looks like a useful solution for scraping. It doesn't prove you're a human, simply that you can afford to buy an iPhone. So buy the cheapest iPhone that supports this on eBay and then use that for scraping and archiving from now on.
sadeshmukh•1h ago
It's still a setting in their dashboard, but the site owner has to manually enable Always Online.
neom•2d ago