As someone with multiple RFCs: this is the way it's always been done. Industry has a problem, there's some collaboration with other industry players or academia, and someone submits a draft RFC. People are free to adopt it or not. Sometimes there are competing proposals that get accepted, and sometimes the topic dies entirely.
> both of which raise the friction to participation in the web even higher for actual humans
Absolutely nothing wrong with this, as it's site owners that make the decision for their own sites. Yep, I do want some friction. The tradeoff saves me a ton of money. Heck, I could block most ASNs and email domains and still keep 99% of my customers.
> Seems like pretty soon there'll only be one or two browsers which can even hope to access sites behind cloudflare's infrastructure
This proposal is about bots identifying themselves through open HTTP headers.
The problem is that to CF, everything that isn't Chrome is a bot (only a slight exaggeration). So browsers that aren't made by large corporations wouldn't have this. It's like how CF uses CORS.
CORS isn't CF-only, but it's an example of them requiring obscure features nobody else really uses, and using them in unusual ways that most browsers can't handle. The HTTP header CA signing is yet another of these things, and the weird modifications of TLS flags fall right in there too. It's basically Proof-of-Chrome via a Gish gallop of new "standards" they come up with.
>Absolutely nothing wrong with this, as it's site owners that make the decision for their own sites.
I agree. It's their choice. I'm just laying out the consequences of these mostly uninformed choices: initially, site owners won't be aware that they're blocking a large number of their actual human visitors. I've seen it play out again and again with sites and CF. Eventually the sites are doing so much work maintaining their whitelists of UAs and IPs that one wonders why they use CF at all, since they're effectively doing the job themselves.
And that's not even getting started on the bad, aggressive defaults for CF free accounts. In the last month or two they have slightly improved this, so there's some hope. They know they're a problem because they're so big:
"It was a decision I could make because I’m the CEO of a major Internet infrastructure company." ... "Literally, I woke up in a bad mood and decided someone shouldn't be allowed on the Internet. No one should have that power." - Cloudflare CEO Matthew Prince
(PS: you made some good and valid points re: the IETF process status quo, personal choice, etc. It's not me doing the downvotes.)
The real complaint should be about having to adopt another standard, and whether they’ll discriminate against applications like legacy RSS readers, since they’re considered a type of bot.
I'm not saying you should sit down with the iptables manual and start going through the logs, but I can see the idea taking off if all it takes is (say) one apt-get and two config lines.
[1] https://stackoverflow.com/questions/1035283/will-it-ever-be-...
https://xeiaso.net/blog/2025/anubis/
I love the approach. If I could be arsed to blog, I'd probably set it up myself.
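For what it's worth, the "one apt-get and two config lines" vision upthread isn't far off. A hypothetical nginx snippet fronting a site with a PoW shield like Anubis might look like this (the ports and header choice are my assumptions for illustration, not from the Anubis docs):

```nginx
# Hypothetical: a PoW proxy such as Anubis listening on 127.0.0.1:8923,
# protecting an app on 127.0.0.1:3000. Port numbers are made up.
location / {
    proxy_pass http://127.0.0.1:8923;        # route everything through the PoW proxy
    proxy_set_header X-Real-IP $remote_addr;  # preserve the client IP for its logs
}
```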
I recently implemented something very similar to ALTCHA's proof-of-work obfuscation (https://altcha.org/docs/obfuscation/) in my C++ REST backend and Flutter front-end, and I use it for rate-limiting on APIs that allow creating a new account or sending sign-up e-mails.
I have an authentication token that's wrapped with AES-GCM using a random IV; the client is given the key, the IV stem, and a maximum count for the IV.
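A minimal Python sketch of that wrapping as I understand it (the 8-byte stem / 4-byte counter split and the 50k max count are my guesses, not the parent's actual parameters):

```python
# Server wraps a token under a partially hidden IV; the client's "work" is
# brute-forcing the IV counter until the GCM tag authenticates.
import os
import secrets
from cryptography.exceptions import InvalidTag
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def wrap(token: bytes, max_count: int = 50_000):
    """Server: encrypt under a 12-byte IV whose last 4 bytes are hidden."""
    key = AESGCM.generate_key(bit_length=128)
    stem = os.urandom(8)                      # public IV prefix
    counter = secrets.randbelow(max_count)    # hidden suffix = the work factor
    iv = stem + counter.to_bytes(4, "big")
    ciphertext = AESGCM(key).encrypt(iv, token, None)
    return key, stem, max_count, ciphertext   # all of this goes to the client

def unwrap(key: bytes, stem: bytes, max_count: int, ciphertext: bytes) -> bytes:
    """Client: try counters until the GCM tag authenticates."""
    aesgcm = AESGCM(key)
    for counter in range(max_count):
        try:
            return aesgcm.decrypt(stem + counter.to_bytes(4, "big"), ciphertext, None)
        except InvalidTag:
            continue
    raise ValueError("counter space exhausted")
```

The appealing asymmetry: the server pays one encryption, while an honest client pays ~max_count/2 decryption attempts on average.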
The current situation is getting worse day after day because everybody wants to ScRaPe 4lL Th3 W38!!
Verifying an Ed25519 signature is almost free on modern CPUs; I just wonder why they went with an obscure RFC for HTTP signatures instead of plain JSON Web Tokens in a header.
JWTs are universal. Parsing this custom format will certainly lead to a few interesting bugs.
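On the "almost free" point, a quick benchmark sketch with Python's `cryptography` package (the signed string is a stand-in; the real proposal signs selected HTTP message components, not this):

```python
# Rough measurement of Ed25519 verification cost.
import time
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

private_key = Ed25519PrivateKey.generate()
public_key = private_key.public_key()
message = b'"@authority": example.com "signature-agent": crawler.example'
signature = private_key.sign(message)

N = 10_000
start = time.perf_counter()
for _ in range(N):
    public_key.verify(signature, message)  # raises InvalidSignature on mismatch
elapsed = time.perf_counter() - start
print(f"{elapsed / N * 1e6:.0f} µs per verification")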
PaulHoule•6h ago
(1) It's always been easy to write bots [1] [2]. If you knew beautifulsoup well you could often write a scraper in 10 minutes; now people ask ChatGPT to write one for them and have it ready in 15 minutes, so they're discovering how easy it is, and that you don't have to limit yourself to public APIs, which are usually designed to limit access, not expand it.
(2) Instead of using content to train an AI you can feed it into an AI for inference. For instance, you can tell the AI to summarize pages or to extract specific facts from pages or to classify pages. It's increasingly possible to develop a workflow like: classify 30,000 RSS feed items, select 300 items that the user will probably find interesting, crawl those 300 pages looking for hyperlinks to scientific journal articles or other links that would be better to post, crawl those links to see if the journal articles are open access, weigh various factors to decide what's likely to be the best link, do specialized image extraction so I can make a good social post, etc. It's not too hard to do but it all comes falling down if the bot has to click on fire hydrants endlessly.
[1] Polite crawlers limit how many threads they run against a single server. If you only have one thread per server you are unlikely to overload it. If you want a crawler with a large thread count crawling a large number of servers, implementing this can be a hassle, particularly if you want to maximize performance or run a large distributed crawler. But a lot of the time my crawling projects target one site, or five sites, or maybe crawl 1000 documents a day, and in those cases a single-threaded crawler is fine (see the sketch after these footnotes).
[2] For some reason, my management has always overestimated the work of building scrapers, I think because they've been burned by UI development, which is always underestimated. The fact that UI development is such a bitch actually helps with crawler development: you might be afraid that the target site will change, but between the high cost of making changes and the fact that Google will trash your SEO if you change anything about your site, the target site won't change.
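A minimal asyncio sketch of the politeness rule in [1], one in-flight request per host with many hosts in parallel (aiohttp is just one client choice, not a claim about the author's stack):

```python
import asyncio
from urllib.parse import urlsplit

import aiohttp

# One semaphore per host serializes requests to that host.
host_locks: dict[str, asyncio.Semaphore] = {}

def lock_for(url: str) -> asyncio.Semaphore:
    host = urlsplit(url).netloc
    return host_locks.setdefault(host, asyncio.Semaphore(1))

async def fetch(session: aiohttp.ClientSession, url: str) -> bytes:
    async with lock_for(url):                 # at most one request per host
        async with session.get(url) as resp:
            return await resp.read()

async def crawl(urls: list[str]) -> list[bytes]:
    async with aiohttp.ClientSession() as session:
        return await asyncio.gather(*(fetch(session, u) for u in urls))

# asyncio.run(crawl(["https://example.com/a", "https://example.org/b"]))
```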