Obviously there are good reasons NOT to, but I'm surprised they haven't started offering it (as an "on-by-default" option, naturally) yet.
It's entirely possible that they're already doing this under the hood in cases where they can clearly identify that the content they have cached is public.
I'll need to test it out, especially with the labyrinth.
I'm split between: Yes! At last something to get CF-protected sites! And: Ugh! Now the internet is successfully centralized.
We're creating an internet that is becoming self-reinforcing for those who already have power and harder for anyone else. As crawling becomes difficult and expensive, only those with previously collected datasets get to play. I certainly understand individual sites wanting to limit access, but it seems unlikely that they're limiting access to the big players - and maybe even helping them since others won't be able to compete as well.
AFAICT they don't make any attempt to be stealthy either, so it's easy enough to block them on your own terms if you want. The requests are all branded with CF-specific headers which make it obvious what they're doing.
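For example, on your own origin you could do something like this. Rough sketch only: the "cf-" header check below is a placeholder heuristic, since I don't remember the exact branded header names off-hand.

    // Sketch of blocking the crawler on your own terms (Express middleware).
    // The header-name check is a placeholder; look at the actual CF-branded
    // headers the crawler sends before relying on this.
    import express from "express";

    const app = express();

    app.use((req, res, next) => {
      // Block anything announcing itself with a CF-branded crawler header.
      const fromCfCrawler = Object.keys(req.headers).some(
        (name) =>
          name.toLowerCase().startsWith("cf-") &&
          name.toLowerCase().includes("crawl") // placeholder heuristic
      );
      if (fromCfCrawler) {
        res.status(403).send("No crawling, thanks");
        return;
      }
      next();
    });

    app.listen(3000);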
I had the idea after buying https://mirror.forum recently (which I talked about on Discord and in the ArchiveTeam IRC channels) that I wanted to preserve/mirror forums, especially tech-related ones [think TinyCoreLinux], since Archive.org is really, really great but I would prefer to see some other efforts within this space as well.
I didn't want to scrape/crawl it myself because I felt it would come across as yet another AI scraping effort and strain the developers' resources.
And even when you do want to crawl, the issue is that you can't get past Cloudflare, sometimes for good reason.
So, to check my understanding: can I use Cloudflare Crawl to essentially crawl a forum's whole website, and does this only work for forums that use Cloudflare?
Also, what is the pricing for this? Is it just a standard Cloudflare Worker, so I would get the free 100k requests and then 1 million for a few cents (IIRC) for crawling? Considering how scalable Cloudflare is, it might even make more sense than buying a group of cheap VPSes.
One more point: I had previously been thinking that the best way would be for the maintainers of these forums to give me a periodic backup archive of the forum, since that feels like the cleanest approach. But after discussing it on Linux Discord servers and with archivers in that community (and in general), I couldn't find anyone maintaining such tech forums who would subscribe to the idea of sharing the forum's public data as a quick backup for preservation purposes. So if anyone knows of or maintains any such forums, feel free to message me here in this thread about that too.
You feel better paying someone to do the same thing?
And they can pull it off because of their reach over the internet with the free DNS.
> The /crawl endpoint respects the directives of robots.txt files, including crawl-delay. All URLs that /crawl is directed not to crawl are listed in the response with "status": "disallowed".
You don't need any scraping countermeasures for crawlers like those.
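To make that concrete, here's a rough sketch of consuming such a response and splitting out the disallowed URLs. The endpoint URL, auth, and request/response shape are guesses on my part; the only detail taken from the quoted docs is the "status": "disallowed" field.

    // Sketch only: endpoint URL, auth, and body shape are placeholders.
    // The "status": "disallowed" value is the behaviour quoted above.
    type CrawlResult = { url: string; status: string };

    async function crawlForum(seedUrl: string, apiToken: string) {
      const resp = await fetch("https://example.invalid/crawl", { // placeholder endpoint
        method: "POST",
        headers: {
          "Authorization": `Bearer ${apiToken}`,
          "Content-Type": "application/json",
        },
        body: JSON.stringify({ url: seedUrl }), // assumed request shape
      });
      const results: CrawlResult[] = await resp.json();

      // URLs robots.txt told the crawler to skip come back flagged, not fetched.
      const disallowed = results.filter((r) => r.status === "disallowed");
      const fetched = results.filter((r) => r.status !== "disallowed");
      console.log(
        `fetched ${fetched.length} pages, skipped ${disallowed.length} per robots.txt`
      );
      return fetched;
    }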
The fact that 30%+ of the web relies on their caching, routing, and DDoS protection services is the main pull.
Their DNS is really only for data collection and to serve as a front of "goodwill".