Cloudflare Crawl Endpoint

https://developers.cloudflare.com/changelog/post/2026-03-10-br-crawl-endpoint/
59•jeffpalmer•1h ago

Comments

triwats•1h ago
This could be a cool way to use Cloudflare's edge for synthetic monitoring of endpoints' actual content.
jasongill•1h ago
I'm surprised that Cloudflare hasn't started hosting a pre-scraped version of websites that use Cloudflare's proxy, something like https://www.example.com/cdn-cgi/cached-contents.json. They already have the website content in their cache, so why not cut out the middleman of scraping services and APIs like this and publish it?

Obviously there's good reasons NOT to, but I am surprised they haven't started offering it (as an "on-by-default" option, naturally) yet.

cmsparks•35m ago
That would prolly work for simple sites, but you still need the dedicated scraping service with a browser to render sites that are more complex (i.e. SPAs)
michaelmior•18m ago
> I'm surprised that Cloudflare hasn't started hosting a pre-scraped version of websites that use Cloudflare's proxy

It's entirely possible that they're doing this under the hood for cases where they can clearly identify the content they have cached is public.

csomar•8m ago
It’s a bit more complicated than that. This is their product Browser Rendering, which runs a real browser that loads the page and executes JavaScript. It’s a bit more involved than a simple curl scraping.
8cvor6j844qw_d6•56m ago
Does this bypass their own anti-AI crawl measures?

I'll need to test it out, especially with the labyrinth.

xhcuvuvyc•43m ago
Yeah, that'd be huge, like 90% of my search engine results are just cloudflare bot checks if I don't filter it out.
canpan•41m ago
I feel there is a conflict of interest here..

I'm split between: Yes! At last something to get CF protected sites! And: Uh! Now the internet is successfully centralized.

mdasen•21m ago
If this does bypass their own (and others') anti-AI crawl measures, it'd basically mean that the only people who can't crawl are those without money.

We're creating an internet that is becoming self-reinforcing for those who already have power and harder for anyone else. As crawling becomes difficult and expensive, only those with previously collected datasets get to play. I certainly understand individual sites wanting to limit access, but it seems unlikely that they're limiting access to the big players - and maybe even helping them since others won't be able to compete as well.

adi_kurian•10m ago
Common Crawl has free egress
jsheard•13m ago
They say it doesn't: https://developers.cloudflare.com/browser-rendering/faq/#wil...

AFAICT they don't make any attempt to be stealthy either, so it's easy enough to block them on your own terms if you want. The requests are all branded with CF-specific headers which make it obvious what they're doing.

memothon•51m ago
I've used browser rendering at work and it's quite nice. Most solutions in the crawling space are kind of scummy and designed for side-stepping robots.txt and not being a good citizen. A crawl endpoint is a very necessary addition!
Imustaskforhelp•36m ago
This might be really great!

I had the idea after buying https://mirror.forum recently (which I talked about in Discord and ArchiveTeam IRC servers) that I wanted to preserve/mirror forums, especially tech-related ones [think TinyCoreLinux]. Archive.org is really great, but I would like to see other efforts in this space as well.

I didn't want to scrape/crawl it myself because I felt like it would feel like yet another scraping effort for AI and strain resources of developers.

And even when you want to crawl, the issue is that you can't crawl sites behind Cloudflare, and sometimes for good reason.

So, as I understand it, can I use Cloudflare Crawl to crawl the whole website of a forum, and does this only work for forums which use Cloudflare?

Also, what is the pricing of this? Is it billed like a standard Cloudflare Worker, so that I would get 100k free requests and then 1 million for a few cents (IIRC) for crawling? Considering that Cloudflare is very scalable, it might even make more sense than buying a group of cheap VPSs.

Another point: I was previously thinking that the best way would be for maintainers of these forums to give me a periodic backup archive of the forum, since I believe that is the cleanest way. I discussed it on Linux Discord servers and with archivers in that community, but I couldn't find anyone who maintains such tech forums and subscribes to the idea of sharing the forum's public data as a quick backup for preservation purposes. So if anyone knows of or maintains any such forums themselves, feel free to reply in this thread about that too.

ipaddr•11m ago
"I didn't want to scrape/crawl it myself because I felt like it would feel like yet another scraping effort for AI and strain resources of developers"

You feel better paying someone to do the same thing?

ljm•34m ago
Is cloudflare becoming a mob outfit? Because they are selling scraping countermeasures but are now selling scraping too.

And they can pull it off because of their reach over the internet with the free DNS.

rrr_oh_man•24m ago
It’s a three letter agency front.
mtmail•10m ago
Any kind of source for the claim?
stri8ted•10m ago
Do you have any evidence to support this view?
Retr0id•21m ago
For a long time cloudflare has proudly protected DDoS-as-a-service sites (but of course, they claim they don't "host" them)
theamk•13m ago
no? it takes 10 seconds to check:

> The /crawl endpoint respects the directives of robots.txt files, including crawl-delay. All URLs that /crawl is directed not to crawl are listed in the response with "status": "disallowed".

You don't need any scraping countermeasures for crawlers like those.
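For readers unfamiliar with what "respects robots.txt, including crawl-delay" looks like in practice, here is a minimal local sketch using Python's standard `urllib.robotparser`. It is not Cloudflare's implementation and does not call the /crawl endpoint; the robots.txt content, the `example-crawler` user agent, and the `classify` helper are all made up for illustration. Only the `"disallowed"` status string is taken from the quote above; `"allowed"` is an assumed counterpart.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration only.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 5
Disallow: /private/
"""


def classify(urls, user_agent="example-crawler"):
    """Return (crawl_delay, per-URL status) according to ROBOTS_TXT.

    The user agent name is an assumption, not Cloudflare's real one.
    """
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())

    # Seconds a polite crawler should wait between requests, or None.
    delay = parser.crawl_delay(user_agent)

    results = []
    for url in urls:
        allowed = parser.can_fetch(user_agent, url)
        # "disallowed" mirrors the wording in the quoted docs;
        # "allowed" is an assumed label for the other case.
        results.append({"url": url, "status": "allowed" if allowed else "disallowed"})
    return delay, results


delay, results = classify([
    "https://www.example.com/index.html",
    "https://www.example.com/private/admin",
])
print(delay)
for entry in results:
    print(entry["url"], entry["status"])
```

A crawler that behaves this way skips the `/private/` URL and sleeps for the reported delay between fetches, which is why a site owner can steer it with robots.txt alone.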

iso-logi•12m ago
Their free DNS is only a small piece of the pie.

The fact that 30%+ of the web relies on their caching services, routability services and DDoS protection services is the main pull.

Their DNS is only really for data collection and to front as "good will"

its-kostya•9m ago
Cloudflare has been trying to mediate between publishers & AI companies. If publishers are behind Cloudflare and Cloudflare's bot detection stops scrapers at the request of publishers, the publishers can allow their data to be scraped (via this endpoint) for a price. It creates market scarcity. I don't believe the target audience is you and me, unless you own a very popular blog that AI companies would pay you for.
pupppet•25m ago
Cloudflare getting all the cool toys. AWS, anyone awake over there?
jppope•11m ago
This is actually really amazing. Cloudflare is just skating to where the puck is going to be on this one.
rvz•10m ago
Selling the cure (DDoS protection) and creating the poison (Authorized AI crawling) against their customers.
babelfish•9m ago
Didn't they just throw a (very public) fit over Perplexity doing the exact same thing?
everfrustrated•7m ago
Will this crawler be run behind or in front of their bot blocker logic?
