Both of these, and most other material I've come across, focus on crawling the broad open web rather than a targeted set of domains. For product prices it's the latter. Mercator calls out DNS resolution as a major bottleneck, for example, but when you're only hitting a few hundred domains that isn't really a concern.
The other gap is that both assume static HTML. For our use case we need a headless browser, and we also have to deal with Cloudflare and similar anti-bot systems.
For product prices specifically, a lot of sites publish price feeds, which simplifies things, but plenty don't, and getting good coverage still requires scraping. Our current system does about 500M pages/day, and we're looking to improve its performance.
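A quick back-of-envelope frames the scale (the 300-domain figure is a hypothetical stand-in for "a few hundred"):

```python
pages_per_day = 500_000_000
pages_per_sec = pages_per_day / 86_400   # ~5,787 pages/s sustained
domains = 300                            # hypothetical "few hundred" domains
per_domain = pages_per_sec / domains     # ~19 requests/s per domain
```

Tens of requests per second per domain, sustained, through headless browsers, is a very different problem from the one broad-web crawler papers optimize for.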
Does anyone here have experience in this space, or know of articles/blog posts on scaling targeted (rather than broad) crawlers with headless browsers? Any pointers appreciated.