10 years ago I published a package on npm called `url-metadata`. It scrapes structured metadata from any URL into a clean, SEO-friendly JSON template format. I would get feature requests in the first few years, mostly SEO-related. Academia.edu asked me to add citations. Other people wanted "price" and "priceCurrency" fields for scraping product pages. Someone recently told me it could be useful for ad enrichment, which I didn't even know about.
Then downloads took off around the same time AI did, in late 2023. They hit up to 50k downloads/week, then fell sharply, and have now settled around ~15-30k. In total, ~3.5 million downloads over 10 years, the majority of that in the last 2 years.
https://npm-stat.com/charts.html?package=url-metadata&from=2...
I have theories. Metadata is exactly what RAG pipelines need at inference time: lightweight, structured, no giant HTML blobs. Claude thinks maybe it's being used during the early stages of LLM training runs, to help the model figure out what's important on a page (the package has a mode where it returns the response body as a string in addition to the metadata).
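To make the RAG angle concrete, here's a rough sketch of the kind of pruning step a pipeline might run on the package's output before stuffing it into a prompt. The field names below (`title`, `description`, `og:title`, `keywords`) are assumptions modeled on common meta-tag and Open Graph keys, not a guaranteed shape of the JSON:

```javascript
// Sketch: collapse a url-metadata-style JSON object into a compact,
// token-light context string for a RAG prompt. Field names here are
// assumptions based on common Open Graph / meta-tag conventions.
function toRagContext(meta) {
  const fields = {
    title: meta['og:title'] || meta.title,
    description: meta['og:description'] || meta.description,
    url: meta['og:url'] || meta.url,
    keywords: meta.keywords,
  };
  return Object.entries(fields)
    .filter(([, v]) => v)               // drop empty or missing fields
    .map(([k, v]) => `${k}: ${v}`)
    .join('\n');
}

// Example with a hand-written metadata object (not real scraped data):
const example = {
  title: 'Vintage Camera',
  description: 'A 35mm rangefinder in working condition.',
  url: 'https://example.com/listing/123',
  keywords: 'camera, film, 35mm',
  'og:title': '',
};
console.log(toRagContext(example));
```

The point is just how little of a page you need to keep: four short lines instead of a multi-hundred-KB HTML blob.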
But honestly I'm still figuring out what people are building with it, since there isn't much of a feedback mechanism for npm packages other than people filing issues and PRs. Most often, people would file issues just because they were triggering 403 blocks. Adding a "Troubleshooting" section to the README helped.
So now I've built a hosted version and called it Minifetch: no pipeline to spin up, pay per request, no subscriptions. There are handy API endpoints and options for SEO and AI agents. The Extract URL Metadata endpoint with the ?verbosity=full option is a drop-in replacement for the npm package: same output, zero infrastructure management for the user. It respects robots.txt and identifies itself transparently to site owners, which in practice means it gets through to pages that block aggressive scrapers that don't identify themselves.
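Roughly, a request looks like this. The host, path, and auth header in this sketch are illustrative placeholders (check the API docs for the real endpoint and auth scheme); the only part taken from above is the ?verbosity=full option:

```javascript
// Sketch: build a metadata-extraction request URL. The base URL and
// path here are placeholder assumptions, not the documented API.
function buildMetadataRequest(baseUrl, targetUrl) {
  const u = new URL('/v1/metadata', baseUrl);
  u.searchParams.set('url', targetUrl);      // page to extract metadata from
  u.searchParams.set('verbosity', 'full');   // npm-package-compatible output
  return u.toString();
}

// Usage sketch (Node 18+ global fetch; header name is a placeholder):
// const res = await fetch(
//   buildMetadataRequest('https://api.minifetch.com', 'https://www.ebay.com'),
//   { headers: { 'x-api-key': process.env.MINIFETCH_KEY } }
// );
// const metadata = await res.json();
```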
Would love to know what you'd use it for — that's half the reason I'm posting.
eljee•1h ago
Here's the JSON template: https://github.com/laurengarcia/url-metadata/blob/master/lib...
And example data: https://minifetch.com/result/example/metadata/ebay
minifetch.com