10 years ago I published a package on npm called `url-metadata`. It scrapes structured metadata from any URL into a clean, SEO-friendly JSON template format. I would get feature requests in the first few years, mostly SEO-related. Academia.edu asked me to add citations. Other people wanted "price" and "priceCurrency" fields for scraping product pages. Someone recently told me it could be useful for ad enrichment, which I didn't even know about.
Then downloads took off around the same time AI did, in late 2023. They hit up to 50k downloads/week, then fell sharply, and have now settled around ~15-30k. In total, ~3.5 million downloads over 10 years, the majority of that in the last 2 years.
https://npm-stat.com/charts.html?package=url-metadata&from=2...
I have theories. Metadata is exactly what RAG pipelines need at inference time: lightweight, structured, no giant HTML blobs. Claude thinks maybe it's being used during the early stages of LLM training runs, to help the model figure out what's important on a page (the package has a mode where it returns the response body as a string in addition to the metadata).
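To make the RAG angle concrete, here's a rough sketch of the kind of pruning step a pipeline might run on the package's output before stuffing it into a prompt. The field names below (`title`, `description`, `og:title`, `keywords`) are assumptions modeled on common meta-tag and Open Graph keys, not a guaranteed shape of the JSON:

```javascript
// Sketch: collapse a url-metadata-style JSON object into a compact,
// token-light context string for a RAG prompt. Field names here are
// assumptions based on common Open Graph / meta-tag conventions.
function toRagContext(meta) {
  const fields = {
    title: meta['og:title'] || meta.title,
    description: meta['og:description'] || meta.description,
    url: meta['og:url'] || meta.url,
    keywords: meta.keywords,
  };
  return Object.entries(fields)
    .filter(([, v]) => v)               // drop empty or missing fields
    .map(([k, v]) => `${k}: ${v}`)
    .join('\n');
}

// Example with a hand-written metadata object (not real scraped data):
const example = {
  title: 'Vintage Camera',
  description: 'A 35mm rangefinder in working condition.',
  url: 'https://example.com/listing/123',
  keywords: 'camera, film, 35mm',
  'og:title': '',
};
console.log(toRagContext(example));
```

The point is just how little of a page you need to keep: four short lines instead of a multi-hundred-KB HTML blob.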
But honestly I'm still figuring out what people are building with it, since there isn't much of a feedback mechanism for npm packages other than people filing issues and PRs. Most often, people would file issues just because they were triggering 403 blocks. Adding a "Troubleshooting" section to the README helped.
So now I've built a hosted version and called it Minifetch: no pipeline to spin up, pay per request, no subscriptions. There are handy API endpoints and options for SEO and AI agents. The Extract URL Metadata endpoint with the ?verbosity=full option is a drop-in replacement for the npm package: same output, zero infrastructure management for the user. It respects robots.txt and identifies itself transparently to site owners, which in practice means it gets through to pages that block aggressive scrapers that don't identify themselves.
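Roughly, a request looks like this. The host, path, and auth header in this sketch are illustrative placeholders (check the API docs for the real endpoint and auth scheme); the only part taken from above is the ?verbosity=full option:

```javascript
// Sketch: build a metadata-extraction request URL. The base URL and
// path here are placeholder assumptions, not the documented API.
function buildMetadataRequest(baseUrl, targetUrl) {
  const u = new URL('/v1/metadata', baseUrl);
  u.searchParams.set('url', targetUrl);      // page to extract metadata from
  u.searchParams.set('verbosity', 'full');   // npm-package-compatible output
  return u.toString();
}

// Usage sketch (Node 18+ global fetch; header name is a placeholder):
// const res = await fetch(
//   buildMetadataRequest('https://api.minifetch.com', 'https://www.ebay.com'),
//   { headers: { 'x-api-key': process.env.MINIFETCH_KEY } }
// );
// const metadata = await res.json();
```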
Would love to know what you'd use it for — that's half the reason I'm posting.
eljee•1h ago
Here's the JSON template: https://github.com/laurengarcia/url-metadata/blob/master/lib...
And example data: https://minifetch.com/result/example/metadata/ebay
minifetch.com