The part I want feedback on is the CLI. It's built for LLMs, not humans: output is JSON when piped, feedstock schema crawl dumps every parameter at runtime, and --fields url,markdown lets you pull just the fields you need so a crawl result doesn't eat your whole context window. Other bits worth a look:
Fetch-first engine. Tries plain HTTP before booting a browser, escalates only if the page needs JS.
Deep crawl with BFS, DFS, a UCB1 bandit, and a Q-learning focused crawler. The learning ones seem to help on big docs sites but I haven't measured it carefully yet.
Accessibility tree snapshots instead of HTML. 3 to 10x smaller, easier to feed a model.
Cache uses bun:sqlite with ETag, Last-Modified, and content hashing.
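To make the cache line concrete, here is a rough sketch of how the three validators can work together. It is not the actual feedstock code: CacheEntry, fnv1a, conditionalHeaders and isUnchanged are made-up names for illustration, and the FNV-1a-style hash is only a dependency-free stand-in for whatever hash the cache really uses. If-None-Match and If-Modified-Since are the standard HTTP conditional request headers.

```typescript
// Hypothetical shape of a cached page; not feedstock's actual schema.
interface CacheEntry {
  etag?: string;          // value of the ETag response header, if the server sent one
  lastModified?: string;  // value of the Last-Modified response header, if any
  contentHash: string;    // hash of the body we stored
}

// FNV-1a-style 32-bit hash over UTF-16 code units.
// Assumption: used here only as a stand-in for a real content hash.
function fnv1a(text: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return ("00000000" + h.toString(16)).slice(-8);
}

// Build the headers for a conditional GET from what the cache already knows.
function conditionalHeaders(entry: CacheEntry): Record<string, string> {
  const headers: Record<string, string> = {};
  if (entry.etag) headers["If-None-Match"] = entry.etag;
  if (entry.lastModified) headers["If-Modified-Since"] = entry.lastModified;
  return headers;
}

// Decide whether the cached copy can be reused.
// 304 means the server confirmed nothing changed; on a 200 we fall back
// to comparing content hashes, which covers servers that send no validators.
function isUnchanged(entry: CacheEntry, status: number, body: string | null): boolean {
  if (status === 304) return true;
  if (status === 200 && body !== null) return fnv1a(body) === entry.contentHash;
  return false;
}
```

The hash fallback is the useful part in practice: plenty of sites return 200 with an identical body every time, and comparing hashes lets the crawler skip re-processing those pages.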
v0.5.0, Apache 2.0, 325 tests. Just pushed it so the star count is what it is.
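For anyone wondering what the UCB1 strategy mentioned above amounts to, here is a minimal sketch of the textbook selection rule. This is the standard algorithm, not necessarily how feedstock wires it up: ArmStats and ucb1Select are hypothetical names, and treating each URL path prefix as an "arm" with a 0/1 relevance reward is my assumption for illustration.

```typescript
// Per-arm statistics. An "arm" here is a hypothetical link group,
// for example a URL path prefix such as "/docs".
interface ArmStats {
  pulls: number;        // pages fetched from this group so far
  totalReward: number;  // sum of rewards, e.g. 1 per relevant page, 0 otherwise
}

// Textbook UCB1: pick the arm with the highest mean reward plus an
// exploration bonus that shrinks as the arm is tried more often.
function ucb1Select(arms: Record<string, ArmStats>): string | undefined {
  const names = Object.keys(arms);
  let totalPulls = 0;
  for (const name of names) totalPulls += arms[name].pulls;

  let best: string | undefined;
  let bestScore = -Infinity;
  for (const name of names) {
    const s = arms[name];
    if (s.pulls === 0) return name; // try every arm once before scoring
    const mean = s.totalReward / s.pulls;
    const bonus = Math.sqrt((2 * Math.log(totalPulls)) / s.pulls);
    const score = mean + bonus;
    if (score > bestScore) {
      bestScore = score;
      best = name;
    }
  }
  return best;
}
```

With equal pull counts the higher-yield prefix wins, but a rarely visited prefix gets a large bonus and is retried, which is the behavior you want on a big docs site where one section looks unpromising after only a couple of pages.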