I needed any URL as clean Markdown for LLM context, including sites behind Cloudflare or other anti-bot protection. curl gets HTTP 403 on those, raw HTML is 80%+ nav noise that eats context, and paid SaaS (Firecrawl, Jina) wasn't an option for me.
snitchmd is a Docker wrapper around two existing OSS tools: CloakBrowser (stealth Chromium that passes Cloudflare) and rs-trafilatura (HTML → Markdown). No new scraper, just glue. It runs locally, so my URLs stay on my box.
Token reduction (raw curl HTML vs snitchmd, tiktoken cl100k_base):
- cloudflare.com/learning/bots — curl: HTTP 403 → snitchmd: 0.8k
- docs.docker.com/engine/install — 187k → 0.9k
- en.wikipedia.org/wiki/LLM — 222.7k → 29.7k
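For anyone who wants those numbers as percentages, here's a quick sketch that just does the arithmetic on the two rows that have a raw-HTML baseline (the cloudflare.com row is skipped because curl got a 403 there). The function and variable names are mine, purely for illustration, and not part of snitchmd.

```python
# Token counts in thousands, copied from the list above (raw curl HTML, snitchmd).
measurements = {
    "docs.docker.com/engine/install": (187.0, 0.9),
    "en.wikipedia.org/wiki/LLM": (222.7, 29.7),
}


def reduction_pct(before_k: float, after_k: float) -> float:
    """Percentage of tokens removed, relative to the raw HTML count."""
    return (before_k - after_k) / before_k * 100


for url, (before_k, after_k) in measurements.items():
    print(f"{url}: {reduction_pct(before_k, after_k):.1f}% fewer tokens")
```

That works out to roughly 99.5% fewer tokens for the Docker docs page and roughly 86.7% for the Wikipedia article.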
Heads up: it passes Cloudflare but can't solve "click the traffic lights" captchas (reCAPTCHA v2, hCaptcha).
MIT licensed. Happy to answer questions.
sc0rp10•1h ago