Modern web pages are cluttered with tracking scripts, analytics, styling, ads, and interactive elements that waste tokens and dilute semantic meaning when processing content for AI systems. This library strips away the noise to give you clean, meaningful HTML that:
- Reduces token count by 60-90% (lower API costs)
- Improves embedding quality (less noise = better semantic search)
- Speeds up processing (smaller payloads = faster inference)
- Preserves structure (headings, paragraphs, links stay intact)
- Zero dependencies (pure JavaScript, no bloat)
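
To make the idea concrete, here is a minimal sketch of this kind of cleanup in plain JavaScript. It is not the library's actual API: `stripNoise` is a hypothetical helper, and the regex approach is an assumption chosen to keep the example dependency-free. It will misbehave on malformed markup or on attribute values that contain `>`, so treat it as an illustration rather than a replacement for a real parser.

```javascript
// Hypothetical helper (not the library's real API): a regex-based sketch
// of stripping non-content markup from an HTML string.
function stripNoise(html) {
  return html
    // Drop HTML comments.
    .replace(/<!--[\s\S]*?-->/g, "")
    // Drop elements whose contents carry no readable text.
    .replace(/<(script|style|noscript|iframe|svg)\b[^>]*>[\s\S]*?<\/\1\s*>/gi, "")
    // Strip every attribute except a quoted href; the tag name is kept.
    .replace(/<([a-z][a-z0-9]*)\b([^>]*)>/gi, (match, tag, attrs) => {
      const href = attrs.match(/\shref\s*=\s*("[^"]*"|'[^']*')/i);
      return href ? `<${tag} href=${href[1]}>` : `<${tag}>`;
    })
    // Collapse whitespace runs into single spaces.
    .replace(/\s+/g, " ")
    .trim();
}

const page = `
<div class="post" data-track="1">
  <script>trackView();</script>
  <style>.post { color: red; }</style>
  <h1 id="title">Hello</h1>
  <p style="margin:0">Read <a href="/docs" onclick="t()">the docs</a>.</p>
</div>`;

const cleaned = stripNoise(page);
console.log(cleaned);
console.log(`${page.length} chars -> ${cleaned.length} chars`);
```

Headings, paragraphs, and link targets survive, while scripts, styles, and tracking attributes are dropped. How much you actually save depends entirely on the page.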
ioniq•13h ago
Any chance you’ll add a chunking strategy? If not, I’d love to know what strategy you use for chunking.
nirvanist•12h ago
Thank you for the comment. Probably not in this module, but I'm definitely thinking about how to implement this.
html5ninja•12h ago
A colleague shared it with me, and I found it pretty cool because it's simple. We're actually going to use this in our scraping workflow. Thanks!