xinghaohuang•1h ago
I built this because I was tired of my RAG contexts getting clogged with HTML tags, invisible characters, and excessive whitespace from web scrapers.
It's a zero-dependency (well, almost) library designed to sit between your data source and your prompt construction.
Key features:
* Standardizes whitespace and strips HTML.
* Smart truncation (middle-out) to fit context windows.
* PII redaction for privacy.
* A "TokenPacker" to manage budget across multiple inputs (coming soon).
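For anyone unfamiliar with the term, "middle-out" truncation keeps the head and tail of a document and drops the middle, on the intuition that instructions and conclusions carry more signal than the body. This is not the library's actual implementation, just a minimal sketch of the idea, assuming token-level truncation with a placeholder marker:

```python
def truncate_middle(tokens: list[str], budget: int, marker: str = "[...]") -> list[str]:
    """Middle-out truncation: keep the head and tail, drop the middle.

    `tokens` is any pre-tokenized sequence; `budget` is the max length
    of the result, including one slot reserved for the marker.
    """
    if len(tokens) <= budget:
        return tokens  # already fits, nothing to do
    keep = budget - 1          # reserve one slot for the marker
    head = (keep + 1) // 2     # slightly favor the head on odd budgets
    tail = keep - head
    return tokens[:head] + [marker] + (tokens[-tail:] if tail else [])
```

A real implementation would also want to cut on sentence or paragraph boundaries rather than mid-token, but the budget accounting is the same.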
Fun fact: I just renamed it from "prompt-groomer" to "prompt-refiner" yesterday based on Reddit feedback (long story, lesson learned about naming!).
Benchmarks show it adds under 3ms of latency for a typical 10k-token context, which is negligible compared to the 20%+ token savings.
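If you want to sanity-check the latency claim on your own corpus, a generic harness along these lines works; the `refine` function here is a stand-in whitespace cleaner, not the library's API, so swap in the real call:

```python
import re
import statistics
import time

def refine(text: str) -> str:
    # Stand-in cleaner: collapse runs of whitespace.
    # Replace this body with the actual library call.
    return re.sub(r"\s+", " ", text).strip()

def bench(text: str, runs: int = 100) -> float:
    """Return the median per-call latency in milliseconds over `runs` iterations."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        refine(text)
        samples.append((time.perf_counter() - start) * 1000)
    return statistics.median(samples)

# Roughly 10k tokens of noisy, scraped-looking input.
doc = "Some   scraped\ttext  with\n\n junk   " * 2000
print(f"median latency: {bench(doc):.2f} ms")
```

Median is a better summary than mean here, since a single GC pause or cache miss can skew the average on short timings.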
Happy to answer any questions!