Some notes on the improvements:
1. Using csv (serde) for writing leads to some big gains.
2. Arena-allocating incoming keys and storing references in the hashmap, instead of owned values, heavily reduced the number of allocations and probably improves cache efficiency (I'm guessing, I did not measure).
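A rough sketch of the idea behind point 2, not the tool's actual code: here the whole input buffer stands in for the arena (a simplifying assumption), and the hashmap keys are `&str` slices borrowed from it, so no `String` is allocated per key. The names `count_lines` and `sorted_counts` are made up for illustration.

```rust
use std::collections::HashMap;

/// Count how often each line occurs in `input`.
///
/// The keys are `&str` slices that borrow from `input`, so the buffer acts
/// as a stand-in for the arena: no `String` is allocated per key.
fn count_lines(input: &str) -> HashMap<&str, usize> {
    let mut counts: HashMap<&str, usize> = HashMap::new();
    for line in input.lines() {
        // `entry` hashes the borrowed slice; the key itself is just a
        // pointer and a length into `input`.
        *counts.entry(line).or_insert(0) += 1;
    }
    counts
}

/// Return (key, count) pairs sorted by descending count, ties broken by key.
fn sorted_counts<'a>(counts: &HashMap<&'a str, usize>) -> Vec<(&'a str, usize)> {
    let mut rows: Vec<(&'a str, usize)> = counts.iter().map(|(k, v)| (*k, *v)).collect();
    rows.sort_by(|a, b| b.1.cmp(&a.1).then(a.0.cmp(b.0)));
    rows
}
```

A real streaming version cannot hold all input in one buffer, which is where a proper arena comes in: keys are copied into arena-owned chunks once, and the map keeps references into those chunks.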
There is some regex functionality and some table filtering built in as well.
Happy hacking!
southwindcg•2h ago
noamteyssier•2h ago
It's great when you quickly need to see what the distribution of classes in an input stream is. This pops up all the time: counting the different types of log messages, counting the variants of a field in a CSV, finding the most common word or substring, etc.
southwindcg•59m ago