The results on large datasets were pretty startling.
The tl;dr from the benchmarks:
The Search Gap: On a 65GB MongoDB JSON log dataset, searching for a rare string (a specific "ERROR" pattern with zero occurrences in the data) took a standard zstdcat | grep pipeline over 8 minutes. Crystal answered the same query in 0.8 seconds.
How? Crystal keeps internal Bloom filters alongside the compressed data, so it can skip huge sections outright and decompress only the blocks whose filters indicate a possible match. Even on queries with millions of matches, it was still ~4x faster than raw grep.
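For intuition, here is a minimal, self-contained Python sketch of that general technique. This is not Crystal's actual implementation: the block size, filter parameters, and the build_block/search helpers are all illustrative. Each block of log lines gets a tiny Bloom filter over its tokens; a query consults the filters first and only inflates candidate blocks.

    import zlib
    import hashlib

    M = 1 << 16  # bits per filter
    K = 4        # hash probes per token

    def _hashes(token: str):
        # Double hashing: derive K bit positions from one SHA-256 digest.
        d = hashlib.sha256(token.encode()).digest()
        h1 = int.from_bytes(d[:8], "big")
        h2 = int.from_bytes(d[8:16], "big") | 1
        return [(h1 + i * h2) % M for i in range(K)]

    def build_block(lines):
        # Compress a batch of log lines and index its tokens in a Bloom filter.
        bits = 0
        for line in lines:
            for token in line.split():
                for pos in _hashes(token):
                    bits |= 1 << pos
        return bits, zlib.compress("\n".join(lines).encode())

    def search(blocks, token):
        # Decompress only blocks whose filter says the token *might* be present.
        probe = _hashes(token)
        for bits, blob in blocks:
            if not all((bits >> pos) & 1 for pos in probe):
                continue  # filter proves the token is absent: skip without inflating
            for line in zlib.decompress(blob).decode().splitlines():
                if token in line:
                    yield line

    # Demo: a rare token touches almost no compressed bytes.
    logs = [f"INFO request id={i} ok" for i in range(100_000)]
    logs[54_321] = "ERROR disk full id=54321"
    blocks = [build_block(logs[i:i + 1_000]) for i in range(0, len(logs), 1_000)]
    print(list(search(blocks, "ERROR")))  # inflates ~1 of 100 blocks

The reason skipping is safe: a Bloom filter can return false positives but never false negatives, so any block it rejects is guaranteed not to contain the token. A rare or absent token therefore touches almost none of the compressed data.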
Compression Performance: At Level 3, it sustained 800-1300 MB/s across several datasets. At Level 19, it matches zstd-19's compression ratio while compressing roughly 10x faster.
We want to know how this fits your infrastructure.
Every logging pipeline is different. We are currently prioritizing packaging for various environments (CLI, K8s sidecar, Docker, etc.).
If you are interested in testing Crystal against your own log deluge, please let us know your preferred integration method in this 3-question form: https://docs.google.com/forms/d/e/1FAIpQLSehstef-rbLfM72scgx...
It helps us prioritize what to build next for real-world deployments.