On our GitHub events dataset, SEE ended up smaller than Zstd-19 while still supporting random-access queries:
- combined: 40.4MB vs Zstd 71.8MB (raw 524.1MB) → 7.7% of raw
- str: 9.1MB vs Zstd 9.5MB
- int: 31.3MB vs Zstd 62.3MB
Lookup microbench (one column): p50 ~0.085ms.
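For anyone who wants to sanity-check the ratios, here is a small sketch that recomputes them from the sizes quoted above. The sizes are copied from this post; the helper name `pct` and the dictionary layout are only for illustration.

```python
# Sizes in MB, copied from the figures quoted in the post.
RAW_MB = 524.1
SIZES_MB = {
    "combined": {"see": 40.4, "zstd19": 71.8},
    "str": {"see": 9.1, "zstd19": 9.5},
    "int": {"see": 31.3, "zstd19": 62.3},
}


def pct(part, whole):
    """Return part as a percentage of whole, rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


# Combined output size relative to the raw input.
see_vs_raw = pct(SIZES_MB["combined"]["see"], RAW_MB)
zstd_vs_raw = pct(SIZES_MB["combined"]["zstd19"], RAW_MB)

# SEE size relative to Zstd-19, per column type.
see_vs_zstd = {name: pct(s["see"], s["zstd19"]) for name, s in SIZES_MB.items()}

print(f"SEE / raw:     {see_vs_raw}%")
print(f"Zstd-19 / raw: {zstd_vs_raw}%")
for name, value in see_vs_zstd.items():
    print(f"SEE / Zstd-19 ({name}): {value}%")
```

On these numbers the combined SEE output is 7.7% of raw against 13.7% for Zstd-19, and most of the gap comes from the int columns, where SEE is about half the Zstd-19 size.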
Repo + release assets are here: https://github.com/kodomonocch1/see_proto
NDA eval request (optional): https://docs.google.com/forms/d/e/1FAIpQLScV2Ti592K3Za2r_WLU...
Happy to answer questions about the design trade-offs and where this beats “Zstd + separate index”.
Tetsuro•1h ago
1) Demo pack (10-min): prints ratio/skip/bloom + lookup p50/p95/p99.
2) DD pack: strict decode mismatch checks across 3 datasets (mismatch==0) + run_metrics.json / SUMMARY.md.
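To make the p50/p95/p99 numbers concrete, here is a minimal sketch of how lookup latency percentiles can be computed from per-call timings. It is not the demo pack's code: `percentile` and `time_lookups` are hypothetical helpers, the nearest-rank definition is one common choice among several, and a plain dict stands in for a real column reader.

```python
import math
import time


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def time_lookups(lookup, keys):
    """Call lookup(key) for each key and return per-call latencies in milliseconds."""
    latencies_ms = []
    for key in keys:
        start = time.perf_counter()
        lookup(key)
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    return latencies_ms


# Stand-in for a real column lookup: a plain dict, NOT the SEE reader.
column = {i: i * 2 for i in range(10_000)}
latencies = time_lookups(column.get, range(10_000))

summary = {f"p{p}": percentile(latencies, p) for p in (50, 95, 99)}
print(summary)
```

Reporting p95 and p99 next to p50 matters for random-access formats because the tail is where a lookup that has to decode an extra chunk shows up.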
If you want, I can share which query shapes we optimize for (exists/pos/eq) and why the “searchable-in-format” approach avoids the usual index consistency tax.
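As a rough illustration of the "exists" query shape, here is a generic sketch of per-chunk Bloom filters used to skip chunks that cannot contain a value. This is the textbook technique, not the SEE on-disk format: the `Bloom` class, `build_chunks`, `exists`, the filter sizes, and the sample event names are all assumptions made for the example, and the chunks are left uncompressed to keep it short.

```python
import hashlib


class Bloom:
    """Tiny Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit set

    def _positions(self, value):
        # Derive num_hashes bit positions by hashing the value with different seeds.
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(f"{seed}:{value}".encode(), digest_size=8).digest()
            yield int.from_bytes(digest, "big") % self.num_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits |= 1 << pos

    def might_contain(self, value):
        return all((self.bits >> pos) & 1 for pos in self._positions(value))


def build_chunks(values, chunk_size):
    """Split values into chunks, each stored next to its own Bloom filter."""
    chunks = []
    for i in range(0, len(values), chunk_size):
        chunk = values[i:i + chunk_size]
        bloom = Bloom()
        for v in chunk:
            bloom.add(v)
        chunks.append((bloom, chunk))
    return chunks


def exists(chunks, value):
    """Return (found, chunks_scanned); chunks whose filter rules the value out are skipped."""
    scanned = 0
    for bloom, chunk in chunks:
        if not bloom.might_contain(value):
            continue  # skip: a real format would avoid decompressing this chunk
        scanned += 1
        if value in chunk:
            return True, scanned
    return False, scanned


# Sample values for illustration only.
events = ["PushEvent", "ForkEvent", "WatchEvent", "IssuesEvent", "PushEvent", "CreateEvent"]
chunks = build_chunks(events, chunk_size=2)
print(exists(chunks, "WatchEvent"))
print(exists(chunks, "DeleteEvent"))
```

Because each filter is built and stored together with its chunk, there is no separate index that can drift out of sync with the data, which is the consistency point mentioned above.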