Tradeoff: it’s not always smaller than Zstd, but it stays searchable while compressed and minimizes I/O. Key numbers (demo): combined≈19.5% of raw, skip≈99%, lookup p50≈0.18 ms (bloom≈0.30).
10-min reproduction (no marketing): 1) Download the Demo ZIP (Release). 2) Follow README_FIRST.md. 3) Run `python samples/quick_demo.py` → prints ratio/skip/bloom + p50/p95/p99.
ROI quick math: Savings/TB ≈ (1 − 0.195) × Price_per_GB × 1000 (e.g., $0.05/GB → ~$40/TB). NDA/VDR form (private; please don't put confidential info in a public comment): [https://docs.google.com/forms/d/e/1FAIpQLScV2Ti592K3Za2r_WLU...]
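The ROI math above is just arithmetic; here it is as a runnable sketch so you can plug in your own ratio and storage price (the 0.195 and $0.05/GB values are the demo numbers, not a guarantee):

```python
# Back-of-the-envelope ROI from the combined compression ratio.
# combined_ratio: compressed size as a fraction of raw (demo: ~0.195).
# price_per_gb: your storage tier's $/GB (example: $0.05).
def savings_per_tb(combined_ratio: float, price_per_gb: float) -> float:
    """Dollars saved per raw TB: (1 - ratio) x $/GB x 1000 GB/TB."""
    return (1.0 - combined_ratio) * price_per_gb * 1000.0

print(f"~${savings_per_tb(0.195, 0.05):.2f}/TB saved")  # matches the ~$40/TB figure above
```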
Happy to answer technical questions (schema-aware layout, delta strategy, bloom density, skip heuristics, failure modes).
kodomonocch1•2h ago
“Will it hold up on real data?” Short: Best on repetitive JSON/NDJSON (logs, events, telemetry). We provide a 10-minute demo so anyone can reproduce the KPIs and stress it with their own patterns.
“Why not keep a separate index?” Short: Separate indexes add I/O/space and consistency overhead. SEE keeps searchability in the storage format, reducing random I/O and parse costs.
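To make "skip without a separate index" concrete, here's a minimal per-block Bloom filter sketch. This is hypothetical illustration only, not SEE's actual layout; `BlockBloom`, the bit sizing, and the hashing scheme are all my assumptions:

```python
import hashlib

class BlockBloom:
    """Tiny per-block Bloom filter (hypothetical; not SEE's real format).
    Probe k hash positions for a key: if any bit is unset, the block
    provably lacks the key and can be skipped without reading it."""
    def __init__(self, size_bits: int = 1024, hashes: int = 4):
        self.size = size_bits
        self.hashes = hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, key: str):
        # Derive k positions by salting the hash with the probe index.
        for i in range(self.hashes):
            h = hashlib.blake2b(key.encode(), salt=bytes([i])).digest()
            yield int.from_bytes(h[:8], "big") % self.size

    def add(self, key: str) -> None:
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key: str) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(key))

# Index one block's keys, then probe before deciding to read the block.
bloom = BlockBloom()
for key in ("user_id=42", "status=error"):
    bloom.add(key)
print(bloom.might_contain("user_id=42"))  # always True for inserted keys
print(bloom.might_contain("status=ok"))   # almost certainly False -> skip the block
```

The point is that the filter travels with the block inside the storage format, so absent-key lookups skip I/O entirely; false positives only cost one extra block read.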
“Are the numbers cherry-picked?” Short: We publish p50/p95/p99 latencies, skip rates for both present and absent keys, and bloom density. The demo script prints all of them, along with raw and combined sizes.
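You can also sanity-check the reported percentiles yourself from raw latency samples. A minimal sketch using the nearest-rank convention (the demo script may use a different interpolation method):

```python
import math

def percentile(samples: list[float], q: float) -> float:
    """Nearest-rank percentile: the value at rank ceil(q/100 * n)."""
    s = sorted(samples)
    rank = math.ceil(q / 100 * len(s))
    return s[max(0, rank - 1)]

# Example latency samples in ms (made up for illustration).
lat_ms = [0.12, 0.15, 0.18, 0.21, 0.30, 0.45, 0.90]
for q in (50, 95, 99):
    print(f"p{q} = {percentile(lat_ms, q):.2f} ms")
```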