The current industry standard (Winterfell/Miden) relies on FFT-based Low-Degree Extensions (LDE), which requires materializing the entire execution trace in memory. This creates a massive memory bottleneck (O(N log N)) that prevents client-side proving at scale.
I implemented a linear-time streaming architecture that avoids the FFT bottleneck. To test it, I ran a Fibonacci AIR on a standard Macbook M3 Max.
The Setup:
Winterfell: Configured with "packed" rows (2 terms/row) to reduce trace length.
Hekate: Configured with "pure" rows (1 term/row), performing 2x the logical work.
The Results:
16 Million Rows (2^24): - Winterfell: 18.34s (24 GB RAM) - Hekate: 6.39s (2.7 GB RAM) - Result: ~3x faster, 9x less memory.
67 Million Rows (2^26): - Winterfell: Crashes (Out of Memory) - Hekate: 28.2s (Peak RAM 11.8GB)
This suggests that for ZK rollups to actually run on client devices, we need to move away from monolithic trace commitments and towards ephemeral state streaming.
Happy to answer questions about the architecture (Binary Tower Fields) or the benchmarks.
y00zzeek•2h ago