All streaming processors face the same fundamental problem:
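- Streaming joins must maintain state for both sides of the join, because a row on one side can match a row that arrives later on the other.
- High-cardinality data (millions of unique keys) means huge state sizes.
- The traditional approach, keeping all of that state in memory, eventually exhausts the available memory.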
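To make the state problem concrete, here is a minimal sketch of a symmetric hash join, a common textbook way to join two unbounded streams. It is not any particular engine's implementation, and the class and method names are hypothetical. Each side keeps a hash table of every row it has seen, because a future row on the other side may still match it, so memory grows with the number of distinct keys.

```python
from collections import defaultdict


class SymmetricHashJoin:
    """Hypothetical in-memory streaming equi-join (symmetric hash join).

    Both inputs are unbounded, so every row from each side is retained:
    a left row may still match a right row that arrives much later.
    """

    def __init__(self):
        # One hash table per side: join key -> list of rows seen so far.
        self.left_state = defaultdict(list)
        self.right_state = defaultdict(list)

    def on_left(self, key, row):
        self.left_state[key].append(row)  # keep it for future right rows
        # Probe the other side; .get() avoids creating empty entries.
        return [(row, r) for r in self.right_state.get(key, [])]

    def on_right(self, key, row):
        self.right_state[key].append(row)  # keep it for future left rows
        return [(l, row) for l in self.left_state.get(key, [])]

    def state_size(self):
        """Total number of rows held in memory across both sides."""
        return sum(map(len, self.left_state.values())) + sum(
            map(len, self.right_state.values())
        )


if __name__ == "__main__":
    join = SymmetricHashJoin()
    join.on_left("user-1", {"click": "a"})
    print(join.on_right("user-1", {"order": 42}))
    # [({'click': 'a'}, {'order': 42})]

    # One row per distinct key: state grows linearly with cardinality.
    for i in range(100_000):
        join.on_left(f"user-{i}", {"click": i})
    print(len(join.left_state))  # 100000
    print(join.state_size())     # 100002
```

Nothing is ever evicted here, which is exactly the "keep everything in memory" approach described above.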
The high-cardinality join memory problem isn't unique to Timeplus. Apache Flink can keep join state in its RocksDB state backend, which stores state on local disk rather than on the JVM heap; Materialize shares indexed state across multiple queries (but still requires keeping full datasets in memory); and RisingWave stores state in cloud object storage (S3/GCS), with LRU caching for hot data. What makes Timeplus different is an optimization purpose-built around the Pareto principle: a tiny fraction of the data generates the vast majority of activity, so keeping that hot data in memory and cold data on disk yields large memory savings.
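The hot/cold idea can be sketched as a bounded in-memory tier with LRU eviction in front of a slower store. This is an illustrative assumption, not Timeplus's actual mechanism: LRU is just one plausible way to decide what counts as "hot", `TieredState` is a hypothetical name, and a plain dict stands in for the on-disk store.

```python
from collections import OrderedDict


class TieredState:
    """Hypothetical hot/cold keyed state.

    Hot keys live in a bounded in-memory LRU tier; evicted (cold) keys move
    to `cold_store`, any dict-like object. A plain dict is used as a stand-in
    for an on-disk key-value store. Each key lives in exactly one tier.
    """

    def __init__(self, hot_capacity, cold_store=None):
        self.hot_capacity = hot_capacity
        self.hot = OrderedDict()  # key -> value, least recently used first
        self.cold = {} if cold_store is None else cold_store
        self.hot_hits = 0
        self.cold_hits = 0

    def get(self, key, default=None):
        if key in self.hot:
            self.hot.move_to_end(key)  # mark as most recently used
            self.hot_hits += 1
            return self.hot[key]
        if key in self.cold:
            self.cold_hits += 1
            value = self.cold.pop(key)  # promote: the key is active again
            self._put_hot(key, value)
            return value
        return default

    def put(self, key, value):
        self.cold.pop(key, None)  # the hot copy becomes authoritative
        self._put_hot(key, value)

    def _put_hot(self, key, value):
        self.hot[key] = value
        self.hot.move_to_end(key)
        while len(self.hot) > self.hot_capacity:
            cold_key, cold_value = self.hot.popitem(last=False)  # evict LRU
            self.cold[cold_key] = cold_value


if __name__ == "__main__":
    state = TieredState(hot_capacity=200)
    # Skewed workload: 4 of every 5 updates touch one of 100 "hot" keys.
    for i in range(100_000):
        key = i % 100 if i % 5 else i % 10_000
        state.put(key, state.get(key, 0) + 1)
    print(f"in memory: {len(state.hot)} keys, cold tier: {len(state.cold)} keys")
    print(f"hot hits: {state.hot_hits}, cold hits: {state.cold_hits}")
```

With a skewed workload, most lookups are served from the small hot tier, while the long tail of rarely used keys sits in the cold tier instead of in memory.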