Part 3 of my distributed database series. This one covers the storage engine decision - the foundation everything else sits on.
Main topics: how key encoding lets a single ordered KV store emulate document/graph/time-series models, LSM-tree vs B-tree trade-offs, and the benchmark that killed my pure Elixir dreams.
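To make the key-encoding idea concrete, here's a minimal sketch using an ETS `ordered_set` as a stand-in for RocksDB's sorted keyspace. The tuple-prefix scheme below is illustrative, not the post's actual encoding:

```elixir
# An ordered KV store; Erlang term ordering plays the role of
# RocksDB's byte-ordered keyspace.
table = :ets.new(:kv, [:ordered_set])

# Document model: {:doc, collection, id} => body
:ets.insert(table, {{:doc, "users", "u1"}, %{name: "Ada"}})
:ets.insert(table, {{:doc, "users", "u2"}, %{name: "Lin"}})

# Time-series model: {:ts, metric, timestamp} => value
:ets.insert(table, {{:ts, "cpu", 1_700_000_000}, 0.42})
:ets.insert(table, {{:ts, "cpu", 1_700_000_060}, 0.57})

# A "collection scan" is just a range scan over one key prefix;
# the time-series rows never show up because they sort elsewhere.
users =
  :ets.select(table, [
    {{{:doc, "users", :"$1"}, :"$2"}, [], [{{:"$1", :"$2"}}]}
  ])
# => [{"u1", %{name: "Ada"}}, {"u2", %{name: "Lin"}}]
```

The same trick generalizes: graph edges become `{:edge, from, label, to}` keys, and every model gets range scans and point lookups for free from the one sorted structure.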
I wanted to use CubDB (pure Elixir, no NIF risks, easy debugging). The benchmarks said otherwise: RocksDB was 177x faster on writes and used 26,000x less memory during batch operations. For a distributed database, that gap is insurmountable.
The post also covers living with NIFs in Elixir - native code runs outside the BEAM's safety guarantees, so a NIF crash takes down the whole VM instead of a single process, and a long-running NIF can block a scheduler. You architect around it: shard isolation, replication, aggressive monitoring.
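On the monitoring point: one cheap, built-in tactic is the BEAM's own system monitor, which flags any process (including one stuck in a misbehaving NIF) that occupies a scheduler for too long. A minimal sketch - this is a generic OTP facility, not necessarily what the post uses:

```elixir
# Ask the runtime to notify this process whenever anything holds a
# scheduler for more than 100ms - a common symptom of a NIF that
# should have been a dirty NIF or split into smaller chunks of work.
:erlang.system_monitor(self(), [{:long_schedule, 100}])

# The monitoring process then receives messages shaped like:
#   {:monitor, pid_or_port, :long_schedule, info}
# which you can forward to your alerting pipeline.
```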
Also discussed: RocksDB column families (underrated feature for multi-model storage), write amplification as the LSM-tree tax, and why this approach handles time-series data but won't compete with columnar engines like ClickHouse for pure analytics.
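For anyone who hasn't used column families: they give each data model its own keyspace and its own compaction/tuning knobs inside one RocksDB instance. A rough sketch against the erlang-rocksdb binding - the call shapes are from memory, so check them against the library docs, and the path and CF names are made up:

```elixir
# Assumed: the erlang-rocksdb dependency is available as :rocksdb.
cf_descriptors = [{~c"default", []}, {~c"docs", []}, {~c"timeseries", []}]

{:ok, db, [_default, docs, ts]} =
  :rocksdb.open_with_cf(
    ~c"/tmp/multimodel",
    [create_if_missing: true, create_missing_column_families: true],
    cf_descriptors
  )

# Each write targets a column family, so documents and time-series
# points live in separate sorted runs with separate compaction.
:ok = :rocksdb.put(db, docs, "users/u1", :erlang.term_to_binary(%{name: "Ada"}), [])
:ok = :rocksdb.put(db, ts, <<"cpu", 1_700_000_000::64>>, <<0.42::float>>, [])

:ok = :rocksdb.close(db)
```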
Next post will cover Raft consensus for metadata and how the CP metadata plane coordinates with the AP data plane.
Happy to discuss storage engine choices, NIF risk mitigation, or whether the CubDB benchmarks surprised anyone else who's used it.
gawry•41m ago