> Stratum is a columnar analytics engine that combines the performance of fused SIMD execution with the semantics of immutable data
What are your thoughts on investing in a columnar database rather than a hybrid one?
I'm in the game development space.
I have integrated Stratum's columnar indices as a secondary index in the new query engine of Datahike itself (https://github.com/replikativ/datahike), so for numerical data you will be able to use Datalog/SQL for combined (OLTP, OLAP, ...) processing. The same goes for proximum (persistent HNSW vector index) and scriptum (persistent Lucene).
As far as I have tested, Stratum can already be updated online via copy-on-write, with better write throughput than purely columnar alternatives (Stratum uses a persistent B-tree over column chunks). I have not compared this in benchmarks yet, though; DuckDB, for instance, recommends against online updates. It also depends on the workload: if you do random-access writes, the overhead of the columnar layout will still be a slowdown compared to OLTP/Datahike's row/entity-wise indices. Storing fully variable strings in a column is also inefficient; for those you want the entity-wise indices.
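To make the chunk-level copy-on-write point concrete, here is a minimal Python sketch. It is not Stratum's code: `make_column`, `cow_update`, and `CHUNK_SIZE` are hypothetical names, and a flat tuple of chunks stands in for the persistent B-tree, so the top level is copied in full here, whereas a tree would copy only the path to the changed chunk. It shows that a single-value write rewrites one whole chunk while all other chunks are shared with the old snapshot.

```python
from typing import Iterable, Tuple

CHUNK_SIZE = 4  # tiny for illustration; assumed value, not Stratum's

Chunk = Tuple[int, ...]
Column = Tuple[Chunk, ...]


def make_column(values: Iterable[int], chunk_size: int = CHUNK_SIZE) -> Column:
    """Split values into immutable fixed-size chunks."""
    vals = list(values)
    return tuple(
        tuple(vals[i:i + chunk_size]) for i in range(0, len(vals), chunk_size)
    )


def cow_update(column: Column, index: int, value: int,
               chunk_size: int = CHUNK_SIZE) -> Column:
    """Return a new column version; only the touched chunk is rebuilt."""
    chunk_no, offset = divmod(index, chunk_size)
    old_chunk = column[chunk_no]
    # The whole chunk is copied even though one value changes.
    new_chunk = old_chunk[:offset] + (value,) + old_chunk[offset + 1:]
    # Untouched chunks are reused by reference, not copied.
    return column[:chunk_no] + (new_chunk,) + column[chunk_no + 1:]


v1 = make_column(range(12))
v2 = cow_update(v1, 5, 99)

# Count the chunks that both versions share by identity.
shared = sum(1 for a, b in zip(v1, v2) if a is b)
```

The old version `v1` stays readable after the write, which is what makes online updates safe for concurrent readers. The per-write cost is one chunk copy, which is why scattered random-access writes are more expensive than in a row-wise index.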
whilo•1h ago
The headline benchmark result is that on 10M rows, Stratum is faster than DuckDB on 35 of 46 single-threaded analytical queries, despite running entirely on the JVM.
But the main idea is actually branchable tables: you can fork a table in O(1), keep copy-on-write snapshots, and query different branches through SQL.
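As a rough illustration of why forking can be O(1), here is a small Python sketch under stated assumptions. `BranchableTable`, `Node`, and the method names are hypothetical and not Stratum's API, and a singly linked list of immutable rows stands in for the real columnar structure. A branch is just a named pointer to an immutable snapshot, so forking copies one reference and later writes on one branch never affect the other.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Row = Tuple[str, int]


@dataclass(frozen=True)
class Node:
    """One immutable row plus a link to the previous snapshot."""
    row: Row
    parent: Optional["Node"]


class BranchableTable:
    """Toy table whose branches are named pointers to snapshots."""

    def __init__(self) -> None:
        self.heads: Dict[str, Optional[Node]] = {"main": None}

    def append(self, branch: str, row: Row) -> None:
        # The new head points at the old one; nothing is mutated.
        self.heads[branch] = Node(row, self.heads[branch])

    def fork(self, source: str, name: str) -> None:
        # O(1): the new branch shares the source's current snapshot.
        self.heads[name] = self.heads[source]

    def scan(self, branch: str) -> List[Row]:
        rows: List[Row] = []
        node = self.heads[branch]
        while node is not None:
            rows.append(node.row)
            node = node.parent
        rows.reverse()  # restore insertion order
        return rows


table = BranchableTable()
table.append("main", ("sword", 10))
table.append("main", ("shield", 5))
table.fork("main", "experiment")
table.append("experiment", ("bow", 7))

main_rows = table.scan("main")
experiment_rows = table.scan("experiment")
```

Here `main` still holds two rows after the write to `experiment`, and both branches share the first two nodes in memory.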
It speaks the PostgreSQL wire protocol, so psql/JDBC/DBeaver work out of the box.
Benchmarks, methodology, and repo are linked from the page. Happy to answer questions.