I am one of the cofounders of http://turingdb.ai. We built TuringDB while working on large biological knowledge graphs and graph-based digital twins with pharma & hospitals, where existing graph databases were unusable for deep graph traversals with hundreds or thousands of hops on (crappy) machines you can find in a hospital.
https://github.com/turing-db/turingdb
TuringDB is a new in-memory, column-oriented graph database optimised for read-heavy analytical workloads:
- Milliseconds (1) for multi-hop queries on graphs with 10M+ nodes/edges
- Lock-free reads via immutable snapshots
- Git-like versioning for graphs (branch, merge, time travel queries)
- Built-in graph exploration UI for large subgraphs
We wrote TuringDB from scratch in C++ and designed it to have predictable memory and concurrency behaviour.
For example, for the Reactome biological knowledge graph, we see ~100× to 300× speedups over Neo4j on multi-hop analytical queries out of the box (details in first comment).
A free Community version is available and runnable locally:
https://docs.turingdb.ai/quickstart
https://github.com/turing-db/turingdb
Happy to answer technical questions.
(1): We actually hit sub-millisecond performance on many queries
remy_boutonnet•2h ago
We built TuringDB because our workloads were dominated by analytical graph queries (multi-hop traversals, neighborhood expansion, similarity analysis) on large, relatively stable graphs extracted from scientific literature. After all, scientists don’t publish millions of new papers per second (yet). Write transaction throughput was not the bottleneck; latency was, once you need to go deep.
A few design choices that may be of interest:
- Column-oriented graph storage
Nodes, edges and properties are all stored adjacently, column-wise, to maximise cache locality during traversals. This isn’t a relational system with joins layered on top, and nodes & edges are not distinct heap-allocated objects like in Neo4j or Memgraph: everything is packed together into big columnar storage, for memory efficiency and to cut down the random pointer-chasing done by the engine. Property values are also stored column-wise across all nodes & edges, so filtering nodes by property value is quite fast out of the box, even without any index. A simplified sketch of this kind of layout is below.
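To give a flavour of the idea, here is a minimal, illustrative CSR-style sketch (not our exact format; all names are made up for the example):

    #include <cstdint>
    #include <string>
    #include <vector>

    // Illustrative columnar graph layout (made-up names, not TuringDB's real format).
    // All nodes and edges live side by side in flat arrays; no per-node heap objects.
    struct GraphColumns {
        // CSR-style adjacency: the edges of node i are
        // edge_targets[edge_offsets[i] .. edge_offsets[i + 1]).
        std::vector<uint64_t> edge_offsets;   // size = node_count + 1
        std::vector<uint64_t> edge_targets;   // size = edge_count
        std::vector<uint32_t> edge_types;     // one label id per edge, parallel to edge_targets

        // One column per property, aligned with node ids: property of node i is at index i.
        std::vector<double>      node_score;  // a numeric property column
        std::vector<std::string> node_name;   // a string property column
    };

    // A property filter is a tight linear scan over one contiguous column, which
    // is why it is fast out of the box even without an index.
    std::vector<uint64_t> nodesWithScoreAbove(const GraphColumns& g, double threshold) {
        std::vector<uint64_t> out;
        for (uint64_t i = 0; i < g.node_score.size(); ++i)
            if (g.node_score[i] > threshold)
                out.push_back(i);
        return out;
    }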
We also implemented a streaming Cypher query engine from scratch, so nodes and edges are processed in chunks, in a streaming fashion, to maximise cache efficiency (sketched below).
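Roughly, chunked expansion looks like this (an illustrative sketch reusing the GraphColumns layout above; the chunk size and operator shape are assumptions, not our real internals):

    #include <algorithm>
    #include <cstddef>
    #include <cstdint>
    #include <span>
    #include <vector>

    // Illustrative streaming operator (reuses the GraphColumns sketch above).
    // The frontier is expanded one chunk at a time instead of materialising the
    // whole next hop at once; the chunk size is an assumption for the example.
    constexpr std::size_t kChunkSize = 4096;

    void expandOneHop(const GraphColumns& g,
                      std::span<const uint64_t> frontier,
                      std::vector<uint64_t>& next) {
        for (std::size_t begin = 0; begin < frontier.size(); begin += kChunkSize) {
            const std::size_t end = std::min(begin + kChunkSize, frontier.size());
            for (std::size_t i = begin; i < end; ++i) {
                const uint64_t node = frontier[i];
                for (uint64_t e = g.edge_offsets[node]; e < g.edge_offsets[node + 1]; ++e)
                    next.push_back(g.edge_targets[e]);
            }
            // In a real streaming engine, downstream operators would consume this
            // chunk's output here, keeping the working set cache-resident.
        }
    }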
- Immutable snapshots and lock-free reads
Every read query runs against a consistent, immutable snapshot of the graph. Reads never take locks and writes never block reads: once a snapshot is acquired, there are no locks left on the read path. By comparison, Memgraph has to acquire a lock on each node & edge when traversing the graph from node to node, and mutexes cost CPU cycles.
This makes long-running analytical queries predictable and avoids performance cliffs under concurrency.
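One standard way to get this behaviour (a hedged sketch of the general pattern, not necessarily our exact mechanism) is to publish each new immutable snapshot through an atomic pointer, with reference counting keeping old snapshots alive for in-flight queries:

    #include <atomic>
    #include <memory>

    // Hedged sketch of lock-free reads via immutable snapshots (one common
    // pattern; not necessarily TuringDB's exact mechanism).
    struct Snapshot {
        // Immutable graph data (e.g. the GraphColumns layout sketched earlier).
        // Never modified after construction.
    };

    // C++20 atomic<shared_ptr>: readers load, the writer swaps.
    std::atomic<std::shared_ptr<const Snapshot>> current_snapshot;

    // Read path: grab the current snapshot once, then traverse with no locks.
    // The shared_ptr keeps the snapshot alive even if a new one is published
    // mid-query, so long analytical reads stay consistent and predictable.
    std::shared_ptr<const Snapshot> acquireSnapshot() {
        return current_snapshot.load(std::memory_order_acquire);
    }

    // Write path: build a fresh immutable snapshot (copy-on-write of changed
    // columns), then publish it with a single atomic swap. Readers never block.
    void publish(std::shared_ptr<const Snapshot> next) {
        current_snapshot.store(std::move(next), std::memory_order_release);
    }

(Caveat: depending on the standard library, std::atomic<std::shared_ptr> may fall back to an internal lock; engines that need hard guarantees often use epoch-based reclamation or hazard pointers instead.)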
- Versioning as part of the storage model
Every change creates a commit, just like in git. You can query any historical version of the graph at full speed, branch datasets for experiments or simulations, and merge changes back. This is critical for regulated or safety-critical domains where auditability and reproducibility matter. Structurally, it can be as simple as the sketch below.
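An illustrative shape for such a commit graph (again a sketch that builds on the Snapshot type above, not our actual implementation):

    #include <cstdint>
    #include <memory>
    #include <string>
    #include <unordered_map>
    #include <vector>

    // Illustrative commit graph for a versioned database (builds on the Snapshot
    // type sketched above; all names are made up for the example).
    struct Commit {
        uint64_t id;
        std::vector<uint64_t> parents;          // more than one parent = merge commit
        std::shared_ptr<const Snapshot> state;  // immutable graph at this commit
    };

    struct VersionStore {
        std::unordered_map<uint64_t, Commit> commits;
        std::unordered_map<std::string, uint64_t> branch_heads;  // e.g. "main" -> commit id

        // Time travel is just pointer-following: resolve a branch head (or any
        // commit id) and read its immutable snapshot directly, at full speed,
        // with no log replay.
        std::shared_ptr<const Snapshot> checkout(const std::string& branch) const {
            return commits.at(branch_heads.at(branch)).state;
        }
    };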
- We like C++, and TuringDB was born as an experiment in the design space
The engine is written in C++ from scratch because we like C++ and it’s fun.
We implemented our own storage engine, query engine and column format from the ground up. We wanted to bring columnar storage and column-oriented streaming query execution to the world of graph databases, and to make a graph DB that’s heavily focused on read-intensive workloads for once, instead of transactional performance. In that sense TuringDB is also an experiment in the space of possible designs for a graph database engine.
We believe in paying very careful attention to memory layout, keeping execution paths clear, and not using any external magic that hasn’t been thought through for what we want to build.
- Knowledge graphs and GraphRAG
A common use case is grounding LLMs in structured graph context rather than relying on text-only retrieval. We’re shipping native vector search and embeddings inside TuringDB this week, so graph traversal and vector similarity can be combined in one system (a rough sketch of that combination is below).
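To illustrate the combination (purely a sketch: the embedding column, the brute-force top-k and the API are assumptions for the example, not the shipped feature):

    #include <algorithm>
    #include <cmath>
    #include <cstddef>
    #include <cstdint>
    #include <functional>
    #include <utility>
    #include <vector>

    // Illustrative GraphRAG-style retrieval: vector similarity picks seed nodes,
    // then graph traversal expands them into structured context for an LLM.
    // Assumes an embedding column aligned with node ids, like the property
    // columns in the earlier layout sketch; the API is made up for the example.
    using Embedding = std::vector<float>;

    float cosine(const Embedding& a, const Embedding& b) {
        float dot = 0.f, na = 0.f, nb = 0.f;
        for (std::size_t i = 0; i < a.size(); ++i) {
            dot += a[i] * b[i];
            na  += a[i] * a[i];
            nb  += b[i] * b[i];
        }
        return dot / (std::sqrt(na) * std::sqrt(nb) + 1e-9f);
    }

    // Brute-force top-k over the embedding column (a real engine would use an
    // ANN index), followed by one hop of expansion around each seed node.
    std::vector<uint64_t> retrieveContext(const GraphColumns& g,
                                          const std::vector<Embedding>& node_embeddings,
                                          const Embedding& query, std::size_t k) {
        std::vector<std::pair<float, uint64_t>> scored;
        for (uint64_t i = 0; i < node_embeddings.size(); ++i)
            scored.emplace_back(cosine(node_embeddings[i], query), i);
        k = std::min(k, scored.size());
        std::partial_sort(scored.begin(), scored.begin() + k, scored.end(),
                          std::greater<>());

        std::vector<uint64_t> context;
        for (std::size_t s = 0; s < k; ++s) {
            const uint64_t seed = scored[s].second;
            context.push_back(seed);
            for (uint64_t e = g.edge_offsets[seed]; e < g.edge_offsets[seed + 1]; ++e)
                context.push_back(g.edge_targets[e]);  // neighbours as extra context
        }
        return context;
    }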