I started KektorDB as a personal challenge to learn Go and database internals. Soon, however, I got hooked: I wanted the project to have some dignity beyond a simple "toy project".
I didn’t follow a rigid roadmap; I iterated based on what felt right. I started by implementing caching and a semantic firewall, and from there, the step towards an integrated RAG pipeline was natural.
To be honest, the choice to integrate RAG comes from my laziness. I tried building a system using Python and LangChain, but I hated managing external scripts and dependencies just to make data talk to the LLM. I wanted a "batteries-included" solution.
However, the first results of my "naive" RAG were disappointing. That’s why I decided to integrate a lightweight graph (to semantically link chunks) and techniques like HyDE directly into the engine, all while keeping one fixed constraint: it must remain a single binary, easily embeddable as a Go library.
While KektorDB is a general-purpose embeddable Vector + Graph database, its RAG pipeline is intentionally designed as a practical default. It's not a replacement for complex, heavily customized RAG infrastructures, but a way to get a local system working quickly.
Here is a quick overview of the features:
- HNSW Indexing: With support for Float32, Float16, and Int8 quantization.
- Hybrid Search: Combines vector similarity with BM25 keyword scoring for better accuracy.
- Graph Layer: Maintains a generic adjacency graph alongside vectors. Although the RAG pipeline uses it to link chunks, the system exposes APIs for defining arbitrary relationships, which enables semantic traversal.
- Persistence: AOF (Append-Only File) + Snapshot.
- RAG Features: Background worker for document ingestion + integrated OpenAI-compatible proxy for query rewriting and Grounded HyDE.
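To make the hybrid search idea concrete, here is a minimal sketch of one common way to combine a vector similarity score with a BM25 score: min-max normalize each score list, then take a weighted sum. This is an illustration under my own assumptions, not necessarily how KektorDB fuses scores internally. The `Hit` type, the `Fuse` function, and the `alpha` weight are hypothetical names invented for this example.

```go
package main

import (
	"fmt"
	"sort"
)

// Hit is a hypothetical search result carrying both raw scores.
// It is not a KektorDB type.
type Hit struct {
	ID     string
	Vector float64 // vector similarity, e.g. cosine
	BM25   float64 // raw BM25 keyword score (unbounded)
	Fused  float64 // combined score, filled in by Fuse
}

// minMax rescales values into [0, 1]. If all values are equal
// (or the slice is empty) every output is 0.
func minMax(vals []float64) []float64 {
	out := make([]float64, len(vals))
	if len(vals) == 0 {
		return out
	}
	lo, hi := vals[0], vals[0]
	for _, v := range vals {
		if v < lo {
			lo = v
		}
		if v > hi {
			hi = v
		}
	}
	if hi == lo {
		return out
	}
	for i, v := range vals {
		out[i] = (v - lo) / (hi - lo)
	}
	return out
}

// Fuse returns a copy of hits sorted by
// alpha*normalized(Vector) + (1-alpha)*normalized(BM25), highest first.
// alpha is an assumed tuning knob: 1 means vector only, 0 means BM25 only.
func Fuse(hits []Hit, alpha float64) []Hit {
	vec := make([]float64, len(hits))
	kw := make([]float64, len(hits))
	for i, h := range hits {
		vec[i] = h.Vector
		kw[i] = h.BM25
	}
	nv, nk := minMax(vec), minMax(kw)

	out := make([]Hit, len(hits))
	copy(out, hits)
	for i := range out {
		out[i].Fused = alpha*nv[i] + (1-alpha)*nk[i]
	}
	sort.SliceStable(out, func(i, j int) bool {
		return out[i].Fused > out[j].Fused
	})
	return out
}

func main() {
	// Made-up scores purely for illustration.
	hits := []Hit{
		{ID: "a", Vector: 0.90, BM25: 1.0},
		{ID: "b", Vector: 0.70, BM25: 9.0},
		{ID: "c", Vector: 0.50, BM25: 5.0},
	}
	for _, h := range Fuse(hits, 0.5) {
		fmt.Printf("%s %.3f\n", h.ID, h.Fused)
	}
}
```

With these sample scores and equal weights, "b" ranks first: it is not the closest vector match, but its strong keyword score lifts it above "a". Normalizing first matters because raw BM25 scores are unbounded and would otherwise dominate a similarity score that stays within a fixed range.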
Current Limitations:
1. It is currently RAM-bound (graph and vectors live in memory). I am working on a hybrid disk-storage engine.
2. Ingestion parsing can be improved (especially regarding tables in PDFs).
The code is pure Go (with optional Rust kernels for specific SIMD operations), all contained in a single binary.
The project started out of a desire to learn, but I would like to continue developing it seriously. For this reason, I would appreciate any kind of technical advice or feedback.
san0n•1d ago
Thanks for reading.
Repository: https://github.com/sanonone/kektordb