Mem0, Letta, Zep, etc., all target memory as 'personalization' or 'personal context' — they store user preferences, conversation history, and individual agent state. None of them captures reasoning structure, and none is designed for multi-agent memory sharing. The problem I tried to solve was that my agent loops know what changed, but not why it worked or failed.
The retrieval architecture ended up being an interesting engineering problem. Three signals (FTS5, QJL-accelerated sqlite-vec cosine similarity, knowledge graph traversal) are each fast on their own, but each suits a different kind of query. FTS5 is fast and exact on keywords. Vector search handles semantic similarity. The knowledge graph surfaces entries connected to shared entities, even when there's no semantic overlap. Reciprocal Rank Fusion (RRF) fuses the three ranked lists. This hybrid consistently outperforms any single signal on recall quality.
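For readers unfamiliar with RRF: it's the standard rank-based fusion formula, score(d) = Σ 1/(k + rank(d)), summed over the lists that contain d. A minimal sketch (this is the textbook formula, not AIngram's actual implementation; the list contents are made up):

```python
def rrf_fuse(ranked_lists, k=60):
    """Fuse several best-first ranked lists of doc IDs with
    Reciprocal Rank Fusion: score(d) = sum of 1/(k + rank)."""
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical outputs of the three signals for one query:
fts   = ["a", "b", "c"]   # keyword matches
vec   = ["b", "d", "a"]   # semantic neighbors
graph = ["e", "b"]        # entity-linked entries

print(rrf_fuse([fts, vec, graph]))  # → ['b', 'a', 'e', 'd', 'c']
```

Note how "b" wins despite never being ranked first in any single list — appearing in all three signals beats a single top rank, which is why the fusion helps recall.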
Happy to go deep on any of the architectural decisions.
bozbuilds•1h ago
There's just one SQLite file for storage. There's no server, no API, no cloud dependencies. The retrieval mechanism uses a few different systems: FTS5 for keyword matching, sqlite-vec for semantic similarity using nomic-embed-text-v1.5, a QJL two-pass vector compression layer that keeps latency manageable for larger databases, and knowledge graph traversal (recursive CTEs) if you want entity-linked results. Those signals are combined via reciprocal rank fusion, re-ranked, and returned as the final search results. Everything in the pipeline runs locally.
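The graph-traversal leg is plain SQLite. Here's a minimal sketch of entity-linked lookup with a recursive CTE, using a made-up `edges(src, dst)` schema (AIngram's actual tables will differ):

```python
import sqlite3

# Toy knowledge graph: directed entity edges in a throwaway in-memory DB.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE edges (src TEXT, dst TEXT);
INSERT INTO edges VALUES
  ('retrieval', 'fts5'),
  ('retrieval', 'sqlite-vec'),
  ('sqlite-vec', 'nomic-embed');
""")

# Walk outward from a seed entity, capping depth so cycles can't run away.
rows = conn.execute("""
WITH RECURSIVE reachable(entity, depth) AS (
  SELECT ?, 0
  UNION
  SELECT e.dst, r.depth + 1
  FROM edges e JOIN reachable r ON e.src = r.entity
  WHERE r.depth < 3
)
SELECT entity, depth FROM reachable ORDER BY depth, entity;
""", ("retrieval",)).fetchall()

print(rows)
# → [('retrieval', 0), ('fts5', 1), ('sqlite-vec', 1), ('nomic-embed', 2)]
```

Using `UNION` (not `UNION ALL`) deduplicates revisited nodes, which is what keeps traversal over a cyclic graph terminating.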
Benchmarked on LongMemEval: recall@3 on the oracle split was 1.000, and recall@10 on the S split was 0.955. Median retrieval latency was 22 ms (oracle) and 27 ms (S) on a laptop with an RTX 4060. The library is Apache 2.0 licensed. There's also a 'Pro' tier in development with added optimization features. Let me know what you think!
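For anyone wanting to sanity-check numbers like these on their own data: recall@k is just the fraction of gold-relevant items that land in the top k results, averaged over queries. A minimal sketch (not the actual LongMemEval harness):

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of relevant doc IDs appearing in the top-k retrieved list."""
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
    return hits / len(relevant)

# One query with a single gold evidence item (common in LongMemEval-style
# evals, where per-query recall@k is then 0 or 1 and gets averaged):
print(recall_at_k(["m7", "m2", "m9"], {"m2"}, k=3))   # → 1.0
print(recall_at_k(["m7", "m2", "m9"], {"m5"}, k=3))   # → 0.0
```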
pip install aingram
GitHub: https://github.com/bozbuilds/AIngram | Site: https://aingram.dev