We’ve been building [memU](https://github.com/NevaMind-AI/memU), an open-source memory framework for AI agents that supports both classic RAG and LLM-based direct file reading.
RAG has become the default in LLM systems, but many of its failures don’t come from the model — they come from assumptions baked into retrieval. Embedding-based retrieval is fundamentally an approximation over semantic similarity. It works well for fuzzy recall, but it breaks whenever relevance ≠ correctness, which is common in real systems.
From a retrieval perspective, RAG struggles with:

- Time- and version-sensitive facts (embeddings don’t encode validity or order; see the sketch below)
- Structured, canonical knowledge like configs, policies, or agent state
- Multi-step reasoning, where incomplete or slightly wrong context compounds errors
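To make the first point concrete: in pure vector ranking, validity metadata never enters the score. A toy, self-contained sketch (the bag-of-letters `embed` and the example chunks are ours for illustration, not memU code):

```python
# Toy illustration (not memU code): pure cosine ranking ignores validity.
import math

def embed(text: str) -> list[float]:
    # Stand-in for a real sentence-embedding model: a bag-of-letters vector.
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

chunks = [
    {"text": "The API request timeout is 30 seconds.", "superseded": True},
    {"text": "The API request timeout is 60 seconds.", "superseded": False},
]

q = embed("What is the API request timeout?")
# The two chunks score identically here: the 30-vs-60 difference is invisible
# to this embedding, and the `superseded` flag never enters the ranking.
for c in sorted(chunks, key=lambda c: cosine(q, embed(c["text"])), reverse=True):
    print(round(cosine(q, embed(c["text"])), 4), c["text"], "stale:", c["superseded"])
```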
In practice, RAG often returns plausible but incorrect context — especially harmful for agents that act over long horizons.
memU takes a different approach.
Instead of trying to make embedding search smarter, we ask: what should not be retrieved via embeddings at all?
Retrieval in memU starts at a Memory Category Layer:

- memory is organized into semantically stable categories
- each category is stored as a readable Markdown file
- these files act as long-term, canonical memory (sketched below)
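As a rough picture of that layer (the category names and file contents here are hypothetical, not memU’s actual schema):

```python
# Hypothetical category layout (illustrative, not memU's actual schema).
# Each category is a plain Markdown file that an LLM (or a human) can read.
from pathlib import Path

MEMORY_DIR = Path("memory")

CATEGORIES = {
    "profile": MEMORY_DIR / "profile.md",          # stable facts about the user
    "preferences": MEMORY_DIR / "preferences.md",  # settings, likes, dislikes
    "agent_state": MEMORY_DIR / "agent_state.md",  # current tasks, commitments
}

SAMPLE_PROFILE = """\
# Profile

- Name: Alice
- Role: backend engineer
- Timezone: UTC+2 (updated 2025-03-10, supersedes UTC-5)
"""

MEMORY_DIR.mkdir(exist_ok=True)
CATEGORIES["profile"].write_text(SAMPLE_PROFILE)
```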
When a query arrives, the LLM reads the relevant memory files directly, using semantic understanding rather than vector similarity. Only when this layer is insufficient does memU fall back to item-level retrieval, optionally using embeddings for speed.
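A minimal sketch of that two-stage flow, assuming a generic `llm` prompt-to-text callable and an `embedding_search` fallback; the names are placeholders, not memU’s API:

```python
# Two-stage retrieval sketch. `llm` is any prompt->text callable and
# `embedding_search` any query->items callable; both are placeholders,
# not memU's API.
from pathlib import Path

def route_categories(llm, query: str, categories: dict[str, Path]) -> list[str]:
    """Ask the model which category files are relevant to the query."""
    names = ", ".join(categories)
    reply = llm(f"Which of these memory categories are relevant to '{query}'? "
                f"Options: {names}. Reply with names only.")
    return [n for n in categories if n in reply]

def retrieve(llm, query: str, categories: dict[str, Path], embedding_search) -> str:
    # Stage 1: the LLM reads the relevant category files in full.
    relevant = route_categories(llm, query, categories)
    context = "\n\n".join(categories[n].read_text() for n in relevant)
    answer = llm(f"Memory:\n{context}\n\nQuestion: {query}\n"
                 f"Reply INSUFFICIENT if the memory does not cover it.")
    if "INSUFFICIENT" not in answer:
        return answer
    # Stage 2: fall back to item-level retrieval, optionally embedding-based.
    items = embedding_search(query, top_k=5)
    return llm(f"Items:\n{items}\n\nQuestion: {query}")
```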
This design leans on what the LLM is increasingly good at: reading, reasoning over, and maintaining structured knowledge, not just ranking vectors. Using Markdown files is deliberate — similar to ideas like `skills.md` — making memory explicit, inspectable, and stable over time.
Compared to existing approaches:

- [mem0](https://github.com/mem0ai/mem0) is fast and simple with classic RAG, but can struggle with temporal accuracy and precise state changes.
- [Zep](https://github.com/getzep/graphiti) uses graphs, which handle structure well but add complexity and maintenance overhead.
- [memU](https://github.com/NevaMind-AI/memU) uses non-embedding retrieval to address RAG’s structural limits in accuracy, stability, and long-term consistency — without replacing RAG entirely.
For long-running agents, retrieval needs to provide reliable premises for reasoning, not just relevant text. In those settings, direct LLM reading over structured memory often aligns better with how models actually reason.