Breathe-memory takes a different approach: associative injection. Before each LLM call, it extracts anchors from the user's message (entities, temporal references, emotional signals), traverses a concept graph via BFS, runs optional vector search, and injects only what's relevant — typically in <60ms.
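To make the injection step concrete, here's a minimal sketch of the anchor-extraction + BFS idea, assuming a simple adjacency-dict concept graph. Function and variable names here are illustrative, not breathe-memory's actual API:

```python
from collections import deque

def extract_anchors(message: str, known_entities: set[str]) -> set[str]:
    # Toy anchor extraction: intersect message tokens with known entities.
    # The real pipeline also pulls temporal references and emotional signals.
    tokens = {t.strip(".,!?").lower() for t in message.split()}
    return tokens & known_entities

def bfs_relevant(graph: dict[str, list[str]],
                 anchors: set[str],
                 max_depth: int = 2) -> set[str]:
    # Collect every concept reachable within max_depth hops of any anchor.
    seen = set(anchors)
    queue = deque((a, 0) for a in anchors)
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, depth + 1))
    return seen

graph = {"postgres": ["pgvector"], "pgvector": ["embeddings"]}
anchors = extract_anchors("Should we keep postgres?", {"postgres", "redis"})
print(bfs_relevant(graph, anchors))  # concepts whose memories get injected
```

The depth cap is what keeps this fast: traversal cost is bounded by the anchor set's local neighborhood, not the graph's size.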
When context fills up, instead of summarizing, it extracts a structured graph: topics, decisions, open questions, artifacts. This preserves the semantic structure that summaries destroy.
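One way to picture the structured extract, using the four categories named above. The field names mirror the post, but the exact schema is an assumption, not the library's real data model:

```python
from dataclasses import dataclass, field

@dataclass
class ContextExtract:
    # Hypothetical shape of a compaction result: structure survives,
    # unlike a prose summary that flattens everything together.
    topics: list[str] = field(default_factory=list)
    decisions: list[str] = field(default_factory=list)
    open_questions: list[str] = field(default_factory=list)
    artifacts: list[str] = field(default_factory=list)

    def render(self) -> str:
        # Re-inject later as a compact, labeled preamble.
        sections = []
        for name in ("topics", "decisions", "open_questions", "artifacts"):
            items = getattr(self, name)
            if items:
                sections.append(name + ": " + "; ".join(items))
        return "\n".join(sections)

extract = ContextExtract(topics=["memory systems"],
                         decisions=["use pgvector"],
                         open_questions=["eviction policy?"])
print(extract.render())
```

Because each category is addressable, later turns can query just the open questions or just the decisions instead of re-reading a blob.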
The whole thing is ~1500 lines of Python, interface-based, with zero mandatory dependencies. Plug in any database, any LLM, any vector store. The reference implementation uses PostgreSQL + pgvector.
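The "plug in anything" design is the kind of thing structural typing makes cheap. A sketch of how such interfaces could look with `typing.Protocol`; these names are assumptions for illustration, not breathe-memory's actual interfaces:

```python
from typing import Protocol

class GraphStore(Protocol):
    # Any backend exposing neighbors() satisfies this; no inheritance
    # or mandatory dependency required.
    def neighbors(self, concept: str) -> list[str]: ...

class InMemoryGraph:
    # Minimal drop-in for tests; a production backend might wrap
    # PostgreSQL instead.
    def __init__(self, edges: dict[str, list[str]]):
        self.edges = edges

    def neighbors(self, concept: str) -> list[str]:
        return self.edges.get(concept, [])

def inject(graph: GraphStore, anchor: str) -> list[str]:
    # Works with any conforming store, checked structurally.
    return graph.neighbors(anchor)

print(inject(InMemoryGraph({"llm": ["context window"]}), "llm"))
```

Swapping the vector store or LLM client then becomes a matter of handing in a different object, not editing the core.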
https://github.com/tkenaz/breathe-memory
We've been running this in production for several months. Open-sourcing because we think the approach (injection over retrieval) is underexplored and worth more attention.
We've also posted an article about memory injections in a more human-readable form, if you want to see the thinking under the hood: https://medium.com/towards-artificial-intelligence/beyond-ra...