Breathe-memory takes a different approach: associative injection. Before each LLM call, it extracts anchors from the user's message (entities, temporal references, emotional signals), traverses a concept graph via BFS, runs optional vector search, and injects only what's relevant — typically in <60ms.
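To make the injection step concrete, here's a minimal sketch of the anchor-extraction + BFS idea, assuming a simple adjacency-dict concept graph. Function and variable names here are illustrative, not breathe-memory's actual API:

```python
from collections import deque

def extract_anchors(message: str, known_entities: set[str]) -> set[str]:
    # Toy anchor extraction: intersect message tokens with known entities.
    # The real pipeline also pulls temporal references and emotional signals.
    tokens = {t.strip(".,!?").lower() for t in message.split()}
    return tokens & known_entities

def bfs_relevant(graph: dict[str, list[str]],
                 anchors: set[str],
                 max_depth: int = 2) -> set[str]:
    # Collect every concept reachable within max_depth hops of any anchor.
    seen = set(anchors)
    queue = deque((a, 0) for a in anchors)
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, depth + 1))
    return seen

graph = {"postgres": ["pgvector"], "pgvector": ["embeddings"]}
anchors = extract_anchors("Should we keep postgres?", {"postgres", "redis"})
print(bfs_relevant(graph, anchors))  # concepts whose memories get injected
```

The depth cap is what keeps this fast: traversal cost is bounded by the anchor set's local neighborhood, not the graph's size.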
When context fills up, instead of summarizing, it extracts a structured graph: topics, decisions, open questions, artifacts. This preserves the semantic structure that summaries destroy.
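One way to picture the structured extract, using the four categories named above. The field names mirror the post, but the exact schema is an assumption, not the library's real data model:

```python
from dataclasses import dataclass, field

@dataclass
class ContextExtract:
    # Hypothetical shape of a compaction result: structure survives,
    # unlike a prose summary that flattens everything together.
    topics: list[str] = field(default_factory=list)
    decisions: list[str] = field(default_factory=list)
    open_questions: list[str] = field(default_factory=list)
    artifacts: list[str] = field(default_factory=list)

    def render(self) -> str:
        # Re-inject later as a compact, labeled preamble.
        sections = []
        for name in ("topics", "decisions", "open_questions", "artifacts"):
            items = getattr(self, name)
            if items:
                sections.append(name + ": " + "; ".join(items))
        return "\n".join(sections)

extract = ContextExtract(topics=["memory systems"],
                         decisions=["use pgvector"],
                         open_questions=["eviction policy?"])
print(extract.render())
```

Because each category is addressable, later turns can query just the open questions or just the decisions instead of re-reading a blob.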
The whole thing is ~1500 lines of Python, interface-based, with zero mandatory dependencies. Plug in any database, any LLM, any vector store. The reference implementation uses PostgreSQL + pgvector.
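The "plug in anything" design is the kind of thing structural typing makes cheap. A sketch of how such interfaces could look with `typing.Protocol`; these names are assumptions for illustration, not breathe-memory's actual interfaces:

```python
from typing import Protocol

class GraphStore(Protocol):
    # Any backend exposing neighbors() satisfies this; no inheritance
    # or mandatory dependency required.
    def neighbors(self, concept: str) -> list[str]: ...

class InMemoryGraph:
    # Minimal drop-in for tests; a production backend might wrap
    # PostgreSQL instead.
    def __init__(self, edges: dict[str, list[str]]):
        self.edges = edges

    def neighbors(self, concept: str) -> list[str]:
        return self.edges.get(concept, [])

def inject(graph: GraphStore, anchor: str) -> list[str]:
    # Works with any conforming store, checked structurally.
    return graph.neighbors(anchor)

print(inject(InMemoryGraph({"llm": ["context window"]}), "llm"))
```

Swapping the vector store or LLM client then becomes a matter of handing in a different object, not editing the core.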
https://github.com/tkenaz/breathe-memory
We've been running this in production for several months. Open-sourcing because we think the approach (injection over retrieval) is underexplored and worth more attention.
We've also posted an article about memory injections in a more human-readable form, if you want to see the thinking under the hood: https://medium.com/towards-artificial-intelligence/beyond-ra...