
Show HN: Breathe-Memory – Associative memory injection for LLMs (not RAG)

https://github.com/tkenaz/breathe-memory
5•mvyshnyvetska•6h ago
LLMs forget. The standard fix is RAG — retrieve chunks, stuff them in. It works until it doesn't: irrelevant chunks waste tokens, summaries lose structure, and nothing actually models how memory works.

Breathe-Memory takes a different approach: associative injection. Before each LLM call, it extracts anchors from the user's message (entities, temporal references, emotional signals), traverses a concept graph via BFS, runs optional vector search, and injects only what's relevant — typically in <60ms.
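To make the pipeline above concrete, here is a minimal sketch of the anchor-extraction and BFS-traversal steps. All names and data structures here are illustrative assumptions, not the library's actual API; real anchor extraction would also handle temporal and emotional signals, which this sketch omits.

```python
from collections import deque

def extract_anchors(message: str, known_entities: set[str]) -> set[str]:
    # Simplified anchor extraction: keep only words that match known entities.
    words = {w.strip(".,!?").lower() for w in message.split()}
    return words & known_entities

def bfs_related(graph: dict[str, list[str]], anchors: set[str],
                max_hops: int = 2) -> set[str]:
    # Breadth-first traversal of the concept graph, bounded by hop count,
    # so only concepts near the anchors get injected into the prompt.
    seen = set(anchors)
    queue = deque((a, 0) for a in anchors)
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, depth + 1))
    return seen

graph = {"postgres": ["pgvector"], "pgvector": ["embeddings"]}
anchors = extract_anchors("How do we index embeddings in postgres?",
                          {"postgres", "embeddings"})
related = bfs_related(graph, anchors)
# `related` holds the anchors plus concepts within two hops of them.
```

Bounding the traversal by hop count is one plausible way to keep the injected context small and the latency low, which matches the sub-60ms budget mentioned above.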

When context fills up, instead of summarizing, it extracts a structured graph: topics, decisions, open questions, artifacts. This preserves the semantic structure that summaries destroy.
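A rough sketch of what such a structured extract might look like, using field names taken from the post (the actual schema in the repo may differ):

```python
from dataclasses import dataclass, field

@dataclass
class ContextGraph:
    # Structured replacement for a lossy summary: each facet of the
    # conversation is kept as a distinct list rather than flattened prose.
    topics: list[str] = field(default_factory=list)
    decisions: list[str] = field(default_factory=list)
    open_questions: list[str] = field(default_factory=list)
    artifacts: list[str] = field(default_factory=list)

    def to_prompt(self) -> str:
        # Render the graph back into a compact block for re-injection.
        sections = [
            ("Topics", self.topics),
            ("Decisions", self.decisions),
            ("Open questions", self.open_questions),
            ("Artifacts", self.artifacts),
        ]
        return "\n".join(
            f"{name}: {'; '.join(items)}" for name, items in sections if items
        )

g = ContextGraph(topics=["memory systems"], decisions=["use pgvector"])
block = g.to_prompt()
```

The point of the structure is that a later query about, say, open questions can pull exactly that list back, instead of hoping a prose summary preserved it.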

The whole thing is ~1500 lines of Python, interface-based, zero mandatory deps. Plug in any database, any LLM, any vector store. Reference implementation uses PostgreSQL + pgvector.
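"Interface-based" with pluggable backends might look roughly like the following. These Protocol definitions are hypothetical, invented here to illustrate the design; the repo defines its own contracts.

```python
from typing import Protocol

class GraphStore(Protocol):
    # Contract for the concept-graph backend (PostgreSQL in the
    # reference implementation, but anything satisfying this shape works).
    def neighbors(self, concept: str) -> list[str]: ...

class VectorStore(Protocol):
    # Contract for the optional vector-search backend (e.g. pgvector).
    def search(self, query: str, k: int) -> list[str]: ...

class InMemoryGraphStore:
    # Trivial stand-in backend, useful for tests or zero-dependency runs.
    def __init__(self, edges: dict[str, list[str]]):
        self._edges = edges

    def neighbors(self, concept: str) -> list[str]:
        return self._edges.get(concept, [])

store: GraphStore = InMemoryGraphStore({"memory": ["recall", "injection"]})
```

Structural typing like this is one way to get "zero mandatory deps": the core never imports a database driver, it only talks to whatever object satisfies the protocol.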

We've been running this in production for several months. Open-sourcing because we think the approach (injection over retrieval) is underexplored and worth more attention.

We've also posted an article about memory injections in a more human-readable form, if you want to see the thinking under the hood: https://medium.com/towards-artificial-intelligence/beyond-ra...

Comments

magzter•5h ago
This looks interesting; I enjoyed the explanation of how RAG works vs. this and found it easy to follow. I'd like to try it in some projects or claw assistants to see if there's any meaningful improvement in context handling.