I built this to solve a specific frustration: Claude Code has no memory between sessions. Every conversation starts from scratch, which means constantly re-explaining project context, architecture decisions, and previous work.
The approach
Instead of traditional RAG (retrieve document chunks by keyword), the Memory System lets Claude itself decide what’s worth remembering. At session end, it analyzes the transcript and extracts semantic memories with metadata — importance weights, trigger phrases, context types.
On the next session, relevant memories are retrieved via vector similarity + metadata scoring and injected into the context. The AI curates for the AI.
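Concretely, the retrieval scoring might combine the signals like this. This is a rough sketch, not the actual implementation: the field names, weights, and blending function are illustrative assumptions.

```python
# Hypothetical sketch of the retrieval step: blend vector similarity
# with the curated metadata. All names and weights are illustrative.
from dataclasses import dataclass

@dataclass
class Memory:
    text: str
    importance: float           # 0.0-1.0, assigned by Claude at curation time
    trigger_phrases: list[str]  # phrases that should boost retrieval
    context_type: str           # e.g. "architecture", "decision", "todo"

def score(memory: Memory, similarity: float, prompt: str) -> float:
    """Blend embedding similarity with curation-time metadata."""
    trigger_bonus = 0.2 if any(p.lower() in prompt.lower()
                               for p in memory.trigger_phrases) else 0.0
    return 0.6 * similarity + 0.3 * memory.importance + trigger_bonus

def retrieve(candidates: list[tuple[Memory, float]], prompt: str, k: int = 5):
    """candidates: (memory, cosine similarity) pairs from the vector store."""
    ranked = sorted(candidates, key=lambda c: score(c[0], c[1], prompt),
                    reverse=True)
    return [m for m, _ in ranked[:k]]
```

The point of the blend is that raw similarity alone can't distinguish a throwaway remark from a load-bearing architecture decision; the curation-time importance weight carries that signal forward.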
Technical details
FastAPI server running locally (port 8765)
ChromaDB for vector storage, SQLite for metadata
Sentence transformers for embeddings (MiniLM-L6)
Claude Code integration via hooks (SessionStart, UserPromptSubmit, SessionEnd)
Two curation methods: session-based (--resume flag) or transcript-based (Claude Agent SDK); sketches of the storage layer and client integration follow below
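To make the stack above concrete, here's a minimal sketch of the storage layer: MiniLM embeddings going into a local ChromaDB collection. The collection name and metadata fields are assumptions, not the project's actual schema.

```python
# Minimal sketch of the storage layer, assuming a local persistent
# ChromaDB collection and MiniLM embeddings. Names are illustrative.
import chromadb
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
client = chromadb.PersistentClient(path="./memory_db")
collection = client.get_or_create_collection("memories")

def store_memory(memory_id: str, text: str, importance: float,
                 context_type: str) -> None:
    """Embed a curated memory and persist it with its metadata."""
    embedding = model.encode(text).tolist()
    collection.add(
        ids=[memory_id],
        embeddings=[embedding],
        documents=[text],
        metadatas=[{"importance": importance, "context_type": context_type}],
    )

def query_memories(prompt: str, k: int = 5):
    """Fetch the k nearest memories to the incoming prompt."""
    embedding = model.encode(prompt).tolist()
    return collection.query(query_embeddings=[embedding], n_results=k)
```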
What’s interesting architecturally
The system is CLI-agnostic. Any LLM client that can (1) intercept messages before send and (2) provide session transcripts can integrate. Claude Code is the first implementation, but the memory engine doesn’t care which client is calling it.
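In Claude Code terms, those two integration points map onto the hooks listed above: UserPromptSubmit drives retrieval, SessionEnd drives curation. A rough client-side sketch follows; the endpoint paths and payload shapes are assumptions about the server API, shown only to illustrate the contract any client would implement.

```python
# Hypothetical client-side integration against the local memory server.
# Endpoint paths and payload shapes are assumptions, not the real API.
import requests

MEMORY_SERVER = "http://localhost:8765"

def before_send(prompt: str) -> str:
    """Integration point 1: intercept the outgoing message and prepend
    any retrieved memories as context."""
    resp = requests.post(f"{MEMORY_SERVER}/retrieve", json={"prompt": prompt})
    memories = resp.json().get("memories", [])
    if not memories:
        return prompt
    context = "\n".join(f"- {m}" for m in memories)
    return f"Relevant memories:\n{context}\n\n{prompt}"

def on_session_end(transcript: str) -> None:
    """Integration point 2: hand the full transcript to the server
    for curation into new memories."""
    requests.post(f"{MEMORY_SERVER}/curate", json={"transcript": transcript})
```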
Setup is four commands. Everything runs locally — no data leaves your machine.
Curious to hear thoughts on the approach, especially from anyone who’s tried solving the context persistence problem differently.