Curious how it holds up as note volume grows. At some point you're still doing N relevance checks per tool call. Do you hit a scaling limit there, or does caching keep it manageable? Also wondering if you've seen any drift in relevance when notes become more numerous or overlapping.
NBenkovich•1h ago
CLAUDE.md is manual — you maintain it. MEMORY.md is automatic but it only gets read once at session startup, it's scoped per repo so your cross-project preferences don't carry over, and once it gets long enough only the top N lines are read. Older notes just silently disappear from context regardless of how important they are. I had a note about a critical config value fall off the bottom and cost me an hour.
So the core idea: instead of a flat file, use a small LLM as memory. Store notes in the LLM's context window (system prompt), prompt it with what the agent is currently doing, and get back only the hints that are actually relevant right now. Not at startup — right before each tool call.
You write notes like this:
```
cmr-memory write "auth token expiry is 900s not 3600s" --when "working on auth, tokens, config.ts"
```
The `--when` field is a plain-language activation condition. The memory LLM reads the conversation transcript and decides if the condition matches the current context. It's not keyword matching — a note tagged "working on authentication" will surface when the agent is editing JWT middleware even if neither word appears in the recent transcript.
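A minimal sketch of how that relevance check could be framed, assuming the memory LLM is simply asked which conditions match (the prompt wording and names here are illustrative, not the actual implementation):

```typescript
// A stored note: the hint itself plus its plain-language activation condition.
interface Note {
  text: string; // e.g. "auth token expiry is 900s not 3600s"
  when: string; // e.g. "working on auth, tokens, config.ts"
}

// Build the prompt the memory LLM answers. It sees every note's condition
// and the tail of the agent's transcript, and replies with matching note
// numbers (or NONE). Semantic matching is delegated entirely to the model.
function buildRelevancePrompt(notes: Note[], transcriptTail: string): string {
  const listing = notes
    .map((n, i) => `${i + 1}. condition: "${n.when}" -> hint: "${n.text}"`)
    .join("\n");
  return [
    "You retrieve memory for a coding agent. Below are stored notes, each",
    "with an activation condition. Given the agent's recent activity, reply",
    "with the numbers of the notes whose conditions match, or NONE.",
    "",
    "Notes:",
    listing,
    "",
    "Recent activity:",
    transcriptTail,
  ].join("\n");
}
```

Because the model, not a keyword filter, judges the match, a condition like "working on auth" can fire while the agent edits JWT middleware even though the transcript never says "auth".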
The tricky part is making this fast enough to run on every tool call without being expensive. One LLM call over all your notes doesn't scale — attention degrades in long contexts, costs add up fast, and you'd be waiting 5 seconds before every file read. The solution is splitting notes into small chunks and processing them in parallel with Haiku (fast, cheap, good enough for this job — you don't need Opus to decide if a note is contextually relevant). Each chunk lives in a stable system prompt that gets cached by Anthropic's prompt caching, so after the first call against a chunk it's mostly served from cache.
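The chunking step above can be sketched roughly like this (a simplified illustration; `checkChunk` and the chunk size are hypothetical, not the real code):

```typescript
// Group notes into small, stable chunks. Keeping each chunk's membership and
// order stable means its system prompt stays byte-identical across tool
// calls, which is what lets Anthropic's prompt caching serve it from cache.
function chunkNotes<T>(notes: T[], chunkSize: number): T[][] {
  const chunks: T[][] = [];
  for (let i = 0; i < notes.length; i += chunkSize) {
    chunks.push(notes.slice(i, i + chunkSize));
  }
  return chunks;
}

// The chunks would then be checked concurrently, one cheap Haiku call each,
// e.g. (checkChunk is a stand-in for the cached-system-prompt API call):
//
//   const hits = await Promise.all(
//     chunkNotes(allNotes, 20).map((chunk) => checkChunk(chunk, transcriptTail))
//   );
```

Parallel small calls keep latency close to a single call's round trip, and the per-chunk cache means only the short transcript tail is fresh input on each check.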
There's also a deduplication step: once a hint has been injected into the conversation, re-injecting it on the next tool call just bloats context for no reason. The system extracts what's already present in the transcript and skips anything redundant.
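In its simplest form the dedup step is just a containment check against the transcript (a sketch; the real extraction step is presumably fuzzier than exact substring matching):

```typescript
// Drop any hint whose text already appears in the conversation transcript,
// so retrieval never injects the same note twice.
function dedupeHints(hints: string[], transcript: string): string[] {
  return hints.filter((hint) => !transcript.includes(hint));
}
```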
The whole thing wires in via Claude Code's PreToolUse/PostToolUse hooks. PreToolUse fires before every tool call, reads the transcript, runs retrieval, injects hints. PostToolUse sends a short nudge asking if anything worth saving just happened — the agent decides whether to write a note, nothing is forced.
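The wiring in `.claude/settings.json` might look something like this (the `cmr-memory hook …` subcommands are my guess at the CLI surface, not confirmed from the package):

```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "",
        "hooks": [{ "type": "command", "command": "cmr-memory hook pre-tool-use" }]
      }
    ],
    "PostToolUse": [
      {
        "matcher": "",
        "hooks": [{ "type": "command", "command": "cmr-memory hook post-tool-use" }]
      }
    ]
  }
}
```

An empty `matcher` applies the hook to every tool, which is what you want if hints should be able to surface before any file read or edit.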
To set it up:
```
npm install -g @agynio/cmr-memory cmr-m
```