AI agents have a memory problem. Not the kind where they forget things. The kind where they remember everything, all at once, every time.
Every message. Full history. Appended to every prompt. The context window fills. Costs compound. And you're paying to re-send the same conversation on every single turn.
I wanted to fix that.
---
The insight came from thinking about how people actually remember things.
You don't replay your entire life to answer a question. You retrieve what's relevant - usually just a few facts. The detail only comes back when you need it.
So I built HAM. Hierarchical Adaptive Memory.
Memory lives in four tiers:
L0 → 8 tokens - topic slug, always in context
L1 → 35 tokens - key facts, retrieved on topic match
L2 → 150 tokens - full summary, retrieved when needed
L3 → 500+ tokens - raw detail, only on deep queries
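The tier layout can be sketched as a single record with one field per level. This is illustrative only; the field names (`topicSlug`, `keyFacts`, etc.) are mine, not the repo's:

```typescript
// Hypothetical shape of one HAM memory record (names are illustrative).
type Tier = "L0" | "L1" | "L2" | "L3";

interface MemoryRecord {
  topicSlug: string; // L0: ~8 tokens, always in context
  keyFacts: string;  // L1: ~35 tokens, retrieved on topic match
  summary: string;   // L2: ~150 tokens, retrieved when needed
  rawDetail: string; // L3: 500+ tokens, only on deep queries
}

// Approximate token budget per tier.
const TIER_BUDGET: Record<Tier, number> = { L0: 8, L1: 35, L2: 150, L3: 500 };
```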
The retriever scores each incoming message against stored topics. It pulls only the tier it needs. Nothing more. A follow-up question on a topic you discussed yesterday costs 35 tokens of memory context. The naive approach costs 1,890.
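A minimal version of that tier selection might look like this. The scoring function and thresholds are assumptions for illustration, not the repo's actual retriever:

```typescript
// Hypothetical tier selector: pick the shallowest tier that satisfies the query.
function scoreTopicMatch(message: string, topicSlug: string): number {
  // Naive overlap score: fraction of the topic slug's words present in the message.
  const words = topicSlug.toLowerCase().split("-");
  const text = message.toLowerCase();
  const hits = words.filter((w) => text.includes(w)).length;
  return hits / words.length;
}

function selectTier(
  message: string,
  topicSlug: string,
  deepQuery: boolean
): "L0" | "L1" | "L2" | "L3" {
  const score = scoreTopicMatch(message, topicSlug);
  if (score < 0.5) return "L0"; // weak match: only the slug stays in context
  if (deepQuery) return "L3";   // explicit deep query: pull raw detail
  if (score < 0.8) return "L1"; // topic match: key facts are enough
  return "L2";                  // strong match: full summary
}
```

A real retriever would score with embeddings rather than word overlap, but the shape is the same: cheap tiers by default, expensive tiers only when the query demands them.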
---
I benchmarked this across 5 topics, 8 questions each.
Naive: 6,825 tokens average
HAM: 1,205 tokens average
Reduction: 82.3%
Anyone can reproduce it: npm run benchmark - in-memory SQLite, same questions, same seed every run.

---
L4 wasn't in the original design.
When the agent hits a question it has no memory for, it answers using the LLM. Then it asks itself: is this response worth keeping?
Criteria: longer than 180 characters, not conversational filler. If yes - it compresses the response into the tier structure and saves it.
The agent learns from what it didn't know.
Ask it something obscure. It answers. It saves. Ask again tomorrow - it already knows.
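The keep-or-drop gate from the criteria above can be sketched in a few lines. The filler patterns here are placeholders I made up; only the 180-character threshold comes from the design:

```typescript
// Hypothetical write-back gate: keep a response if it's longer than
// 180 characters and doesn't read like conversational filler.
const FILLER = [/^(ok|okay|sure|thanks|got it)\b/i, /^you're welcome/i];

function worthKeeping(response: string): boolean {
  const text = response.trim();
  if (text.length <= 180) return false;
  return !FILLER.some((pattern) => pattern.test(text));
}
```

Responses that pass the gate would then go through the same compression pipeline as regular memories, landing in the L0-L3 tier structure.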
---
What I haven't fully solved:
Compression quality at L1 and L2 depends on the summarization model. Gemini Flash is fast and cheap. But on highly technical content, it sometimes loses edge-case detail in the summary. You trade token cost for precision.
Lossy compression in memory systems feels like an unsolved problem. I'd be curious if anyone has dealt with this at scale.
GitHub: https://github.com/ajstars1/agent-os (MIT). SQLite + WAL. TypeScript strict. Runs local.