AI coding agents have two memory problems that CLAUDE.md and Rules files don't solve: (1) long conversations get compressed and context silently disappears — the agent forgets decisions made 2 hours ago in the same session, and (2) memory is locked to one tool on one machine. Switch from Claude Code to Cursor, or from your laptop to your desktop, and everything is gone.
I built hmem to fix both. It's an MCP server that gives AI agents persistent, hierarchical memory stored in a local SQLite file. The same .hmem file works across Claude Code, Cursor, Windsurf, OpenCode, and Gemini CLI — on any machine. Your agent's knowledge is portable.
The key idea is borrowed from how human memory works: you remember rough outlines first and recall details on demand. hmem has 5 depth levels. At session start, the agent loads only Level 1 summaries (~20 tokens). It drills deeper into specific memories only when needed — L2 for context, L3-L5 for raw details. Unlike a flat MEMORY.md that gets injected wholesale (3000-8000 tokens every time), hmem loads only what's relevant.
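To make the depth idea concrete, here's a minimal sketch of depth-limited recall over a memory tree. This is an illustration of the concept only, not hmem's actual code; the node shape and function names are invented for the example.

```typescript
// Minimal sketch of depth-limited recall, assuming a simple in-memory
// tree. hmem's real storage is SQLite and its API may differ.
interface MemoryNode {
  id: string;            // dot-path address, e.g. "L0003.2.1"
  summary: string;       // one-line summary shown at shallow depths
  children: MemoryNode[];
}

// Return summaries down to `depth` levels; anything deeper stays on
// disk until the agent explicitly drills into that subtree.
function loadToDepth(node: MemoryNode, depth: number): string[] {
  const lines = [`${node.id} ${node.summary}`];
  if (depth > 1) {
    for (const child of node.children) {
      lines.push(...loadToDepth(child, depth - 1));
    }
  }
  return lines;
}
```

A session start is effectively `loadToDepth(root, 1)` over each top-level memory; a drill-down is the same call with a higher depth on a single node.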
Install: `npx hmem-mcp init` (interactive setup — detects your installed tools and writes the MCP config).
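For anyone who prefers to wire it up by hand: MCP servers are registered in each tool's JSON config, so the entry `init` writes presumably looks something like this (the server name and args are my assumption, not verified output):

```json
{
  "mcpServers": {
    "hmem": {
      "command": "npx",
      "args": ["hmem-mcp"]
    }
  }
}
```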
This is beta software. I've been using it in production across two machines with 100+ memory entries and it's been stable, but the API surface may still change. Would appreciate feedback.
GitHub: https://github.com/Bumblebiber/hmem
npm: https://www.npmjs.com/package/hmem-mcp
License: MIT
Bumblebiber•1h ago
Some background: I run a multi-agent AI system (orchestrator + specialized agents) across multiple machines. Two things kept biting me:
1. *Context dilution:* In long sessions, earlier context gets compressed or dropped. The agent "forgets" decisions made hours ago — not because the session ended, but because the context window silently pushed them out.
2. *Vendor/machine lock-in:* I switch between Claude Code, Gemini CLI, and OpenCode depending on the task. And I work on two PCs. CLAUDE.md only works in Claude Code, on one machine. There was no way to carry knowledge across tools or devices.
hmem solves both: it's a single .hmem file (SQLite) that any MCP-compatible tool can read/write. Same memory, any tool, any machine.
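A nice consequence of the plain-SQLite format is that you (unlike the agent) can inspect a .hmem file directly, with any SQLite client. A sketch using better-sqlite3; the table and column names below are assumptions for illustration, not hmem's documented schema:

```typescript
// A .hmem file is ordinary SQLite, so any SQLite client can open it.
// Table and column names here are assumptions, not hmem's real schema.
import Database from "better-sqlite3";

const db = new Database("agent.hmem", { readonly: true });

// List top-level entries, assuming dot-path IDs where roots have no dot.
const rows = db
  .prepare("SELECT id, summary FROM memories WHERE id NOT LIKE '%.%'")
  .all() as { id: string; summary: string }[];

for (const row of rows) {
  console.log(`${row.id}  ${row.summary}`);
}
```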
The lazy loading is key — the agent never reads the database directly. It makes tool calls and gets back only what it asked for. A typical session start costs ~20 tokens for the L1 overview. Drilling into one specific topic costs ~80 tokens. Compare that to a MEMORY.md that injects 3000-8000 tokens wholesale every time.
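Here's what a drill-down looks like from the client side, using the official MCP TypeScript SDK. The tool name and arguments are invented for the example (check the README for the real tool surface):

```typescript
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

// Spawn the server the same way an editor's MCP config would.
const transport = new StdioClientTransport({
  command: "npx",
  args: ["hmem-mcp"],
});
const client = new Client(
  { name: "hmem-demo", version: "0.1.0" },
  { capabilities: {} }
);
await client.connect(transport);

// Hypothetical tool call: fetch one memory at depth 2 (~80 tokens)
// instead of injecting an entire memory file into the prompt.
const result = await client.callTool({
  name: "hmem_read", // assumed tool name, not verified
  arguments: { path: "D0007", depth: 2 },
});
console.log(result.content);
```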
Technical details:
- SQLite backend (better-sqlite3), one .hmem file per agent
- 5-level hierarchy with dot-path addressing (e.g., L0003.2.1; see the sketch after this list)
- Entries are auto-timestamped and support time-range queries
- Full-text search across all levels
- Configurable category prefixes (defaults: P=Project, L=Lesson, E=Error, D=Decision, M=Milestone, S=Skill, F=Favorite)
- Favorites are always loaded at depth 2 (pinned context)
- Integrity checks with auto-backup on corruption detection
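The dot-path sketch mentioned above. The semantics are my assumption, not hmem's specification: I'm reading "L0003.2.1" as root entry L0003, its 2nd child, then that child's 1st child.

```typescript
// Parse a dot-path address into a root ID plus child indices.
// Assumed semantics, for illustration only.
function parseDotPath(path: string): { root: string; indices: number[] } {
  const [root, ...rest] = path.split(".");
  return { root, indices: rest.map(Number) };
}

const addr = parseDotPath("L0003.2.1");
console.log(addr);         // { root: "L0003", indices: [2, 1] }
console.log(addr.root[0]); // "L" -> Lesson, per the default prefixes
```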
What's next:
- Cloud sync between machines (encrypted, probably git-based)
- Memory forks — think GitHub repos but for agent memories (fork a curated react-patterns.hmem as a starting point)
- Better onboarding docs and a demo video
This is a genuine beta — I use it daily but it hasn't been battle-tested by others yet. If you try it, I'd love to hear what breaks or what's confusing about the setup.