For the last 6 months I've been running a personal Claude agent as a systemd service on a Beelink mini PC. It gives me a morning briefing in Discord, monitors my email, tracks my finances, and actually remembers things across conversations — not just the last session. Here's how the memory system works and why the naive approach falls apart.
The memory problem
Every agent tutorial I found did one of two things: forgot everything on restart, or shoved the full conversation history into every prompt until it hit the context limit and broke. Neither is usable long-term.
What actually works is three tiers:
- Core Memory — permanent facts about you, always in the system prompt, kept small (~500 chars). Who you are, current tasks, standing preferences.
- Recall — every conversation logged to SQLite, FTS5 searchable on demand. The agent can look back, but it doesn't have to carry it all.
- Archival — long-term knowledge store, same FTS5 search. Stuff like "user doesn't like meetings before 10am" or "the Plaid API returns dates in UTC not local time."
The key insight: Core Memory is always in context. Everything else is searched when needed. After 6 months of daily use the system prompt is still ~2KB.
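A rough sketch of how the three tiers could be wired up, for illustration only. The table names, column names, and helper functions here are hypothetical and not taken from the repo. It assumes a SQLite build with FTS5 enabled and leaves the choice of driver open.

```typescript
// Hypothetical schema: names are illustrative, not the repo's.
const SCHEMA_SQL = `
CREATE VIRTUAL TABLE IF NOT EXISTS recall_fts
  USING fts5(content, role UNINDEXED, ts UNINDEXED);
CREATE VIRTUAL TABLE IF NOT EXISTS archival_fts
  USING fts5(content, ts UNINDEXED);
`;

// Best matches first; bind the query string and a row limit.
const RECALL_SEARCH_SQL = `
SELECT content FROM recall_fts WHERE recall_fts MATCH ? ORDER BY rank LIMIT ?;
`;

// Turn free text into an FTS5 query. Each word becomes a quoted string, so
// words like AND/OR/NEAR and stray punctuation are treated as plain text.
// Whitespace-separated strings are implicitly ANDed by FTS5.
// Returns "" for empty input; the caller should skip the search in that case.
function toFtsQuery(text: string): string {
  return text
    .split(/\s+/)
    .filter((word) => word.length > 0)
    .map((word) => `"${word.replace(/"/g, '""')}"`)
    .join(" ");
}

// Core Memory budget in characters (the ~500 char figure from the post).
const CORE_MEMORY_LIMIT = 500;

// Core Memory is the only tier that is always in the prompt, so cap it hard.
// Facts are taken in order until the budget is used up.
function buildSystemPrompt(base: string, coreFacts: string[]): string {
  const kept: string[] = [];
  let used = 0;
  for (const fact of coreFacts) {
    const cost = fact.length + 3; // "- " prefix plus a newline
    if (used + cost > CORE_MEMORY_LIMIT) break;
    kept.push(`- ${fact}`);
    used += cost;
  }
  return `${base}\n\nCore memory:\n${kept.join("\n")}`;
}
```

The point of the split is that only buildSystemPrompt runs on every turn. The two FTS5 tables are touched only when the agent decides to search.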
The MCP subprocess thing
The Claude Agent SDK lets you expose tools as MCP servers. The natural approach is an in-process bidirectional channel, but it has a race condition (issue #148 in the SDK repo) that causes intermittent crashes under load.
The fix: run the memory MCP server as a standalone stdio JSON-RPC subprocess. The parent spawns it at startup and communicates over stdin/stdout. It's a few extra lines but eliminates the crash entirely. The skeleton in the repo does it this way.
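For a feel of what "stdio JSON-RPC" means on the wire, here is a minimal framing sketch. It covers only the transport layer: one JSON-RPC 2.0 message per line, which is how the MCP stdio transport delimits messages. The type and class names are made up for this example, and a real server would normally use an MCP library instead of hand-rolled framing.

```typescript
// JSON-RPC 2.0 message shape, loose enough for requests and responses.
type JsonRpcMessage = {
  jsonrpc: "2.0";
  id?: number | string;
  method?: string;
  params?: unknown;
  result?: unknown;
  error?: { code: number; message: string };
};

// One message per line. JSON.stringify without indentation never emits a raw
// newline (newlines inside strings are escaped), so "\n" is a safe delimiter.
function encodeMessage(msg: JsonRpcMessage): string {
  return JSON.stringify(msg) + "\n";
}

// The child's stdout arrives in arbitrary chunks, so buffer until a full
// line is available and only then parse it.
class LineDecoder {
  private buffer = "";

  push(chunk: string): JsonRpcMessage[] {
    this.buffer += chunk;
    const lines = this.buffer.split("\n");
    this.buffer = lines.pop() ?? ""; // keep the trailing partial line
    return lines
      .filter((line) => line.trim().length > 0)
      .map((line) => JSON.parse(line) as JsonRpcMessage);
  }
}
```

In the parent, the idea would be to spawn the server with piped stdio, write encodeMessage(...) output to the child's stdin, and feed each stdout chunk into decoder.push(). Because the server lives in its own process, a crash or hang there can be handled by restarting the child.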
Loop detection
Tool call loops are a real problem. The agent tries a bash command, it fails, the agent tries the exact same command, it fails again, and this repeats until you've burned $3 and gotten nothing. The fix is trivial once you know to do it: track the last N tool calls in an array (the tool name, and ideally the arguments too, so three different bash commands don't register as a loop). If the last 3 are identical, call q.interrupt() and bail with an explanation.
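A minimal sketch of that check. The LoopDetector class and ToolCall type are hypothetical names for this example. It also compares the serialized arguments along with the tool name, which is an assumption on top of the description above.

```typescript
// Hypothetical shape for a tool call as seen in the agent's message stream.
type ToolCall = { name: string; input: unknown };

class LoopDetector {
  private recent: string[] = [];
  private readonly threshold: number;
  private readonly windowSize: number;

  constructor(threshold = 3, windowSize = 10) {
    this.threshold = threshold;
    // The window must hold at least `threshold` entries to be useful.
    this.windowSize = Math.max(windowSize, threshold);
  }

  // Records the call and returns true when the last `threshold` calls
  // have the same name and the same serialized input.
  record(call: ToolCall): boolean {
    // JSON.stringify is key-order sensitive, which is fine here: a model
    // retrying the same call tends to emit the same JSON.
    this.recent.push(`${call.name}:${JSON.stringify(call.input)}`);
    if (this.recent.length > this.windowSize) this.recent.shift();
    if (this.recent.length < this.threshold) return false;
    const tail = this.recent.slice(-this.threshold);
    return tail.every((signature) => signature === tail[0]);
  }

  // Call after a successful interrupt so the next turn starts clean.
  reset(): void {
    this.recent = [];
  }
}
```

In the agent loop this would sit wherever tool calls are observed: when record() returns true, call q.interrupt() and report why the run was stopped.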
What's in the repo
The GitHub repo is a working skeleton: the core loop, SQLite memory system, and a terminal REPL. Run npm install && npm start and you're talking to an agent that already has persistent memory. It type-checks clean on Node 22.
What it doesn't have: Discord integration, cron jobs, the systemd service setup, semantic memory search, or the pitfalls I hit over 6 months. I wrote all of that up in a guide — link in the README if you want the full thing, but the skeleton runs standalone.
Cost
Claude Max subscription is $20/month flat with no per-token charges. The server is a Beelink mini PC pulling ~15W, which works out to about 11 kWh a month; call it $2–3/month in electricity. Total: roughly $22–23/month for something that never sleeps, has real tool access, and actually knows who you are.
cha0tikdino•1h ago
Happy to answer questions about the architecture.