So I built my own in ~2300 lines of Python. No frameworks, 8 runtime dependencies, 106 tests.
What it does:

- Persistent memory via SQLite (FTS5 keyword search + sqlite-vec embeddings + recency decay, fused with Reciprocal Rank Fusion)
- MCP tool integration: add capabilities by editing a JSON file
- Native tools with safety guardrails (bash blocklist, timeouts, output caps)
- Discord interface with session isolation
- Structured JSON logging for every operation
- Conversation compression for effectively infinite context
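For anyone unfamiliar with Reciprocal Rank Fusion: it merges several ranked result lists (here, keyword search and vector search) using only each item's rank, not its raw score, which sidesteps the problem of incomparable scoring scales. A minimal sketch of the idea (not the project's actual code; `k=60` is the commonly used constant):

```python
from collections import defaultdict

def rrf_fuse(ranked_lists, k=60):
    """Fuse best-first ranked lists with Reciprocal Rank Fusion.

    Each item's fused score is the sum of 1 / (k + rank) over every
    list it appears in; items ranked highly in multiple lists win.
    """
    scores = defaultdict(float)
    for results in ranked_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword_hits = ["a", "b", "c"]   # from FTS5
vector_hits = ["b", "d", "a"]    # from sqlite-vec
print(rrf_fuse([keyword_hits, vector_hits]))  # → ['b', 'a', 'd', 'c']
```

"b" wins because it ranks well in both lists, even though neither list put it unambiguously first.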
Runs locally on 2x RTX 3090 with Qwen3-Coder-Next via llama-server. No cloud APIs.
The design philosophy: don't build what you don't need, but don't block the insertion points either. For example, my AI firewall isn't built yet, but all LLM traffic already goes through a single configurable URL, so swapping in a filtering proxy later is just a config change.
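The "single configurable URL" pattern is simple but worth spelling out. A hedged sketch of how such an insertion point might look (the env-var name `LLM_BASE_URL` and the OpenAI-compatible path are my illustrative assumptions, not the project's actual config):

```python
import json
import os
import urllib.request

def llm_url(path, base=None):
    """Compose every LLM endpoint from one configurable base URL.

    Pointing LLM_BASE_URL at a filtering proxy later changes nothing
    else in the codebase -- that is the whole point of the pattern.
    """
    base = base or os.environ.get("LLM_BASE_URL", "http://localhost:8080")
    return base.rstrip("/") + path

def chat(messages, opener=urllib.request.urlopen):
    # Assumed llama-server-style OpenAI-compatible endpoint.
    req = urllib.request.Request(
        llm_url("/v1/chat/completions"),
        data=json.dumps({"messages": messages}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with opener(req) as resp:
        return json.loads(resp.read())
```

Because the base URL is resolved in exactly one place, the future firewall drops in without touching any call sites.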
DESIGN.md documents the reasoning behind every architectural decision. Tests mock the LLM client so you can run them on a laptop.
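Mocking the LLM client is what keeps the test suite laptop-friendly: no model weights, no GPU, no network. A minimal illustration of the pattern with `unittest.mock` (the `Agent`/`complete` names are hypothetical, not the project's real interfaces):

```python
from unittest import mock

class Agent:
    """Toy agent that only talks to an injected LLM client."""

    def __init__(self, llm):
        self.llm = llm

    def ask(self, prompt):
        return self.llm.complete(prompt)

def test_ask_uses_mocked_llm():
    # The mock stands in for the real llama-server-backed client.
    fake = mock.Mock()
    fake.complete.return_value = "pong"
    agent = Agent(fake)
    assert agent.ask("ping") == "pong"
    fake.complete.assert_called_once_with("ping")

test_ask_uses_mocked_llm()
```

Dependency injection at the client boundary is what makes this trivial; if the agent constructed its own HTTP client internally, every test would need patching instead.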
GitHub: https://github.com/nonatofabio/luna-agent
Blog post with the full technical deep-dive: https://nonatofabio.github.io/blog/post.html?slug=luna_agent
Happy to answer questions about any of the design tradeoffs.