I've been building LocalClaw, a local-model-first AI agent framework that runs on personal hardware through Ollama. No cloud, no API costs. A few weeks ago I posted about the router/specialist architecture. A lot of people asked about the memory system, so here's how it works.
## The Problem
Started with a JSONL fact store and embedding similarity retrieval. Simple enough until it wasn't. After a few weeks of real use I had 14 near-duplicate facts about the same topics from different sessions. Layered dedup on top of dedup and it still wasn't clean.
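For context, here's a minimal sketch of the kind of threshold-based dedup a flat store has to lean on. The cutoff, record layout, function names and toy 3-d vectors are all hypothetical, not LocalClaw's code. The point is that one fixed number has to separate "same fact, new wording" from "different fact", and paraphrases from different sessions land on both sides of it.

```python
import json
import math

# Hypothetical cutoff: the post doesn't give one. Paraphrases that land just
# below it are stored as "new" facts, which is how near-duplicates pile up.
DUP_THRESHOLD = 0.92


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def append_fact(store: list[dict], text: str, embedding: list[float]) -> bool:
    """Store a fact unless an existing record is within DUP_THRESHOLD of it.

    Returns True if the fact was appended. Records are independent rows:
    nothing links one fact to another, only their vectors can be compared.
    """
    for record in store:
        if cosine(record["embedding"], embedding) >= DUP_THRESHOLD:
            return False
    store.append({"text": text, "embedding": embedding})
    return True


def to_jsonl(store: list[dict]) -> str:
    """Serialize the store as JSONL: one JSON object per line."""
    return "\n".join(json.dumps(record) for record in store)


# Toy 3-d vectors standing in for real embeddings.
store: list[dict] = []
append_fact(store, "Peter works at DevMesh", [1.0, 0.0, 0.0])         # stored
append_fact(store, "Peter is employed by DevMesh", [0.99, 0.1, 0.0])  # rejected, cosine ≈ 0.995
append_fact(store, "Peter's employer is DevMesh", [0.8, 0.6, 0.0])    # stored, cosine ≈ 0.8
```

The third fact says the same thing as the first but slips past the cutoff, which is exactly the near-duplicate pile-up described above.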
The bigger problem was relationships. "Peter works at DevMesh" and "DevMesh is building an outreach platform" were stored as two separate embeddings. You could retrieve either one, but you couldn't traverse from one to the other. No multi-hop. No fact evolution. Old and new facts coexisted with no signal about which one was current.
Four iterations on the flat store later, I accepted that I was patching the wrong thing.
## Why FalkorDB
Looked at Neo4j (Community Edition is intentionally crippled), Memgraph (no native vector search), and FalkorDB.
FalkorDB runs in Docker, uses the Redis wire protocol, has native HNSW vector search, and the entire thing sits at 85MB at my current scale. Graph traversal, vector similarity, and hybrid keyword search in one container. No separate Qdrant, no sync issues between two stores.
## What the Graph Enables
Every fact connects to the entities it references via ABOUT edges. Multi-hop traversal becomes natural: find everything connected to a project, or find all entities mentioned alongside a technology.
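To make the traversal concrete without a database, here's a minimal in-memory sketch that mirrors ABOUT edges with two dictionaries. The class and method names are hypothetical, not LocalClaw's code. In Cypher the two-hop co-mention lookup would look roughly like `MATCH (e:Entity {name: $name})<-[:ABOUT]-(:Fact)-[:ABOUT]->(other:Entity) RETURN DISTINCT other.name`, assuming `Fact`/`Entity` labels and a `name` property.

```python
from collections import defaultdict


class FactGraph:
    """Toy in-memory stand-in for the graph: (:Fact)-[:ABOUT]->(:Entity)."""

    def __init__(self) -> None:
        self.about: dict[str, set[str]] = defaultdict(set)         # fact -> entities
        self.mentioned_in: dict[str, set[str]] = defaultdict(set)  # entity -> facts

    def add_fact(self, fact: str, entities: list[str]) -> None:
        """Record ABOUT edges from a fact to each entity it references."""
        for entity in entities:
            self.about[fact].add(entity)
            self.mentioned_in[entity].add(fact)

    def co_mentioned(self, entity: str) -> set[str]:
        """Two hops: entity <-ABOUT- fact -ABOUT-> other entity."""
        others: set[str] = set()
        for fact in self.mentioned_in.get(entity, set()):
            others |= self.about[fact]
        others.discard(entity)
        return others

    def related_facts(self, entity: str, max_hops: int = 2) -> set[str]:
        """Facts reachable from `entity`, expanding breadth-first for `max_hops`
        rounds (entity -> facts -> their entities -> more facts ...)."""
        seen_entities = {entity}
        frontier = {entity}
        facts: set[str] = set()
        for _ in range(max_hops):
            new_facts: set[str] = set()
            for ent in frontier:
                new_facts |= self.mentioned_in.get(ent, set())
            new_facts -= facts
            facts |= new_facts
            next_frontier: set[str] = set()
            for fact in new_facts:
                next_frontier |= self.about[fact]
            frontier = next_frontier - seen_entities
            seen_entities |= frontier
        return facts


graph = FactGraph()
graph.add_fact("Peter works at DevMesh", ["Peter", "DevMesh"])
graph.add_fact("DevMesh is building an outreach platform", ["DevMesh", "outreach platform"])
print(sorted(graph.related_facts("Peter")))
# ['DevMesh is building an outreach platform', 'Peter works at DevMesh']
```

Starting from "Peter", the second fact is only reachable through the shared DevMesh entity, which is the hop the flat store couldn't make.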
When a fact changes, the new fact gets a SUPERSEDES edge pointing to the old one. Both persist with timestamps, so temporal queries work: "What did the system know about this last month?" is now a real query.
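A minimal in-memory sketch of how SUPERSEDES edges plus timestamps answer an "as of" question. The `Fact`/`FactLog` names, fields and example dates are hypothetical; in the real system this would be a graph query rather than a Python loop.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Fact:
    fact_id: str
    text: str
    created_at: datetime
    supersedes: Optional[str] = None  # id of the older fact (the SUPERSEDES edge)


class FactLog:
    """Toy in-memory model of facts linked by SUPERSEDES edges."""

    def __init__(self) -> None:
        self.facts: dict[str, Fact] = {}

    def add(self, fact: Fact) -> None:
        self.facts[fact.fact_id] = fact

    def superseded_at(self, fact_id: str) -> Optional[datetime]:
        """When the first newer fact replaced this one, or None if still current."""
        times = [f.created_at for f in self.facts.values() if f.supersedes == fact_id]
        return min(times) if times else None

    def as_of(self, when: datetime) -> list[Fact]:
        """Facts that existed at `when` and had not yet been superseded."""
        current = []
        for fact in self.facts.values():
            if fact.created_at > when:
                continue  # didn't exist yet
            replaced = self.superseded_at(fact.fact_id)
            if replaced is None or replaced > when:
                current.append(fact)
        return sorted(current, key=lambda f: f.created_at)

    def history(self, fact_id: str) -> list[Fact]:
        """Follow the SUPERSEDES chain from a fact back to its oldest version."""
        chain = []
        fact = self.facts.get(fact_id)
        while fact is not None:
            chain.append(fact)
            fact = self.facts.get(fact.supersedes) if fact.supersedes else None
        return chain


log = FactLog()
log.add(Fact("f1", "Memory uses a JSONL fact store", datetime(2025, 1, 10)))
log.add(Fact("f2", "Memory uses FalkorDB", datetime(2025, 3, 2), supersedes="f1"))
print([f.text for f in log.as_of(datetime(2025, 2, 1))])  # ['Memory uses a JSONL fact store']
print([f.text for f in log.as_of(datetime(2025, 4, 1))])  # ['Memory uses FalkorDB']
print([f.fact_id for f in log.history("f2")])             # ['f2', 'f1']
```

Nothing is deleted: the old fact stays queryable, and "current" is just "not yet superseded at time t".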
The vector index runs inside FalkorDB on 4096-dimensional embeddings from qwen3-embedding:8b. HNSW gives approximate nearest-neighbor search at roughly O(log n) per query. No external vector database.
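For reference, a sketch of the two Cypher statements involved, wrapped as Python helpers. The query shapes follow FalkorDB's vector index syntax as I understand it (`CREATE VECTOR INDEX ... OPTIONS {dimension, similarityFunction}` and `db.idx.vector.queryNodes`); the `Fact` label, `embedding`/`text` properties, cosine metric and `$vec` parameter are assumptions, so check them against the docs for your FalkorDB version.

```python
EMBED_DIM = 4096  # qwen3-embedding:8b output size, per the post


def create_vector_index_query(label: str = "Fact", prop: str = "embedding",
                              dim: int = EMBED_DIM) -> str:
    """Cypher that builds an HNSW vector index over one node property.
    Label, property and metric are assumptions, not LocalClaw's schema."""
    return (f"CREATE VECTOR INDEX FOR (n:{label}) ON (n.{prop}) "
            f"OPTIONS {{dimension:{dim}, similarityFunction:'cosine'}}")


def knn_query(label: str = "Fact", prop: str = "embedding", k: int = 10) -> str:
    """Cypher for the k nearest nodes to a query vector passed as parameter $vec."""
    return (f"CALL db.idx.vector.queryNodes('{label}', '{prop}', {k}, vecf32($vec)) "
            "YIELD node, score "
            "RETURN node.text AS text, score")


print(create_vector_index_query())
print(knn_query(k=5))
```

With the `falkordb` Python client these strings would be passed to a graph handle's `query` method along with a `{"vec": embedding}` parameter map. Whether `score` comes back as a distance or a similarity is worth confirming before blending it into a ranking.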
## The Part That Surprised Me
Entity extraction by a small local model is unreliable when the model works blind. phi4-mini classified DGX Spark as software and created separate nodes for the singular and plural forms of the same entity.
Fix: before extracting entities from a new fact, query existing typed entities from the graph and inject them into the NER prompt as reference context. Now phi4-mini sees "DGX Spark → hardware, FalkorDB → software" before it classifies anything new. Each correctly typed entity makes future extractions more consistent. The graph teaches the model over time without any additional training.
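A minimal sketch of the context-injection step, assuming known entities arrive as (name, type) pairs from the graph. The function name, prompt wording, `limit` cap and example fact are hypothetical; only the "name → type" reference format comes from the post.

```python
def build_ner_prompt(fact: str, known: list[tuple[str, str]], limit: int = 50) -> str:
    """Entity-extraction prompt seeded with entities the graph has already typed.

    `known` holds (name, type) pairs, e.g. from a query like
    MATCH (e:Entity) RETURN e.name, e.type  (label and properties are assumptions).
    `limit` is a crude cap to keep a small model's context short; ranking by
    relevance to the fact would be a better way to choose which ones to show.
    """
    reference = "\n".join(f"{name} → {etype}" for name, etype in sorted(known)[:limit])
    return (
        "Extract every entity mentioned in the fact and assign each a type.\n"
        "If an entity matches a known one (including singular/plural variants), "
        "reuse its exact name and type.\n\n"
        f"Known entities:\n{reference or '(none yet)'}\n\n"
        f"Fact: {fact}\n"
        'Answer with a JSON list of {"name": ..., "type": ...} objects.'
    )


prompt = build_ner_prompt(
    "FalkorDB now runs next to the DGX Spark",  # hypothetical example fact
    [("FalkorDB", "software"), ("DGX Spark", "hardware")],
)
print(prompt)
```

The graph query and the prompt assembly are plain code; the model only sees the finished reference list, which is what lets consistency compound as more entities get typed.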
## Scoring
Pure vector similarity surfaces whatever is semantically closest regardless of whether it matters. The scoring formula:
```
score = similarity × 0.5 + recency × 0.2 + importance × 0.3
```
Importance uses a 1-5 tier scale (critical health/family = 5, job/identity = 4, preference = 3, context = 2, ephemeral = 1). A moderately relevant but critical fact scores higher than a highly relevant but ephemeral one: your wife's health condition surfaces above yesterday's weather.
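The weights come straight from the formula; the rest is assumption. The post doesn't say how recency or importance are normalized, so this sketch uses a hypothetical 30-day exponential half-life for recency and maps tier t to t/5, with similarity assumed to be in [0, 1].

```python
WEIGHTS = {"similarity": 0.5, "recency": 0.2, "importance": 0.3}  # from the post
TIERS = {"critical": 5, "identity": 4, "preference": 3, "context": 2, "ephemeral": 1}
HALF_LIFE_DAYS = 30.0  # hypothetical: the post doesn't say how recency is derived


def recency(age_days: float, half_life: float = HALF_LIFE_DAYS) -> float:
    """Exponential decay: 1.0 for a brand-new fact, 0.5 after one half-life."""
    return 0.5 ** (age_days / half_life)


def score(similarity: float, age_days: float, tier: int) -> float:
    """Weighted blend; assumes similarity is already scaled to [0, 1]."""
    importance = tier / 5  # hypothetical mapping of tiers 1-5 onto 0.2-1.0
    return (WEIGHTS["similarity"] * similarity
            + WEIGHTS["recency"] * recency(age_days)
            + WEIGHTS["importance"] * importance)


# Same age, so only similarity and importance differ.
health = score(similarity=0.6, age_days=10, tier=TIERS["critical"])
weather = score(similarity=0.9, age_days=10, tier=TIERS["ephemeral"])
print(health > weather)  # True
```

At equal age the critical fact wins by 0.09. Ages matter too once recency decays: under this hypothetical half-life, a two-month-old critical fact (0.65) would lose to a one-day-old ephemeral one (about 0.71) at these similarities, so the half-life and weights are the knobs to tune.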
## What I Learned
The model computes nothing. Code decides which facts changed, which ones are duplicates, and what the scores are. The model handles what the facts mean. The moment you let a model do arithmetic or hash-based dedup, you get failures you can't explain.
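In that spirit, exact-duplicate detection can be a few lines of deterministic code. A sketch, assuming normalization by Unicode NFKC, case-folding, punctuation stripping and whitespace collapsing (my choices, not necessarily LocalClaw's). It only catches surface-level repeats, so paraphrases still need a semantic check.

```python
import hashlib
import re
import unicodedata


def fact_key(text: str) -> str:
    """Deterministic dedup key: normalize the text, then hash it. No model involved."""
    norm = unicodedata.normalize("NFKC", text).casefold()
    norm = re.sub(r"[^\w\s]", "", norm)       # drop punctuation
    norm = re.sub(r"\s+", " ", norm).strip()  # collapse runs of whitespace
    return hashlib.sha256(norm.encode("utf-8")).hexdigest()


def dedupe(facts: list[str]) -> list[str]:
    """Keep the first fact for each key, preserving input order."""
    seen: set[str] = set()
    kept: list[str] = []
    for fact in facts:
        key = fact_key(fact)
        if key not in seen:
            seen.add(key)
            kept.append(fact)
    return kept


print(dedupe([
    "Peter works at DevMesh.",
    "peter works at   DevMesh",
    "DevMesh is building an outreach platform",
]))
# ['Peter works at DevMesh.', 'DevMesh is building an outreach platform']
```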
Importance tiers need concrete examples in the extraction prompt. phi4:14b defaulted everything to tier 2 until I added few-shot examples with emotional weight. Abstract instructions don't calibrate a model.
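A sketch of what few-shot calibration plus code-side parsing might look like. The example facts below are hypothetical (the post only says the real ones carry emotional weight), and the prompt wording and `parse_tier` helper are illustrative. Parsing the tier in code, and returning `None` instead of guessing, keeps a silent "default to 2" from creeping back in.

```python
import re
from typing import Optional

# Hypothetical few-shot examples, one per tier.
IMPORTANCE_EXAMPLES = [
    ("My wife was just diagnosed with a heart condition", 5, "critical health/family"),
    ("I started a new job as a data engineer", 4, "job/identity"),
    ("I prefer dark mode in every editor", 3, "preference"),
    ("The memory graph runs in Docker", 2, "context"),
    ("It's raining this afternoon", 1, "ephemeral"),
]


def build_importance_prompt(fact: str) -> str:
    """Few-shot prompt: each example pairs a fact with its tier and tier label."""
    shots = "\n\n".join(
        f'Fact: "{text}"\nImportance: {tier} ({label})'
        for text, tier, label in IMPORTANCE_EXAMPLES
    )
    return (
        "Rate how important it is to remember this fact about the user, "
        "from 1 (ephemeral) to 5 (critical).\n\n"
        f'{shots}\n\nFact: "{fact}"\nImportance:'
    )


def parse_tier(reply: str) -> Optional[int]:
    """Read the tier in code: first standalone digit 1-5, else None."""
    match = re.search(r"\b[1-5]\b", reply)
    return int(match.group()) if match else None
```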
The graph beats flat storage the moment you need relationship reasoning. The SUPERSEDES chain alone justified the migration.
Runs entirely on a Mac Mini. 85MB for the graph. Everything local.
GitHub: https://github.com/PeterGreenAppliedAI/LocalClaw