I Replaced My AI Agent's Flat Fact Store with a Graph Database

5•grawl_dorgiers•1h ago

# I Replaced My AI Agent's Flat Fact Store with a Graph Database and It Runs in 85MB

I've been building LocalClaw, a local-model-first AI agent framework running on personal hardware through Ollama. No cloud, no API costs. A few weeks ago I posted about the router/specialist architecture. A lot of people asked about the memory system so here's that.

## The Problem

Started with a JSONL fact store and embedding similarity retrieval. Simple enough until it wasn't. After a few weeks of real use I had 14 near-duplicate facts about the same topics from different sessions. Layered dedup on top of dedup and it still wasn't clean.

The bigger problem was relationships. "Peter works at DevMesh" and "DevMesh is building an outreach platform" were two separate embeddings. You could retrieve each one but you couldn't traverse from one to the other. No multi-hop. No fact evolution. Old facts and new facts coexisted with no signal about which was current.

Four iterations on the flat store later I accepted I was patching the wrong thing.

## Why FalkorDB

Looked at Neo4j (Community Edition is intentionally crippled), Memgraph (no native vector search), and FalkorDB.

FalkorDB runs in Docker, uses the Redis wire protocol, has native HNSW vector search, and the entire thing sits at 85MB at my current scale. Graph traversal, vector similarity, and hybrid keyword search in one container. No separate Qdrant, no sync issues between two stores.

## What the Graph Enables

Every fact connects to the entities it references via ABOUT edges. Multi-hop traversal becomes natural - find everything connected to a project, find all entities mentioned alongside a technology.
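A minimal in-memory sketch of what ABOUT edges buy you, assuming plain Python dicts stand in for the graph. This is not LocalClaw's code or FalkorDB's API; the `about` mapping and `facts_within_hops` are hypothetical names, and the two facts are the ones from the example earlier in the post.

```python
from collections import defaultdict

# Hypothetical stand-in for the graph: fact -> entities it references (ABOUT edges).
about = {
    "Peter works at DevMesh": {"Peter", "DevMesh"},
    "DevMesh is building an outreach platform": {"DevMesh", "outreach platform"},
}

# Reverse index: entity -> facts that reference it.
facts_by_entity = defaultdict(set)
for fact, entities in about.items():
    for entity in entities:
        facts_by_entity[entity].add(fact)


def facts_within_hops(start_entity, hops):
    """Collect facts reachable from start_entity in at most `hops` expansions."""
    seen_entities = {start_entity}
    frontier = {start_entity}
    found = set()
    for _ in range(hops):
        next_frontier = set()
        for entity in frontier:
            for fact in facts_by_entity[entity]:
                found.add(fact)
                # Entities mentioned by this fact become the next hop's frontier.
                next_frontier |= about[fact] - seen_entities
        seen_entities |= next_frontier
        frontier = next_frontier
    return found
```

With one hop from "Peter" you only get the employment fact. With two hops the traversal passes through the shared DevMesh entity and picks up the outreach platform fact, which is exactly the link the flat embedding store could not follow.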

When a fact changes, the new fact gets a SUPERSEDES edge to the old one. Both persist with timestamps. Temporal queries now work. "What did the system know about this last month?" is a real query.
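Roughly how the SUPERSEDES chain answers a "what was known at time T" question, sketched in plain Python rather than as a graph query. The `Fact` class, the `as_of` helper, the dates, and the "ExampleCorp" replacement fact are all made up for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Fact:
    text: str
    created_at: datetime
    # Models the SUPERSEDES edge: points from the newer fact to the one it replaced.
    supersedes: Optional["Fact"] = None


def as_of(latest: Fact, when: datetime) -> Optional[Fact]:
    """Walk back from the newest fact to the version that was current at `when`."""
    fact = latest
    while fact is not None and fact.created_at > when:
        fact = fact.supersedes
    return fact


# Hypothetical data: the second fact replaces the first.
old = Fact("Peter works at DevMesh", datetime(2025, 1, 10))
new = Fact("Peter works at ExampleCorp", datetime(2025, 3, 1), supersedes=old)
```

Here `as_of(new, datetime(2025, 2, 1))` returns the older fact, because the newer one did not exist yet, and a date before both returns `None`. Nothing is ever deleted, so the history stays queryable.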

The vector index runs inside FalkorDB on 4096-dimensional embeddings from qwen3-embedding:8b. HNSW search is approximate and scales roughly O(log n). No external database.

## The Part That Surprised Me

Entity extraction by a small local model is unreliable when it runs blind. phi4-mini classified DGX Spark as software and created separate nodes for singular and plural forms of the same entity.

Fix: before extracting entities from a new fact, query existing typed entities from the graph and inject them into the NER prompt as reference context. Now phi4-mini sees "DGX Spark → hardware, FalkorDB → software" before it classifies anything new. Each correctly typed entity makes future extractions more consistent. The graph teaches the model over time without any additional training.
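A sketch of the prompt-injection step, assuming the typed entities have already been fetched from the graph into a dict. `build_ner_prompt` and the prompt wording are hypothetical, not the actual LocalClaw prompt.

```python
def build_ner_prompt(fact_text, known_entities):
    """Prepend already-typed entities as reference context for the extraction model."""
    # Sorted so the same graph state always produces the same prompt.
    reference = "\n".join(
        f"{name} → {etype}" for name, etype in sorted(known_entities.items())
    )
    return (
        "Known entities and their types:\n"
        f"{reference}\n\n"
        "Extract entities from the fact below. Reuse a known entity's exact name "
        "and type when it matches.\n"
        f"Fact: {fact_text}"
    )


# In practice this dict would come from a query against the graph.
known = {"DGX Spark": "hardware", "FalkorDB": "software"}
prompt = build_ner_prompt("The DGX Sparks arrived today", known)
```

The point of the design is that the reference list grows with the graph, so the model gets more context on every run without any change to the model itself.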

## Scoring

Pure vector similarity surfaces whatever is semantically closest regardless of whether it matters. The scoring formula:

```
score = similarity × 0.5 + recency × 0.2 + importance × 0.3
```

Importance uses a 1-5 tier (critical health/family = 5, job/identity = 4, preference = 3, context = 2, ephemeral = 1). A moderately relevant but critical fact scores higher than a highly relevant but ephemeral one. Your wife's health condition surfaces above yesterday's weather.
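A worked example of the formula. Two assumptions here that the formula itself doesn't spell out: similarity and recency are already in [0, 1], and the 1-5 tier is normalized by dividing by 5. The input numbers are illustrative.

```python
def score(similarity, recency, importance_tier):
    """Weighted score; similarity and recency are assumed to be in [0, 1]."""
    if importance_tier not in range(1, 6):
        raise ValueError("importance_tier must be 1-5")
    importance = importance_tier / 5  # assumed normalization of the 1-5 tier
    return similarity * 0.5 + recency * 0.2 + importance * 0.3


# Moderately relevant but critical (tier 5) vs. highly relevant but ephemeral (tier 1),
# with recency held equal so only similarity and importance differ.
critical = score(similarity=0.6, recency=0.5, importance_tier=5)
ephemeral = score(similarity=0.9, recency=0.5, importance_tier=1)
```

Under those assumptions the critical fact comes out around 0.70 and the ephemeral one around 0.61, so the tier-5 fact ranks first despite the lower similarity.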

## What I Learned

The model computes nothing. Code handles which facts changed, which are duplicates, what the scores are. The model handles what it means. The moment you let a model do arithmetic or hash-based dedup you get failures you can't explain.

Importance tiers need concrete examples in the extraction prompt. phi4:14b defaulted everything to tier 2 until I added few-shot examples with emotional weight. Abstract instructions don't calibrate a model.

The graph beats flat storage the moment you need relationship reasoning. The SUPERSEDES chain alone justified the migration.

Runs entirely on a Mac Mini. 85MB for the graph. Everything local.

GitHub: https://github.com/PeterGreenAppliedAI/LocalClaw
