Claude doesn't remember the product roadmap we outlined last week. It doesn't know the design decisions we already made. It forgot the feature spec we iterated on across three sessions. I kept re-explaining the same things.
I looked at existing memory solutions but never got past the setup. Mem0 wants Docker + Postgres + Qdrant; I just want memory, not infrastructure. mcp-memory-service exposes 12 tools, which is a complexity tax on every LLM call. And anything cloud-hosted means my codebase context leaves my machine. The setup cost was always too high and the privacy never guaranteed, so I stuck with CLAUDE.md files. They work for a handful of preferences, but they're flat files injected into context every time: no semantic search, no cross-project memory, no decay, no dedup. They don't scale.
So I built Sediment. The entire API is 4 tools: store, recall, list, forget.
I deliberately kept it small. I tried adding tags, metadata, expiration dates; every parameter I added made the LLM worse at using it. With `store` taking nothing but content, it just works: the assistant stores things naturally when they seem worth remembering and recalls them when context would help.
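To make the surface concrete, here's a minimal sketch of what those four tools might look like. The parameter names and shapes are my illustrative guesses, not Sediment's actual schema:

```rust
// Sketch only: parameter names are illustrative guesses, not Sediment's schema.
enum Tool {
    Store { content: String },  // persist one memory; content is the only input
    Recall { query: String },   // semantic search; returns top-matching memories
    List,                       // enumerate stored memories
    Forget { id: String },      // delete a memory by id
}

fn main() {
    // In practice the assistant emits calls like this via MCP; this just shows the shape.
    let call = Tool::Store { content: "Roadmap: ship auth before billing".into() };
    match call {
        Tool::Store { content } => println!("store: {content}"),
        Tool::Recall { query } => println!("recall: {query}"),
        Tool::List => println!("list"),
        Tool::Forget { id } => println!("forget: {id}"),
    }
}
```

The point of the shape is that there's nothing for the model to get wrong: no optional fields, no labels to keep consistent, no schema to remember.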
It's made a noticeable difference. My assistant remembers product ideas I brainstormed at 2am, the coding guidelines for each project, feature specs we refined over multiple sessions, and the roadmap priorities I set last month. It remembers across projects too.
I benchmarked it against 5 alternatives to make sure I wasn't fooling myself. 1,000 memories, 200 queries. Sediment returns the correct top result 50% of the time (vs 47% for the next best). When I update a memory, it always returns the latest version. Competitors get this right only 14% of the time. And it's the only system that auto-deduplicates (99% consolidation rate).
Everything runs locally. Single Rust binary, no Docker, no cloud, no API keys.
A few things I expect pushback on:
"4 tools is too few." I tested 8, 12, and more. Every parameter is a decision the LLM makes on every call. Tags alone create a combinatorial explosion. Semantic search handles categorization better because it doesn't require consistent manual labeling.
"all-MiniLM-L6-v2 is outdated." I benchmarked 4 models including bge-base-en-v1.5 (768-dim) and e5-small-v2. MiniLM tied with bge-base on quality but runs 2x faster. The model matters less than you'd think when you layer memory decay, graph expansion, and hybrid BM25 scoring on top.
"Mem0 supports temporal reasoning too." Mem0's graph variant handles conflicts via LLM-based resolution (ADD/UPDATE/DELETE) on each store, which requires an LLM call on every write. Their benchmarks use LOCOMO, a conversational memory dataset that tests a different use case than developer memory retrieval. The bigger issue is that there's no vendor-neutral, open benchmark for comparing memory systems. Every project runs their own evaluation on their own dataset. That's why I open-sourced the full benchmark suite: same dataset, same queries, reproducible by anyone. I'd love to see other tools run it too.
Benchmark methodology: 1,000 developer memories across 6 categories, 200 ground-truth queries, 50 temporal update sequences, 50 dedup pairs.
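For clarity on how the headline retrieval number is scored, here's a sketch of the top-1 accuracy computation, assuming each ground-truth query carries the id of the one memory it should surface first. The names and structure here are illustrative, not the suite's actual code:

```rust
// Illustrative scoring loop; the real suite lives at github.com/rendro/sediment-benchmark.
struct Query {
    text: String,
    expected_id: u64, // ground-truth memory this query should surface first
}

/// Fraction of queries where the system's top result is the expected memory.
fn top1_accuracy(queries: &[Query], recall_top: impl Fn(&str) -> Option<u64>) -> f64 {
    let hits = queries
        .iter()
        .filter(|q| recall_top(&q.text) == Some(q.expected_id))
        .count();
    hits as f64 / queries.len() as f64
}

fn main() {
    let queries = vec![Query { text: "auth roadmap".into(), expected_id: 7 }];
    // Stub recaller standing in for the system under test.
    let acc = top1_accuracy(&queries, |_q| Some(7));
    println!("top-1 accuracy: {:.0}%", acc * 100.0); // prints 100%
}
```

The temporal and dedup metrics are scored the same way: a temporal sequence counts as correct only if the latest version comes back first, and a dedup pair counts only if the two near-duplicates are consolidated into one memory.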
Landing page: https://sediment.sh
GitHub: https://github.com/rendro/sediment
Benchmark suite: https://github.com/rendro/sediment-benchmark