Show HN: A memory database that forgets, consolidates, and detects contradiction

https://github.com/yantrikos/yantrikdb-server
27•pranabsarkar•4h ago
Vector databases store memories. They don't manage them. After 10k memories, recall quality degrades because there's no consolidation, no forgetting, no conflict resolution. Your AI agent just gets noisier.

YantrikDB is a cognitive memory engine — embed it, run it as a server, or connect via MCP. It thinks about what it stores: consolidation collapses duplicate memories, contradiction detection flags incompatible facts, temporal decay with configurable half-life lets unimportant memories fade like human memory does.

Single Rust binary. HTTP + binary wire protocol. 2-voter + 1-witness HA cluster via Docker Compose or Kubernetes. Chaos-tested failover, runtime deadlock detection (parking_lot), per-tenant quotas, Prometheus metrics. Ran a 42-task hardening sprint last week — 1178 core tests, cargo-fuzz targets, CRDT property tests, 5 ops runbooks.

Live on a 3-node Proxmox homelab cluster with multiple tenants. Alpha — primary user is me, looking for the second one.

Comments

pranabsarkar•4h ago
Author here. I built this because I was using ChromaDB for an AI agent's memory and recall quality went to garbage at ~5k memories. The agent kept recalling outdated facts, contradicting itself across sessions, and the context window was full of redundant near-duplicates.

I tried to write the consolidation/conflict-detection logic on top of ChromaDB. It didn't work — the operations need to be transactional with the vector index, and they need an HLC for ordering across nodes. So I built it as a database.

The cognitive operations (think, consolidate, detect_conflicts, derive_personality) are the actual differentiator. The clustered server is what made me confident enough to ship — I needed to know the data was safe before I'd put real work on it.

What I genuinely want to know: is this solving a problem you're hitting with your AI agent's memory, or did I build a really polished thing for my own narrow use case? Honest reactions help more than encouragement.

all2•1h ago
I've bookmarked this. I'll let you know what I find over the next few weeks.

I'm in the middle of building an agent harness and I haven't had to deal with long-running memory issues yet, but I will have to deal with it soon.

pranabsarkar•1h ago
Thanks, really appreciate it. I'm using it as an MCP server, connected to all my workspaces. It has definitely changed my experience.
polotics•2h ago
In this day and age, without serious evidence that the software presented has seen some real usage, or at least has a good reviewable regression test suite, sadly the assumption may be that this is a slopcoded brainwave. The ASCII diagram doesn't help. Also, maybe explain the design more.
6r17•1h ago
I kind of agree with the comment above: a lot of what's happening around this space comes from an idea without proof that the project has a meaningful result. A compacting-memory bench is not something difficult to pull off, but I'm also having difficulty understanding what the outcome would be on a running system.
pranabsarkar•1h ago
Fair. "Does consolidation actually improve recall quality on a running system?" is exactly the benchmark I haven't published, and it's the one that would settle the question.

What I do have right now:

- 1178 core unit tests, including CRDT convergence property tests via proptest (for any sequence of ops, the final state is order-independent)
- Chaos test harness: Dockerized 3-node cluster with leader-kill / network-partition / kill-9 scenarios (tests/chaos/ in the repo)
- cargo-fuzz targets against the wire protocol and oplog deserializer
- Live usage: running on my 3-node homelab cluster with two real tenants (small: a TV-writing agent and another experiment) for the past few weeks. Caught a real production self-deadlock during this period (v0.5.8), which is what triggered the 42-task hardening sprint.

What I don't have and should: a recall-quality-over-time benchmark. Something like: seed 5,000 memories with known redundancy and contradictions, measure recall precision@10 before and after think(), and publish the curve. That's the evidence you're asking for, and you're right that it's missing. I'll run that and post the numbers in a follow-up.
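The convergence property the proptest suite checks ("for any sequence of ops, final state is order-independent") can be illustrated with a minimal hand-rolled last-writer-wins map. This is a hypothetical sketch, not YantrikDB's actual CRDT types: ties are broken by (timestamp, node_id), so merging is commutative regardless of delivery order.

```rust
use std::collections::HashMap;

// Hypothetical LWW map: each key keeps the value with the highest
// (timestamp, node_id) pair, making apply() order-independent.
#[derive(Clone, Debug, PartialEq)]
struct LwwMap {
    entries: HashMap<String, (u64, u32, String)>, // key -> (ts, node, value)
}

impl LwwMap {
    fn new() -> Self {
        LwwMap { entries: HashMap::new() }
    }

    fn apply(&mut self, key: &str, ts: u64, node: u32, value: &str) {
        // Keep the incoming write only if it is strictly newer.
        let newer = self
            .entries
            .get(key)
            .map_or(true, |e| (ts, node) > (e.0, e.1));
        if newer {
            self.entries.insert(key.to_string(), (ts, node, value.to_string()));
        }
    }
}

fn main() {
    let ops = [("ceo", 1u64, 1u32, "Alice"), ("ceo", 2, 2, "Bob"), ("cto", 1, 2, "Sarah")];

    let mut forward = LwwMap::new();
    for (k, ts, node, v) in ops {
        forward.apply(k, ts, node, v);
    }
    let mut reverse = LwwMap::new();
    for (k, ts, node, v) in ops.iter().rev() {
        reverse.apply(k, *ts, *node, v);
    }

    // Same final state no matter the op order.
    assert_eq!(forward, reverse);
    println!("converged: ceo = {}", forward.entries["ceo"].2);
}
```

A property test generalizes this by generating random op sequences and asserting every permutation converges to the same state.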

On the ASCII diagram, fair point too: the website has proper rendering (yantrikdb.com), but the README should have an SVG.

Appreciate the pushback — this is more useful than encouragement.

altmanaltman•1h ago
Did you check if this leads to any actual benefits? If so, how did you benchmark it?
pranabsarkar•49m ago
Update — ran a real bench on the live cluster (59 memories: 8 canonical facts × 3-4 paraphrases + 6 seeded contradictions + 20 distractors). Numbers:

- duplicates per query (top-10): 0.9 → 0.0
- top-result correct: 75% → 87.5%
- 11 consolidations in 80ms
- conflicts detected: 0 of 6 seeded ← this one matters

Turns out conflict detection runs on graph edges, and /v1/remember doesn't auto-extract entities, so contradictions sit there invisibly until you explicitly call relate. That's a UX gap, not a missing feature, but it breaks the "drop memories in, get contradictions out" mental model. Filed as issues #1 and #2.

Dataset + script + raw results: https://gist.github.com/spranab/49c618d3625dc131308227103af5....

Honest benches surface the kind of thing demos hide; thanks for pushing.
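For the curious, the duplicates-per-query metric above can be sketched like this (hypothetical canonical-fact labels on each hit; not the actual bench script from the gist):

```rust
use std::collections::HashMap;

// Duplicates-per-query: in a top-k result list where each hit is tagged
// with the canonical fact it paraphrases, count how many hits repeat a
// fact already seen higher in the ranking.
fn duplicates_in_topk(canonical_ids: &[&str]) -> usize {
    let mut seen: HashMap<&str, usize> = HashMap::new();
    let mut dups = 0;
    for id in canonical_ids {
        let count = seen.entry(*id).or_insert(0);
        if *count > 0 {
            dups += 1; // a paraphrase of a fact we already returned
        }
        *count += 1;
    }
    dups
}

fn main() {
    // Before consolidation: paraphrases of "ceo" crowd the ranking.
    let before = ["ceo", "ceo", "cto", "ceo", "office"];
    // After: one canonical memory per fact.
    let after = ["ceo", "cto", "office"];
    assert_eq!(duplicates_in_topk(&before), 2);
    assert_eq!(duplicates_in_topk(&after), 0);
    println!("duplicates before: 2, after: 0");
}
```

Averaging this count over all queries gives the 0.9 → 0.0 figure reported above.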

Mithriil•1h ago
The half-life idea is interesting.

What's the loop behind consolidation? Random sampling and LLM to merge?

pranabsarkar•1h ago
No LLM in the loop. The consolidation pass is deterministic:

1. Pull the N most recent active memories (default 30) with embeddings
2. Pairwise cosine similarity, threshold 0.85
3. For each similar pair, check if they share extracted entities
4. Shared entities + similarity 0.85–0.98 → flag as potential contradiction (same topic, maybe different facts)
5. No shared entities + similarity > 0.85 → redundancy (mark for consolidation)
6. Second pass at a 0.65 threshold specifically for substitution-category pairs (e.g., "MySQL" vs "PostgreSQL" in otherwise-similar sentences): these are usually real contradictions even at lower similarity

Consolidation then collapses the redundancy set into canonical memories with combined importance/certainty. No LLM call, no randomness. Reproducible, cheap, runs in a background tick every ~5 minutes.

The LLM could improve this (better merge decisions, better entity alignment) but the tradeoff is cost and non-determinism. v1 is deterministic on purpose.

Source: crates/yantrikdb-core/src/cognition/triggers.rs and consolidate.rs next to it.
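The pair-classification step of the pass described above can be sketched as follows. This is illustrative only (made-up Memory type and thresholds copied from the description, not the code in crates/yantrikdb-core):

```rust
use std::collections::HashSet;

// Illustrative memory record: text, embedding, and extracted entities.
struct Memory {
    text: &'static str,
    embedding: Vec<f32>,
    entities: HashSet<&'static str>,
}

fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (na * nb)
}

#[derive(Debug, PartialEq)]
enum Verdict {
    PotentialContradiction, // same topic, maybe different facts
    Redundant,              // mark for consolidation
    Unrelated,
}

fn classify(a: &Memory, b: &Memory) -> Verdict {
    let sim = cosine(&a.embedding, &b.embedding);
    let shared = a.entities.intersection(&b.entities).count() > 0;
    if shared && (0.85..=0.98).contains(&sim) {
        Verdict::PotentialContradiction
    } else if !shared && sim > 0.85 {
        Verdict::Redundant
    } else {
        Verdict::Unrelated
    }
}

fn main() {
    let a = Memory { text: "Alice is CEO", embedding: vec![1.0, 0.0], entities: ["ceo"].into() };
    let b = Memory { text: "Bob is CEO", embedding: vec![1.0, 0.5], entities: ["ceo"].into() };
    // Shared entity + similarity in the contradiction band.
    assert_eq!(classify(&a, &b), Verdict::PotentialContradiction);
    println!("{:?} vs {:?}: {:?}", a.text, b.text, classify(&a, &b));
}
```

Because the whole decision is cosine math plus set intersection, the pass is deterministic and cheap, which is the tradeoff argued for below.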

tcdent•1h ago
I appreciate the effort you put into mapping semantics so language constructs can be incorporated into this. You’re probably already seeing that the amount of terminology, how those terms interact with each other, and the way you need to model it have ballooned into a fairly complex system.

The fundamental breakthrough with LLMs is that they handle semantic mapping for you and can (albeit non-deterministically) interpret the meaning and relationships between concepts with a pretty high degree of accuracy, in context.

It just makes me wonder if you could dramatically simplify the schema and data modeling by incorporating more of these learnings.

I have a simple experiment along these lines that’s especially relevant given the advent of one-million-token context windows, although I don’t consider it a scientifically backed or production-ready concept, just an exploration: https://github.com/tcdent/wvf

pranabsarkar•1h ago
Thanks for the careful read — the "schema is ballooning" observation is real and I've felt it building this. You're pointing at a genuine design tension.

My counter, qualified: deterministic consolidation is cheap and reproducible in a way LLM-in-the-loop consolidation isn't, at least today. Every think() invocation is free (cosine + entity matching + SQL). If I put an LLM in the loop the cost is O(N²) LLM calls per consolidation pass — for a 10k-memory database, that's thousands of dollars of inference per tick. So for v1 I'm trading off "better merge decisions" against "actually runs every 5 minutes without burning a budget."

On 1M-context-windows: I think they push the "vector DB break point" out but don't remove it. Context stuffing still has recall-precision problems at scale (lost-in-the-middle, attention dilution on unrelated facts), and 1M tokens ≠ unbounded memory. At 10M memories no context window saves you.

wvf is interesting — just read through. The "append everything, let the model retrieve" approach is the complement of what I'm doing: you lean fully into LLM semantics, I try to do the lookup deterministically. Probably both are right for different workloads. Yours wins when you have unbounded compute + a small corpus; mine wins when you have bounded compute + a large corpus that needs grooming.

Starring wvf now. Curious if you're seeing meaningful quality differences between your approach and traditional retrieval at scale.

tcdent•43m ago
Appreciate the thoughtful reply.

Absolutely agree the deterministic performance-oriented mindset is still essential for large workloads. Are you expecting that this supplements a traditional vector/semantic store, or that it supersedes it?

My focus has absolutely been on relatively small corpora, which is supported by forcing a subset of data to be included by design. There are intentionally no conventions for things like "we talked about how AI is transforming computing at 1AM"; instead it attempts to focus on "user believes AI is transforming computing", so hopefully there's less of the context poisoning that happens with current memory.

Haven't deployed WVF at any scale yet; just a casual experiment among many others.

hazelnut•1h ago
Congrats, looking promising. How does it compare to supermemory.ai?
pranabsarkar•1h ago
Fair question. Supermemory is a hosted SaaS built around embedding + ranking. YantrikDB is self-hosted and adds three things Supermemory doesn't do as first-class operations:

- think(): consolidates similar memories into canonical ones (not just deduplication, actual collapse of redundant facts)
- Contradiction detection: when "CEO is Alice" and "CEO is Bob" both exist in memory, it flags the pair as a conflict the agent can resolve
- Temporal decay with configurable half-life: memories fade, so old unimportant stuff stops polluting recall

Supermemory does more on the cloud side (team sharing, permissions, integrations). YantrikDB does more on the "actively manage my agent's memory" side. Different optimization points; no dig at Supermemory.
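The half-life decay mentioned above is standard exponential decay. A minimal sketch, assuming importance halves every `half_life_hours` (the actual YantrikDB scoring may weight this differently):

```rust
// Effective importance after `age_hours`, with a configurable half-life:
// importance * 0.5^(age / half_life). Old, unimportant memories sink
// below the recall threshold instead of polluting results forever.
fn decayed_importance(importance: f64, age_hours: f64, half_life_hours: f64) -> f64 {
    importance * 0.5_f64.powf(age_hours / half_life_hours)
}

fn main() {
    // importance 0.8, half-life of one week (168h)
    let fresh = decayed_importance(0.8, 0.0, 168.0);
    let one_week = decayed_importance(0.8, 168.0, 168.0);
    let two_weeks = decayed_importance(0.8, 336.0, 168.0);
    assert!((fresh - 0.8).abs() < 1e-9);
    assert!((one_week - 0.4).abs() < 1e-9);
    assert!((two_weeks - 0.2).abs() < 1e-9);
    println!("0h: {fresh:.2}, 168h: {one_week:.2}, 336h: {two_weeks:.2}");
}
```

Tuning the half-life per domain is what lets durable role context outlive transient event facts.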

endymi0n•1h ago
I've experimented quite a bit with mem0 (which is similar in design) for my OpenClaw and stopped using it very soon. My impression is that "facts" are an incredibly dull and far too rigid tool for any actual job at hand and for me were a step back instead of forward in daily use. In the end, the extracted "facts database" was a complete mess of largely incomplete, invalid, inefficient and unhelpful sentences that didn't help any of my conversations, and after the third injected wrong fact I went back to QMD and prose / summarization. Sometimes it's slightly worse at updating stuck facts, but I'll take a 1000% better big picture and usefulness over working with "facts".

The failure modes were multiple:

- Facts rarely exist in a vacuum but have lots of subtlety
- Inferring facts from conversation has a gazillion failure modes; especially irony and sarcasm lead to hilarious outcomes (joking about a sixpack with a fat buddy → "XYZ is interested in achieving an athletic form"), but even things as simple as extracting a concrete date too often go wrong
- Facts are almost never as binary as they seem. "ABC has the flights booked for the Paris trip." Now I decided afterwards to continue to New York to visit a friend instead of going home and completely stumped the agent.

pranabsarkar•23m ago
Fair criticism — and the failure modes you describe aren't mem0-specific, they hit any system that extracts atomic facts from conversation. I hit a couple of them today while benchmarking YantrikDB's own consolidation (see my reply to polotics): "Alice is CEO" got merged with "Sarah is CTO" on cosine similarity alone because the sentences share too much structural scaffolding. That's exactly the "facts in a vacuum" problem you're naming.

Two small clarifications:

- remember(text, importance, domain) takes a free-form string; nothing forces atomic facts. A QMD-style prose block, a procedure, a dated plan: all work.
- The irony/sarcasm-inverts-the-fact failure mode lives in the agent's extraction layer, not the backend. So "write narrative into it, recall narrative out" is a legitimate usage pattern; the DB is agnostic.

YantrikDB's actual differentiator vs mem0 is temporal decay + consolidation + conflict detection, not smarter fact extraction. The "ABC has the Paris flight booked → actually I'm going to NYC" problem is meant to be addressed by decay (the old fact fades) and contradiction flagging (the new one triggers a conflict for the agent to resolve). But — honest read — my bench today showed conflict detection needs work to actually fire on raw text. Filed as issues #1 and #2, fixing now.

Broader point stands though: if the agent is producing brittle inferred facts upstream, no memory backend saves it. The DB can manage rot and contradiction. It can't fix bad inference. For what it's worth, I mostly use it for durable role context ("user is a data scientist on observability") rather than event lifecycle ("Paris flight booked") — the latter is what prose summarization is genuinely better at, and I think you're right that mem0-style auto-extraction applied to lifecycle events is a bad shape.
