We’ve been building memU (https://github.com/NevaMind-AI/memU), an open-source, general-purpose memory framework for AI agents. It supports dual-mode retrieval: classic RAG and LLM-based direct reading of memory files.
Most multimodal memory systems either embed everything into vectors or treat non-text data as attachments. Both approaches work, but at scale it becomes hard to explain why a given piece of context was retrieved and what evidence it rests on.
memU takes a different approach: since models reason in language, multimodal memory should converge into structured, queryable text, while remaining fully traceable to original data.
---
## Three-Layer Architecture
- Resource Layer: stores raw multimodal data as ground truth. All higher-level memory remains traceable to this layer.
- Memory Item Layer: extracts atomic facts from raw data and stores them as natural-language statements. Embeddings are optional and used only for acceleration.
- Memory Category Layer: aggregates items into readable, theme-based memory files (e.g. user preferences, work logs). Frequently accessed topics stay active; low-usage content is demoted to balance speed and coverage.
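To make the layering concrete, here is a minimal sketch of the three layers as plain data structures. The class and field names (`Resource`, `MemoryItem`, `MemoryCategory`, `source_id`, and so on) are assumptions for illustration, not memU's actual schema.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Resource:
    """Raw multimodal data (text, image, audio, ...) kept as ground truth."""
    id: str
    modality: str          # e.g. "text", "image", "audio"
    uri: str               # where the original bytes live

@dataclass
class MemoryItem:
    """An atomic, natural-language fact extracted from a resource."""
    id: str
    statement: str         # e.g. "The user prefers dark-roast coffee."
    source_id: str         # back-pointer to the Resource it came from
    embedding: Optional[list[float]] = None  # optional, used only for acceleration

@dataclass
class MemoryCategory:
    """A readable, theme-based memory file aggregating related items."""
    name: str              # e.g. "user_preferences", "work_log"
    item_ids: list[str] = field(default_factory=list)
    access_count: int = 0  # drives promotion/demotion over time
```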
---
## Memorization
Bottom-up and asynchronous. Data flows from resources → items → category files without manual schemas. When capacity is reached, recently relevant memories replace the least used ones.
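A rough sketch of that bottom-up flow, assuming hypothetical `extract_facts` and `category_for` helpers (memU's own extraction logic and capacity policy will differ):

```python
def memorize(resource_text: str, resource_id: str,
             categories: dict[str, list[dict]],
             extract_facts,                 # callable: raw text -> list of fact strings
             category_for,                  # callable: fact -> category name
             max_items_per_category: int = 200) -> None:
    """Bottom-up memorization: resource -> items -> category files."""
    for fact in extract_facts(resource_text):
        name = category_for(fact)
        items = categories.setdefault(name, [])
        items.append({"statement": fact, "source_id": resource_id, "uses": 0})
        # When a category is over budget, drop the least-used item first.
        if len(items) > max_items_per_category:
            items.sort(key=lambda it: it["uses"])
            items.pop(0)   # demoted; still recoverable from the raw resource
```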
## Retrieval
Top-down. memU searches category files first, then items, and only falls back to raw data if needed. At the item layer, it combines BM25 and embeddings to balance exact matching with semantic recall, avoiding the imprecision of embedding-only search.
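A hybrid scorer of this kind can be sketched as a weighted sum of a normalized BM25 score and embedding cosine similarity. The example assumes the third-party `rank_bm25` and `sentence-transformers` packages and an `all-MiniLM-L6-v2` model; memU's own weighting and models may differ.

```python
import numpy as np
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer

def hybrid_search(query: str, items: list[str], alpha: float = 0.5, top_k: int = 5):
    """Score memory items with BM25 (exact matching) + embeddings (semantic recall)."""
    # Lexical side: BM25 over whitespace-tokenized items, normalized to [0, 1].
    bm25 = BM25Okapi([it.lower().split() for it in items])
    lex = np.array(bm25.get_scores(query.lower().split()))
    lex = lex / (lex.max() + 1e-9)

    # Semantic side: cosine similarity of unit-normalized sentence embeddings.
    model = SentenceTransformer("all-MiniLM-L6-v2")
    emb = model.encode(items + [query], normalize_embeddings=True)
    sem = emb[:-1] @ emb[-1]

    # Weighted combination; alpha trades exact matching against semantic recall.
    scores = alpha * lex + (1 - alpha) * sem
    ranked = np.argsort(-scores)[:top_k]
    return [(items[i], float(scores[i])) for i in ranked]
```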
Dual-mode retrieval lets applications choose between:
- low-latency embedding search, or
- LLM-based direct reading of memory files.
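That choice can be expressed as a simple dispatch. The `vector_search` and `llm_complete` callables below are hypothetical stand-ins for whatever retrieval backend and LLM client an application uses:

```python
def retrieve(query: str, mode: str,
             vector_search,    # callable: query -> list of matching item statements
             llm_complete,     # callable: prompt string -> model answer
             category_file_text: str) -> str:
    """Dual-mode retrieval: 'embedding' for low latency, 'llm' for direct file reading."""
    if mode == "embedding":
        # Fast path: nearest-neighbour lookup over item embeddings.
        return "\n".join(vector_search(query))
    # Slow path: let the model read the human-readable memory file itself.
    prompt = (
        "Answer the question using only the memory file below.\n\n"
        f"MEMORY FILE:\n{category_file_text}\n\nQUESTION: {query}"
    )
    return llm_complete(prompt)
```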
## Evolution
Memory structure adapts automatically based on real usage:
- Frequently accessed memories remain at the Category layer
- Memories retrieved from raw data are promoted upward and linked
- Organization evolves from usage patterns, not predefined rules
Goal: keep relevant memories retrievable at the Category layer and minimize latency over time.
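One way to sketch that usage-driven promotion and demotion is with plain access counters. The thresholds and the `demoted` pool below are illustrative assumptions, not memU's actual policy:

```python
def evolve(categories: dict[str, list[dict]],
           demoted: list[dict],
           promote_after: int = 3,
           demote_below: int = 1) -> None:
    """Adapt memory placement from observed usage, not predefined rules."""
    # Promote: items retrieved often enough from the demoted pool / raw data
    # move back up to their category file.
    for item in list(demoted):
        if item["uses"] >= promote_after:
            categories.setdefault(item["category"], []).append(item)
            demoted.remove(item)

    # Demote: category-layer items that are almost never accessed move down,
    # keeping the Category layer small and fast to scan.
    for name, items in categories.items():
        for item in list(items):
            if item["uses"] < demote_below:
                items.remove(item)
                item["category"] = name
                demoted.append(item)
```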
---
## A Unified Multimodal Memory Pipeline
memU is a text-centered multimodal memory system. Multimodal inputs are progressively converted into interpretable text memory while staying traceable to the original data. This provides stable, high-level context for reasoning, with detailed evidence available when needed, inside a memory structure that evolves through real-world use.
---
Junnn • 1d ago
Most agent memory stacks today collapse everything into embeddings and hope similarity search is enough. That works for recall, but breaks down quickly when you need traceability, temporal reasoning, or explanation of why something was remembered.
The layered design here (raw resources → extracted memory items → categorized memory files) feels much closer to how we design real systems: separation of concerns, clear abstraction boundaries, and the ability to reason about state changes over time.
Storing memories in human-readable form also makes debugging and evolution practical. You can audit what the agent “knows”, adjust policies, or let the LLM reason directly over memory instead of treating it as a black box vector store.
Embeddings still make sense as an optimization layer, but making them optional rather than foundational is an important architectural choice if agents are meant to run long-term and stay coherent.
This feels less like a retrieval hack and more like actual infrastructure.