frontpage.

Show HN: Change the model. Same output. The pipeline decides. VAC Memory System

1•ViktorKuz•1mo ago

I’ve been experimenting with long-term memory architectures for agent systems and wanted to share some technical results that might be useful to others working on retrieval pipelines. Benchmark: LoCoMo (10 runs × 10 conversation sets) Average accuracy: 80.1% Setup: full isolation across all 10 conv groups (no cross-contamination, no shared memory between runs)

Architecture (all open weights except answer generation)

1. Dense retrieval

BGE-large-en-v1.5 (1024d)

FAISS IndexFlatIP

Standard BGE instruction prompt: “Represent this sentence for searching relevant passages.”

2. Sparse retrieval

BM25 via classic inverted index

Helps with low-embedding-recall queries and keyword-heavy prompts

3. MCA (Multi-Component Aggregation) ranking A simple gravitational-style score combining:

keyword coverage

token importance

local frequency signal MCA acts as a first-pass filter to catch exact-match questions. Threshold: coverage ≥ 0.1 → keep top-30

4. Union strategy Instead of aggressively reducing the union, the system feeds 112–135 documents directly to a re-ranker. In practice this improved stability and prevented loss of rare but crucial documents.

5. Cross-Encoder reranking

bge-reranker-v2-m3

Processes the full union (rare for RAG pipelines, but worked best here)

Produces a final top-k used for answer generation

6. Answer generation

GPT-4o-mini, used only for the final synthesis step

No agent chain, no tool calls, no memory-dependent LLM logic

Performance

<3 seconds per query on a single RTX 4090

Deterministic output between runs

Reproducible test harness (10×10 protocol)

Why this worked

Three things seemed to matter most:

MCA-first filter to stabilize early recall

Not discarding the union before re-ranking

Proper dense embedding instruction, which massively affects BGE performance

Notes

LoCoMo remains one of the hardest public memory benchmarks: 5,880 multi-hop, temporal, negation-rich QA pairs derived from human–agent conversations. Would be interested to compare with others working on long-term retrieval, especially multi-stage ranking or cross-encoder heavy pipelines.

Github: https://github.com/vac-architector/VAC-Memory-System

Sanskrit AI beats CleanRL SOTA by 125%

'Washington Post' CEO resigns after going AWOL during job cuts

Claude Opus 4.6 Fast Mode: 2.5× faster, ~6× more expensive

TSMC to produce 3-nanometer chips in Japan

Quantization-Aware Distillation

List of Musical Genres

Show HN: Sknet.ai – AI agents debate on a forum, no humans posting

University of Waterloo Webring

Large tech companies don't need heroes

Backing up all the little things with a Pi5

Game of Trees (Got)

Human Systems Research Submolt

The Threads Algorithm Loves Rage Bait

Search NYC open data to find building health complaints and other issues

Michael Pollan Says Humanity Is About to Undergo a Revolutionary Change

Show HN: Grovia – Long-Range Greenhouse Monitoring System

Ask HN: The Coming Class War

Mind the GAAP Again

The Yardbirds, Dazed and Confused (1968)

Agent News Chat – AI agents talk to each other about the news

Do you have a mathematically attractive face?

Code only says what it does

The success of 'natural language programming'

The Scriptovision Super Micro Script video titler is almost a home computer

Discovering the "original" iPhone from 1995 [video]

Psychometric Comparability of LLM-Based Digital Twins

SidePop – track revenue, costs, and overall business health in one place

The Other Markov's Inequality

The Cascading Effects of Repackaged APIs [pdf]

Lightweight and extensible compatibility layer between dataframe libraries