frontpage.

80.1 % on LoCoMo Long-Term Memory Benchmark with a pure open-source RAG pipeline

1•ViktorKuz•1mo ago

I just pushed the current SOTA on the LoCoMo long-term memory benchmark for agents: 80.1 % accuracy using only: -BGE-large-en-v1.5 (1024d) + FAISS

-Custom “MCA” gravitational ranking (keyword coverage + importance + frequency)

-BM25 sparse retrieval

-Direct Cross-Encoder reranking (bge-reranker-v2-m3) on the full union (~120-150 docs)

-Gpt-4o-mini only for final answer generation and judging (everything else is open weights or classic)

Repo: https://github.com/vac-architector/VAC-Memory-System Key tricks that finally broke 80% :

-MCA-first filter (coverage ≥ 0.1 → top-30) — catches exact-keyword questions early

-Feeding the entire union straight into Cross-Encoder (112–135 documents) instead of pre-filtering

-Proper query instruction for BGE-large (the classic “Represent this sentence for searching relevant passages”)

The whole pipeline runs in < 3s per query on a single RTX 4090. LoCoMo is currently the hardest public long-term memory benchmark (5.880 real human–agent conversations, multi-hop, temporal, negation, etc.).

Beating Mem0 official baseline by ~12–14 pp with fully open components feels pretty good. Would love feedback, especially from people who are also grinding on agent memory systems.

My background: My path didn't start in an IT office, but in Columbus, Ohio, where I worked as a handyman after leaving my job on the cell towers. The decision came from necessity: I bought a powerful PC on installments and resolved to create something that would change my life.

I had no experience, but I had an idea. Using Claude CLI as my sole mentor, I focused on architecture, not syntax.

Over 4.5 months of work, I engineered and created the VAC Memory System. To prove its value, I tested it on the toughest RAG benchmark—LoCoMo. Today, my system shows an overall result of 80.1% and a phenomenal 87.78% in the "Commonsense" category.

This is more than just code; it is the result of faith in an idea. I showed that by using modern tools, it is possible to achieve SOTA-level performance and create serious technology, regardless of your starting point. I highly anticipate your feedback.

Microsoft appointed a quality czar. He has no direct reports and no budget

Multi-agent coordination on Claude Code: 8 production pain points and patterns

Washington Post CEO Will Lewis Steps Down After Stormy Tenure

DevXT – Building the Future with AI That Acts

A Minimal OpenClaw Built with the OpenCode SDK

The silent death of Good Code

The Internal Negotiation You Have When Your Heart Rate Gets Uncomfortable

Show HN: Glance – Fast CSV inspection for the terminal (SIMD-accelerated)

Busy for the Next Fifty to Sixty Bud

Imperative

Show HN: I decomposed 87 tasks to find where AI agents structurally collapse

I went back to Linux and it was a mistake

Octrafic – open-source AI-assisted API testing from the CLI

US Accuses China of Secret Nuclear Testing

Peacock. A New Programming Language

A postcard arrived: 'If you're reading this I'm dead, and I really liked you'

What to know about the software selloff

Show HN: Syntux – generative UI for websites, not agents

Microsoft appointed a quality czar. He has no direct reports and no budget

AI overlay that reads anything on your screen (invisible to screen capture)

Show HN: Seafloor, be up and running with OpenClaw in 20 seconds

Tesla turbine-inspired structure generates electricity using compressed air

State Department deleting 17 years of tweets (2009-2025); preservation needed

Learning to code, or building side projects with AI help, this one's for you

Effulgence RPG Engine [video]

Five disciplines discovered the same math independently – none of them knew

We Scanned an AI Assistant for Security Issues: 12,465 Vulnerabilities

Amazon no longer defend cloud customers against video patent infringement claims

Show HN: Medinilla – an OCPP compliant .NET back end (partially done)

How Does AI Distribute the Pie? Large Language Models and the Ultimatum Game

80.1 % on LoCoMo Long-Term Memory Benchmark with a pure open-source RAG pipeline

Microsoft appointed a quality czar. He has no direct reports and no budget

Multi-agent coordination on Claude Code: 8 production pain points and patterns

Washington Post CEO Will Lewis Steps Down After Stormy Tenure

DevXT – Building the Future with AI That Acts

A Minimal OpenClaw Built with the OpenCode SDK

The silent death of Good Code

The Internal Negotiation You Have When Your Heart Rate Gets Uncomfortable

Show HN: Glance – Fast CSV inspection for the terminal (SIMD-accelerated)

Busy for the Next Fifty to Sixty Bud

Imperative

Show HN: I decomposed 87 tasks to find where AI agents structurally collapse

I went back to Linux and it was a mistake

Octrafic – open-source AI-assisted API testing from the CLI

US Accuses China of Secret Nuclear Testing

Peacock. A New Programming Language

A postcard arrived: 'If you're reading this I'm dead, and I really liked you'

What to know about the software selloff

Show HN: Syntux – generative UI for websites, not agents

Microsoft appointed a quality czar. He has no direct reports and no budget

AI overlay that reads anything on your screen (invisible to screen capture)

Show HN: Seafloor, be up and running with OpenClaw in 20 seconds

Tesla turbine-inspired structure generates electricity using compressed air

State Department deleting 17 years of tweets (2009-2025); preservation needed

Learning to code, or building side projects with AI help, this one's for you

Effulgence RPG Engine [video]

Five disciplines discovered the same math independently – none of them knew

We Scanned an AI Assistant for Security Issues: 12,465 Vulnerabilities

Amazon no longer defend cloud customers against video patent infringement claims

Show HN: Medinilla – an OCPP compliant .NET back end (partially done)

How Does AI Distribute the Pie? Large Language Models and the Ultimatum Game