
80.1% on the LoCoMo Long-Term Memory Benchmark with a pure open-source RAG pipeline

3•ViktorKuz•2mo ago

I just pushed the current SOTA on the LoCoMo long-term memory benchmark for agents: 80.1% accuracy using only:

- BGE-large-en-v1.5 (1024d) + FAISS

- Custom “MCA” gravitational ranking (keyword coverage + importance + frequency)

- BM25 sparse retrieval

- Direct Cross-Encoder reranking (bge-reranker-v2-m3) on the full union (~120-150 docs)

- GPT-4o-mini only for final answer generation and judging (everything else is open weights or classic)
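To make the "full union into the reranker" step concrete, here is a minimal sketch of how the candidate lists from the three retrievers could be merged and rescored. It is not the code from the repo: `union_candidates`, `rerank`, and `toy_overlap_scorer` are hypothetical names, and the toy scorer only stands in for a real cross-encoder such as bge-reranker-v2-m3.

```python
from typing import Callable, Dict, Iterable, List, Sequence, Tuple

Pair = Tuple[str, str]


def union_candidates(*ranked_lists: Iterable[str]) -> List[str]:
    """Merge ranked ID lists into one de-duplicated list (first-seen order)."""
    seen = set()
    merged: List[str] = []
    for ranked in ranked_lists:
        for doc_id in ranked:
            if doc_id not in seen:
                seen.add(doc_id)
                merged.append(doc_id)
    return merged


def rerank(
    query: str,
    doc_ids: Sequence[str],
    docs: Dict[str, str],
    score_pairs: Callable[[List[Pair]], List[float]],
    top_k: int = 10,
) -> List[Tuple[str, float]]:
    """Score every (query, document) pair and return the top_k by score."""
    pairs = [(query, docs[doc_id]) for doc_id in doc_ids]
    scores = score_pairs(pairs)
    ranked = sorted(zip(doc_ids, scores), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]


def toy_overlap_scorer(pairs: List[Pair]) -> List[float]:
    """Stand-in for a cross-encoder (assumption): count shared lowercase tokens."""
    return [
        float(len(set(q.lower().split()) & set(d.lower().split())))
        for q, d in pairs
    ]


if __name__ == "__main__":
    docs = {
        "a": "Alice adopted a dog in May",
        "b": "Bob moved to Ohio",
        "c": "Alice sold her car",
    }
    # Pretend these came from FAISS, MCA, and BM25 respectively.
    merged = union_candidates(["a", "b"], ["b", "c"], ["c", "a"])
    print(rerank("when did Alice adopt a dog", merged, docs, toy_overlap_scorer, top_k=2))
```

In a real pipeline, `score_pairs` would wrap the cross-encoder's batch scoring call. The point of the sketch is only that nothing is dropped between the union and the reranker.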

Repo: https://github.com/vac-architector/VAC-Memory-System

Key tricks that finally broke 80%:

- MCA-first filter (coverage ≥ 0.1 → top-30), which catches exact-keyword questions early

- Feeding the entire union straight into Cross-Encoder (112–135 documents) instead of pre-filtering

- Proper query instruction for BGE-large (the classic “Represent this sentence for searching relevant passages”)
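A rough sketch of the first and third tricks, for anyone who wants to try the idea without reading the repo. It covers only the coverage part of MCA (the importance and frequency terms are left out because their formulas are not given here), and it assumes coverage means "fraction of query keywords found in the document". The stopword list and the names `coverage`, `mca_first_filter`, and `bge_query` are hypothetical. The instruction string is the one documented for the BGE v1.5 English models and is prepended to queries only, not to passages.

```python
import re
from typing import Dict, List, Set, Tuple

# Query-side instruction documented for BGE v1.5 English models.
BGE_QUERY_INSTRUCTION = "Represent this sentence for searching relevant passages: "

# Tiny illustrative stopword list (assumption, not from the repo).
STOPWORDS = {
    "a", "an", "the", "in", "to", "of", "did", "when",
    "what", "who", "her", "his", "is", "was",
}


def keywords(text: str) -> Set[str]:
    """Lowercase alphanumeric tokens with stopwords removed."""
    return {t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS}


def coverage(query: str, doc: str) -> float:
    """Fraction of query keywords that also appear in the document."""
    query_terms = keywords(query)
    if not query_terms:
        return 0.0
    return len(query_terms & keywords(doc)) / len(query_terms)


def mca_first_filter(
    query: str,
    docs: Dict[str, str],
    min_coverage: float = 0.1,
    top_k: int = 30,
) -> List[Tuple[str, float]]:
    """Keep documents with coverage >= min_coverage, then the top_k by coverage."""
    scored = [(doc_id, coverage(query, text)) for doc_id, text in docs.items()]
    kept = [(doc_id, c) for doc_id, c in scored if c >= min_coverage]
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept[:top_k]


def bge_query(query: str) -> str:
    """Prefix a query with the BGE retrieval instruction before embedding it."""
    return BGE_QUERY_INSTRUCTION + query


if __name__ == "__main__":
    docs = {
        "a": "Alice adopted a dog in May",
        "b": "Bob moved to Ohio",
        "c": "Alice sold her car",
    }
    print(mca_first_filter("When did Alice adopt her dog?", docs))
    print(bge_query("When did Alice adopt her dog?"))
```

Note that exact token matching misses "adopt" vs "adopted" here. A real keyword matcher would likely need stemming or similar normalization.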

The whole pipeline runs in under 3 s per query on a single RTX 4090. LoCoMo is currently the hardest public long-term memory benchmark (5,880 real human–agent conversations, multi-hop, temporal, negation, etc.).

Beating the official Mem0 baseline by ~12–14 pp with fully open components feels pretty good. Would love feedback, especially from people who are also grinding on agent memory systems.

My background: My path didn't start in an IT office, but in Columbus, Ohio, where I worked as a handyman after leaving my job on the cell towers. The decision came from necessity: I bought a powerful PC on installments and resolved to create something that would change my life.

I had no experience, but I had an idea. Using Claude CLI as my sole mentor, I focused on architecture, not syntax.

Over 4.5 months of work, I designed and built the VAC Memory System. To prove its value, I tested it on the toughest RAG benchmark, LoCoMo. Today, my system shows an overall result of 80.1% and a phenomenal 87.78% in the "Commonsense" category.

This is more than just code; it is the result of faith in an idea. I showed that, with modern tools, it is possible to reach SOTA-level performance and build serious technology regardless of your starting point. I look forward to your feedback.
