We built Poor Man's Memory because we were sick of rehydrating agents every session and burning tokens in the process. We ran 1451 dispatches to understand our memory system better and measured something no one else seems to be measuring: institutional coherence, not just task completion.
A right answer that is not grounded in institutional knowledge should count as a partial true positive (or negative) at best, and a right answer that cites the wrong knowledge should count as a false positive. In our experience, simply measuring task completion is not enough.
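To make that rubric concrete, here is a minimal sketch of how a single dispatch could be labeled. It is not the scoring code from the paper. The names (`Verdict`, `score_dispatch`, `cited`, `expected`) are hypothetical. The rule that any overlap between cited and expected knowledge counts as grounded, and the `INCORRECT` label for wrong answers, are assumptions made for illustration.

```python
from enum import Enum


class Verdict(Enum):
    GROUNDED = "grounded"              # right answer, right knowledge cited
    PARTIAL = "partial"                # right answer, nothing cited
    FALSE_POSITIVE = "false_positive"  # right answer, wrong knowledge cited
    INCORRECT = "incorrect"            # wrong answer (assumed label, not in the post)


def score_dispatch(answer_correct, cited, expected):
    """Label one dispatch by correctness AND grounding.

    cited    -- set of knowledge-entry ids the agent referenced (hypothetical)
    expected -- set of entry ids a grader says the answer should rest on
    """
    if not answer_correct:
        return Verdict.INCORRECT
    if not cited:
        # Correct but ungrounded: partial credit at best.
        return Verdict.PARTIAL
    if cited.isdisjoint(expected):
        # Correct answer that rests on the wrong knowledge.
        return Verdict.FALSE_POSITIVE
    # Assumption: any overlap with the expected entries counts as grounded.
    return Verdict.GROUNDED


if __name__ == "__main__":
    print(score_dispatch(True, {"adr-7"}, {"adr-7", "adr-9"}))
    print(score_dispatch(True, set(), {"adr-7"}))
    print(score_dispatch(True, {"adr-2"}, {"adr-7"}))
```

A plain task-completion metric would score all three example calls identically, since the answer is correct in each. The rubric above separates them.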
The most surprising finding: the partial context trap. Half an answer is worse than no answer at all. We burned 150 dispatches learning this the hard way.
It also turns out that the 85% lift we measured is exactly the known retrieval-to-oracle upper bound from 20 years of IR literature. We had no idea when we ran the test. No one ever beats that number, which makes it the number to beat in agentic memory.
Paper (v1.0): https://nominex.org/research/what-we-found-building-poor-mans-multi-agent-memory.html
Happy to answer questions.