The system models memory as structured beliefs with explicit state, including confidence, salience, contradiction pressure, lifecycle status, memory tier, evidence balance, lineage, user and session scope, and decay behavior. Beliefs can be reinforced, weakened, contested, updated, mutated, or deprecated as new evidence arrives.
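As a rough illustration of what such a belief record could look like (this is my own sketch, not the project's actual schema; every field name below is hypothetical):

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    ACTIVE = "active"
    CONTESTED = "contested"
    DEPRECATED = "deprecated"


@dataclass
class Belief:
    """Hypothetical belief record; field names are illustrative only."""
    claim: str
    confidence: float = 0.5              # 0.0-1.0, revised as evidence arrives
    salience: float = 0.5                # how relevant the belief is right now
    contradiction_pressure: float = 0.0  # accumulated conflicting evidence
    status: Status = Status.ACTIVE       # lifecycle status
    tier: str = "working"                # memory tier, e.g. working/episodic/semantic
    supporting: int = 0                  # evidence balance: counts for vs. against
    opposing: int = 0
    parent_id: Optional[str] = None      # lineage: belief this one mutated from
    user_id: str = ""                    # user scope
    session_id: str = ""                 # session scope
    decay_rate: float = 0.01             # per-step confidence/salience decay


b = Belief(claim="user prefers dark mode", user_id="u1", session_id="s1")
```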
The goal is to support longer-running agents that need to handle stale or conflicting information, shifting confidence, and belief revision, rather than only recalling similar prior content.
Current implementation includes a structured belief model, reinforcement and decay mechanics, contradiction handling, tiered memory behavior, session isolation, API support, Docker support, and testing/evaluation infrastructure.
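A minimal sketch of what reinforcement and decay mechanics of this kind typically look like (my own simplification, not the project's code): confidence decays exponentially with elapsed time, and reinforcement or weakening moves it a fraction of the way toward 1.0 or 0.0.

```python
import math


def decayed(confidence: float, elapsed_steps: int, decay_rate: float = 0.01) -> float:
    """Exponentially decay confidence toward zero as time passes without evidence."""
    return confidence * math.exp(-decay_rate * elapsed_steps)


def reinforce(confidence: float, strength: float = 0.2) -> float:
    """Move confidence a fraction of the remaining distance toward 1.0."""
    return confidence + strength * (1.0 - confidence)


def weaken(confidence: float, strength: float = 0.2) -> float:
    """Move confidence a fraction of the way toward 0.0."""
    return confidence * (1.0 - strength)


c = 0.8
c = decayed(c, elapsed_steps=50)  # a stale belief loses confidence
c = reinforce(c)                  # fresh supporting evidence restores some of it
```

The decay rate and reinforcement strength here are placeholder values; any real system would tune them per tier.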
What has been verified so far in the project’s published tests and evals:
- 822 passing tests
- a 1,000-prompt evaluation with an overall score of 825/1000 (82.5%)
- reported category scores of 96.8% episodic memory, 94.4% working memory, and 92.8% semantic memory
- a 15-block side-by-side evaluation against a raw Ollama baseline, where ABES passed 14/15 blocks and the baseline passed 6/15
- a 200-prompt cognitive stress test reported as 3 consecutive runs at 200/200
Two easy verification points:
- run `PYTHONPATH=$PWD pytest tests/ -q` from the repo root
- inspect `results/side_by_side_eval.json` for the block-level comparison output
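For the second check, a snippet like this can help inspect the file (the JSON schema is not documented here, so this only reports the top-level shape rather than assuming specific keys):

```python
import json


def summarize_eval(path: str):
    """Load an eval results file and report its top-level structure.

    The schema of side_by_side_eval.json is an unknown here, so this
    inspects shape first instead of assuming particular keys.
    """
    with open(path) as f:
        data = json.load(f)
    if isinstance(data, dict):
        return sorted(data.keys())
    if isinstance(data, list):
        return f"list of {len(data)} entries"
    return type(data).__name__


# Usage, from the repo root:
# summarize_eval("results/side_by_side_eval.json")
```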
Internal tests and project-published evals are not, on their own, sufficient external validation. The next stages are stronger benchmarking, improved contradiction handling and belief revision, stronger temporal and relational structure, longer-horizon testing, multi-agent shared memory, and better observability of belief transitions.
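To illustrate the kind of belief-revision step involved (again my own sketch with assumed thresholds, not ABES internals): once contradiction pressure crosses a threshold, a belief becomes contested, and past a higher threshold it is deprecated in favor of a successor.

```python
def lifecycle_status(pressure: float,
                     contest_at: float = 0.5,
                     deprecate_at: float = 0.8) -> str:
    """Map accumulated contradiction pressure to a lifecycle status.

    Thresholds are illustrative placeholders, not the project's values.
    """
    if pressure >= deprecate_at:
        return "deprecated"  # replaced by a successor belief
    if pressure >= contest_at:
        return "contested"   # surfaced to the agent for resolution
    return "active"
```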