For retrieval, there is a semantic filesystem that makes it easy for LLMs to search using shell commands.
It is currently a scrappy v1, but it works better than anything I have tried.
Curious for any feedback!
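Just to make the idea concrete, here is a minimal sketch of what that kind of retrieval could look like, assuming memories are stored as markdown files under a hypothetical `memories/` tree (all names and the layout are illustrative, not the actual project structure):

```python
import pathlib

def search_memories(root: str, term: str) -> list[str]:
    """Return paths of memory files whose text mentions `term` (case-insensitive)."""
    hits = []
    for path in sorted(pathlib.Path(root).rglob("*.md")):
        if term.lower() in path.read_text(encoding="utf-8").lower():
            hits.append(path.as_posix())
    return hits

# Build a tiny example tree, then search it.
root = pathlib.Path("memories/projects")
root.mkdir(parents=True, exist_ok=True)
(root / "retrieval.md").write_text("Notes on the semantic filesystem v1.\n")
(root / "unrelated.md").write_text("Grocery list.\n")

print(search_memories("memories", "semantic"))  # → ['memories/projects/retrieval.md']
```

In practice the agent would issue the equivalent `grep`/`ls` shell commands itself; the point is just that retrieval is ordinary file search, not a vector index.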
The problem is always that when there are too many memories, the context gets overloaded and the AI starts ignoring the system prompt.
Definitely not a solved problem, and there need to be benchmarks to evaluate these solutions. Benchmarks themselves can be easily gamed and are not universally applicable.
Also, having thought about it for another 30 seconds: the “too many memories!” problem is imo the same problem as context management and compaction, and requires the same approach: more AI telling AI what AI should be thinking about. De-rank “memories” in the context manager as irrelevant and don’t pass them to the outer context. If a memory is de-ranked often and not used enough, it gets purged.
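That purge rule can be sketched in a few lines, assuming hypothetical per-memory counters for how often the context manager de-ranks a memory versus how often it actually gets used (the thresholds are made up for illustration):

```python
from dataclasses import dataclass

@dataclass
class Memory:
    text: str
    deranked: int = 0  # times the context manager judged it irrelevant
    used: int = 0      # times it was actually passed to the outer context

def prune(memories: list[Memory], max_derank: int = 5, min_use_ratio: float = 0.2) -> list[Memory]:
    """Drop memories that are de-ranked often and used too rarely (thresholds are illustrative)."""
    kept = []
    for m in memories:
        total = m.deranked + m.used
        use_ratio = m.used / total if total else 1.0
        if m.deranked >= max_derank and use_ratio < min_use_ratio:
            continue  # purge: frequently ignored, rarely helpful
        kept.append(m)
    return kept

mems = [Memory("stale note", deranked=8, used=1), Memory("active note", deranked=2, used=6)]
print([m.text for m in prune(mems)])  # → ['active note']
```

The interesting design question is who updates the counters; in the scheme above it would be the same context-manager model doing the de-ranking.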
ReadMe does support loading memories mid-reasoning! It is simply an agent reading files.
Although GPT-5.4 currently likes to explore a lot upfront and only then respond. But that is more a matter of model behaviour (adjustable through prompting) than an architectural limitation.
You need clever naming for the filesystem and a good exploration policy in AGENTS.md (not trivial!).
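For a sense of what such a policy could look like, here is an entirely illustrative AGENTS.md fragment (the paths, limits, and wording are all made up, not the actual policy):

```markdown
## Memory exploration policy
- Memories live under `memories/`, one topic per directory
  (e.g. `memories/projects/`, `memories/people/`).
- Before answering, `ls` the top level and open at most 2-3
  relevant subtrees; do not read everything.
- Prefer files whose names match the query; fall back to
  `grep -ril <keyword> memories/` only if naming fails.
```

The naming matters because the agent mostly navigates by directory and file names rather than by reading contents.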
The benchmark is definitely the core bottleneck. I don't know any good benchmark for this, probably an open research question in itself.
The hard part is usually knowing what *not* to write down. Every system I've seen eventually drowns in low-signal entries.
In terms of noise, I think it is less of a problem here because not everything is retrieved. The agent can selectively explore subsets of the tree (plus you can edit the exploration policy yourself).
Since there is no context bloat, it is quite forgivable to just write things down.
I guess the markdown approach really has an advantage over others.
PS: Something I built on markdown: https://voiden.md/
wenhan_zhou•35m ago
Good question. Since it is just an LLM reading files, it depends entirely on how fast it can call tools, so latency comes down to the model's tokens/s.
Haven't done a formal benchmark, but from the vibes, it feels like a few seconds for GPT-5.4-high per query.
There is an implicit "caching" mechanism, so the more you use it, the smoother it will feel.