For a while I was just manually pasting in context at the start of every session, which is exactly as painful as it sounds. Eventually I built a small proxy that sits between my client and Ollama and tries to solve this. It embeds recent interactions, stores them locally, and injects the relevant chunks when a new session starts. It works well enough that I actually use it every day now, but I built it the way someone with no formal CS background builds things, which means I patched it into shape and I am not totally confident the architecture is right.
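To make the embed / store / inject loop concrete, here is a rough sketch of the idea. It is not the actual proxy code. Everything in it is an assumption for illustration: `embed` is a toy bag-of-words stand-in for a real embedding model, and `MemoryStore` and `build_prompt` are hypothetical names. Nothing here talks to Ollama.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for a real embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class MemoryStore:
    """Hypothetical in-memory store; a real one would persist to disk."""

    def __init__(self):
        self.items = []  # list of (text, vector) pairs

    def add(self, text):
        self.items.append((text, embed(text)))

    def top_k(self, query, k=2):
        """Return up to k stored chunks, most similar first, skipping zero matches."""
        q = embed(query)
        scored = [(cosine(q, vec), text) for text, vec in self.items]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [text for score, text in scored[:k] if score > 0]


def build_prompt(store, user_message, k=2):
    """Prepend the most relevant stored chunks to the first message of a session."""
    chunks = store.top_k(user_message, k)
    if not chunks:
        return user_message
    context = "\n".join(f"- {c}" for c in chunks)
    return f"Relevant notes from earlier sessions:\n{context}\n\n{user_message}"
```

In a real setup the proxy would call `build_prompt` on the first message of a session and forward the result to the model, with `embed` swapped for an actual embedding call.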
The part that still bothers me is scoping. I work on a few different projects at the same time and I do not want context from one bleeding into another. Right now I am managing that by hand, basically just keeping separate directories and being careful, but that feels like a workaround, not a solution.
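One direction that might make the separate-directories habit automatic is to derive a scope key from wherever the session is running and tag every stored chunk with it. The sketch below is only an illustration under assumptions: it treats the nearest parent directory containing a `.git` folder as the project, and `project_scope` and `ScopedMemory` are hypothetical names, not part of any existing tool.

```python
import tempfile
from pathlib import Path


def project_scope(start, marker=".git"):
    """Walk up from `start` to the nearest directory that contains `marker`."""
    start = Path(start).resolve()
    for candidate in [start, *start.parents]:
        if (candidate / marker).exists():
            return str(candidate)
    # Assumption: with no marker found, the directory itself is the scope.
    return str(start)


class ScopedMemory:
    """Keeps chunks in per-project buckets so recall never crosses projects."""

    def __init__(self):
        self._by_scope = {}

    def add(self, scope, text):
        self._by_scope.setdefault(scope, []).append(text)

    def recall(self, scope):
        # Return a copy so callers cannot mutate the stored list.
        return list(self._by_scope.get(scope, []))
```

The point of the design is that retrieval only ever searches the bucket for the current scope, so a chunk from another project cannot be injected even if it happens to look similar.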
Genuinely curious what other people have landed on. Are you using a vector DB for retrieval, or plain files, or something MCP-based, or have you just accepted that local sessions are stateless and built your workflow around that? And if you have solved the scoping problem cleanly, I really want to know how.