Problem: Context windows are ephemeral. Swapping or upgrading a local model discards accumulated memory and identity.
Approach: Treat the model strictly as a stateless inference engine. All continuity (memory, identity, recall) lives outside the model as structured, append-only events managed by the runtime.
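To make the idea concrete, here is a minimal sketch of what an append-only memory event could look like. The field names and the JSONL storage choice are my own illustrative assumptions, not part of the spec:

```python
# Sketch only: memory as structured, append-only events owned by the runtime,
# never by the model. Schema and storage format are assumptions for illustration.
import json
import time
import uuid
from pathlib import Path

LOG = Path("memory_events.jsonl")

def append_event(kind: str, content: str, tags: list[str] | None = None) -> dict:
    """Append one immutable memory event to the runtime-owned log."""
    event = {
        "id": str(uuid.uuid4()),
        "ts": time.time(),
        "kind": kind,          # e.g. "observation", "fact", "identity"
        "content": content,
        "tags": tags or [],
    }
    with LOG.open("a", encoding="utf-8") as f:
        f.write(json.dumps(event) + "\n")
    return event
```

Because events are only ever appended, the log survives any model swap: the next model simply sees whatever the recall pipeline assembles from it.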
At a high level:

- Model-agnostic: swap local models mid-conversation without losing continuity
- Explicit memory: all memory is written as structured events, not hidden states
- Deterministic recall: context is assembled via a visible, inspectable pipeline before inference (see the sketch after this list)
- Full observability: inspect exactly what the model sees on each turn
- Local-first execution, no cloud dependency
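A rough sketch of the recall side, under the same assumptions as above. The selection step here is a naive keyword filter purely for illustration; the point is that the assembled context is an ordinary data structure you can print and inspect before it ever reaches the model:

```python
# Sketch only: a deterministic, inspectable recall pipeline.
# Reads the append-only event log and builds the exact prompt for this turn.
import json
from pathlib import Path

LOG = Path("memory_events.jsonl")

def recall(query: str, limit: int = 5) -> list[dict]:
    """Deterministically select events matching the query (oldest first, capped)."""
    if not LOG.exists():
        return []
    events = [json.loads(line) for line in LOG.read_text(encoding="utf-8").splitlines() if line]
    hits = [e for e in events if query.lower() in e["content"].lower()]
    return hits[-limit:]

def assemble_context(query: str, user_turn: str) -> str:
    """Build the full prompt the (swappable) model will see on this turn."""
    memory = "\n".join(f"- {e['content']}" for e in recall(query))
    return f"Known facts:\n{memory}\n\nUser: {user_turn}"

if __name__ == "__main__":
    # Inspect exactly what would be sent to inference, before sending it.
    print(assemble_context("name", "What's my name?"))
```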
I wrote a small reference specification focused on structure rather than implementation details: https://github.com/NodeEHRIS/node-spec
Short demo (12s) showing persistent memory across a local model swap: https://www.youtube.com/watch?v=ZAr3J30JuE4
This is intentionally minimal and early. I’m mainly interested in feedback on architectural tradeoffs and where this breaks down compared to RAG, long-context approaches, or agent frameworks.