I built a memory architecture that gives AI actual persistence across sessions. Not prompt engineering, not RAG retrieval - proper memory with episodic, semantic, and procedural layers.
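To make the layered-persistence idea concrete, here is a minimal sketch of a three-layer memory store that survives across sessions by writing to disk. The layer names follow the post; everything else (JSON storage, the `MemoryStore` class, its methods) is a hypothetical illustration, not CASCADE's actual implementation.

```python
import json
import os
import tempfile

# Hypothetical sketch: three memory layers persisted to disk so state
# survives across sessions. The JSON format is an assumption for
# illustration only.
class MemoryStore:
    LAYERS = ("episodic", "semantic", "procedural")

    def __init__(self, path):
        self.path = path
        if os.path.exists(path):
            with open(path) as f:
                self.layers = json.load(f)
        else:
            self.layers = {name: [] for name in self.LAYERS}

    def remember(self, layer, item):
        self.layers[layer].append(item)

    def save(self):
        with open(self.path, "w") as f:
            json.dump(self.layers, f)

# Session 1: store a memory and persist it.
path = os.path.join(tempfile.mkdtemp(), "memories.json")
store = MemoryStore(path)
store.remember("episodic", {"event": "user asked about Faiss", "t": 0})
store.save()

# Session 2: a fresh load sees the same state.
restored = MemoryStore(path)
print(restored.layers["episodic"][0]["event"])  # -> user asked about Faiss
```

The point of the split is that each layer can have its own retention policy: episodic entries can decay, semantic entries are deduplicated facts, procedural entries are learned how-to steps.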
The problem: Current AI "memory" is either expensive context window tricks or slow retrieval systems. Neither feels natural, and nothing persists across sessions in a meaningful way.
The solution: CASCADE - a 6-layer memory architecture running on consumer GPUs. Key specs:
- Sub-2ms semantic search across 11,000+ memories (Faiss + GPU acceleration)
- 9.68x computational amplification through optimization
- 95% GPU utilization (up from 8% baseline)
- True session-to-session persistence
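For readers unfamiliar with what the search layer actually computes, here is a toy, pure-Python cosine-similarity search. This is only the principle: the post's sub-2ms figure comes from Faiss with GPU acceleration, not from a brute-force loop like this, and the 3-d vectors stand in for real embeddings with hundreds of dimensions.

```python
import math

# Brute-force cosine similarity over toy vectors. Illustrative only;
# a real system would use Faiss (or similar) over learned embeddings.
def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def search(query, memories, k=1):
    """Return the k stored (text, vector) memories closest to the query."""
    ranked = sorted(memories, key=lambda m: cosine(query, m[1]), reverse=True)
    return ranked[:k]

# Toy "embeddings"; real embeddings are typically 384-1536 dimensions.
memories = [
    ("likes coffee",   [0.9, 0.1, 0.0]),
    ("owns a dog",     [0.0, 0.8, 0.2]),
    ("prefers Python", [0.1, 0.0, 0.9]),
]
best = search([1.0, 0.0, 0.1], memories, k=1)
print(best[0][0])  # -> likes coffee
```

Faiss replaces the O(n) scan with an index (flat, IVF, or HNSW) and batches queries on the GPU, which is where the latency numbers come from.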
Dual-tier release philosophy:
- Research Edition: Unrestricted access for power users who accept responsibility
- Enterprise Edition: Production-ready with input validation, SQL injection protection, rate limiting
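As a sketch of what two of those Enterprise-tier safeguards look like in practice, here are the standard patterns: parameterized queries for SQL injection protection and a token-bucket rate limiter. These are generic textbook versions under my own naming, not CASCADE's actual code.

```python
import sqlite3
import time

# 1. SQL injection protection: pass user input as a bound parameter so
#    it is treated as data, never as SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE memories (id INTEGER PRIMARY KEY, text TEXT)")
user_input = "x'); DROP TABLE memories; --"
conn.execute("INSERT INTO memories (text) VALUES (?)", (user_input,))  # safe

# 2. Rate limiting: a minimal token bucket. Tokens refill at `rate` per
#    second up to `capacity`; each allowed request spends one token.
class TokenBucket:
    def __init__(self, rate, capacity):
        self.rate, self.capacity = rate, capacity
        self.tokens, self.last = float(capacity), time.monotonic()

    def allow(self):
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate=5, capacity=2)  # 5 req/s steady, burst of 2
results = [bucket.allow() for _ in range(3)]
print(results)  # burst of 2 allowed, third call throttled
```

Input validation would sit in front of both: reject or normalize payloads before they ever reach the query layer.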
Why unrestricted matters: Researchers need tools without artificial limits. Enterprises need compliance-ready security. Both should exist.
Built in a basement lab on a single RTX 3090. All research, protocols, and code are open source (MIT).
Repo: https://github.com/For-Sunny/nova-mcp-research
Technical docs: See NOVA_MEMORY_ARCHITECTURE.md in the repo
We're not a product company - this is a research lab that produces practical tools. No VC funding, no customers, just genuine exploration into making AI memory work properly.
Trade-offs we document honestly:
- Research Edition requires trust (no guardrails)
- GPU acceleration needs decent hardware (RTX 3060 or better recommended)
- Windows-focused initially (Linux/macOS support coming)
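The hardware bar is lower than it might sound, and a back-of-envelope check shows why: the raw vector index for 11,000 memories is tiny. The 768-dim float32 embedding size below is my assumption (typical for sentence-transformer models), not a stated CASCADE spec.

```python
# Rough VRAM footprint of the flat vector index alone.
n_memories = 11_000
dims = 768           # assumed embedding dimension (not confirmed by the post)
bytes_per_float = 4  # float32

index_bytes = n_memories * dims * bytes_per_float
index_mb = index_bytes / 1024**2
print(f"{index_mb:.1f} MB")  # ~32 MB: a rounding error on a 12 GB RTX 3060
```

The real VRAM pressure comes from the embedding model and batch buffers, not the index itself, which is why a single consumer card can hold the whole working set.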
Questions welcome. Looking for feedback on the architecture and anyone interested in reproducing the results.