I'm excited to share TXT OS — an open-source AI reasoning engine that runs entirely inside a single `.txt` file.
- No installs, no signup, no hidden code — just copy-paste the file into any LLM chat window (GPT, Claude, Gemini, etc.). - +22.4% semantic accuracy, +42.1% reasoning success, and 3.6× more stability (benchmarked on GSM8K and Truthful-QA). - Features Semantic Tree Memory, Hallucination Shield, and fully exportable logic. - MIT Licensed, zero tracking, zero ads.
Why did I build this? I wanted to prove that advanced reasoning and memory could be made open, portable, and accessible to anyone — just with pure text, no software or setup.
A note: I'm from China, and English is not my first language. This post and the docs were partly assisted by AI, but I personally reviewed and approved every line of content. All ideas, design, and code are my own work. If anything is unclear or could be improved, I really welcome your feedback!
I'm the author, and happy to answer any questions or suggestions here!
ultimateking•7h ago
1. How does TXT OS store its “Semantic Tree Memory” between sessions? 2. When `kbtest` detects a hallucination, what happens next? 3. Any idea of the speed impact on smaller models like LLaMA-2-13B?
Thanks for sharing—excited to try it out!
TXTOS•7h ago
We actually serialize the tree as a compact JSON-like structure right in the TXT file—each node gets a header like #NODE:id and indented subtrees. When you reload, TXT OS parses those markers back into your LLM’s memory map. No external DB needed—just plain text you can copy-paste between sessions.
--- When kbtest Fires
Internally it tracks our ΔS metric (semantic tension). Once ΔS crosses a preset threshold, kbtest prints a warning and automatically rolls you back to the last “safe” tree checkpoint. That means you lose only the bad branch, not your entire session. Think of it like an undo button for hallucinations.
--- Performance on LLaMA-2-13B
Benchmarks were on GPT-4, but on a 13B model you’ll see roughly a 10–15% token-generation slow-down thanks to the extra parsing and boundary checks. In practice that’s about +2 ms per token, which most folks find an acceptable trade-off for the added stability.
Hope that clears things up—let me know if you hit any weird edge cases!