Beliefs, behaviors, tensions, and contradictions extracted from conversations, journals, and published text, compressed into an identity brief that any model or memory system can use. An extracted operating guide for AI, where every claim traces back to source facts.
All research, benchmarks, documentation, and examples are available on the website and on GitHub. This has been tested on as few as 8 personal journal entries from a secondary subject, on my own GPT conversation exports (30K+ messages), and on large document corpora such as Warren Buffett's annual shareholder letters (350K words), Howard Marks's investment memos (600K words), and dense autobiographies by Franklin, Douglass, Roosevelt, and Wollstonecraft.
The pipeline currently uses Claude. API costs are under $1 for small data sets and under $5 for large ones, covering everything from fact extraction to final brief assembly.
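To make the "every claim traces back to source facts" idea concrete, here is a minimal sketch of what that traceability could look like as a data structure. This is not the project's actual code: the `Fact` and `Claim` classes, the `assemble_brief` function, and the sample data are all hypothetical, and the model call that would do the real extraction is left out entirely.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    """One extracted fact, tied to the document it came from (hypothetical shape)."""
    fact_id: str
    source: str
    text: str


@dataclass
class Claim:
    """One line of the brief, listing the fact IDs that support it (hypothetical shape)."""
    text: str
    fact_ids: list[str]


def assemble_brief(claims: list[Claim], facts: list[Fact]) -> str:
    """Keep only claims whose every cited fact exists, and print the citations."""
    known_ids = {fact.fact_id for fact in facts}
    lines = []
    for claim in claims:
        # A claim with no citations, or citing an unknown fact, is dropped.
        if not claim.fact_ids:
            continue
        if any(fid not in known_ids for fid in claim.fact_ids):
            continue
        lines.append(f"- {claim.text} [{', '.join(claim.fact_ids)}]")
    return "\n".join(lines)


# Made-up sample data, for illustration only.
facts = [
    Fact("f1", "journal_03", "Skipped the party to finish the draft."),
    Fact("f2", "journal_07", "Cancelled dinner plans to keep writing."),
]
claims = [
    Claim("Prioritizes finishing work over social plans", ["f1", "f2"]),
    Claim("Dislikes travel", ["f9"]),  # cites a fact that was never extracted
]

brief = assemble_brief(claims, facts)
print(brief)
```

The point of the sketch is the filter in `assemble_brief`: a claim that cannot name its supporting facts never reaches the brief, so anything the downstream model reads can be checked against a source.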
I'm very interested in feedback and happy to go deeper in the comments on the project's evolution, struggles, research, and future improvements.