In my daily work I deal with tons of office documents that hold knowledge from every team, and as an IT architect I need to combine them all to handle complex deep research (something a plain LLM cannot really help with). That is the original reason I built DocMason, and I now use it every day to support me on lots of complex topics.
I have already open-sourced the repo, and I think it takes Karpathy's concept a step further for real-world usage in three ways:
1. It handles most kinds of office docs (pptx, docx, Excel files, even .eml), and it really extracts multimodal information from IT architecture diagrams and Excel sheets.
2. It runs as a real app, not a naive RAG tool. DocMason prepares its environment, updates itself, and incrementally syncs the knowledge base automatically.
3. Most importantly, it runs natively inside AI agents, so it can leverage powerful agent engines (e.g. Codex or Claude Code).
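To make the "incremental sync" idea in point 2 concrete, here is a minimal sketch of one common way to do it: hash every office file and only reprocess the ones whose hash changed since the last run. This is an illustration only and not DocMason's actual implementation; `plan_sync`, `file_digest`, the manifest layout, and the suffix list are all hypothetical names I made up for the example.

```python
import hashlib
from pathlib import Path

# Hypothetical suffix list for the example; a real tool may support more formats.
OFFICE_SUFFIXES = {".pptx", ".docx", ".xlsx", ".eml"}


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def plan_sync(root: Path, manifest: dict[str, str]) -> tuple[list[str], list[str], dict[str, str]]:
    """Compare files under `root` with the manifest from the previous run.

    Returns (changed, removed, new_manifest):
      changed      - relative paths that are new or whose content changed
      removed      - relative paths in the old manifest that no longer exist
      new_manifest - relative path -> digest mapping to store for the next run
    """
    current: dict[str, str] = {}
    for p in sorted(root.rglob("*")):
        if p.is_file() and p.suffix.lower() in OFFICE_SUFFIXES:
            current[p.relative_to(root).as_posix()] = file_digest(p)

    changed = [rel for rel, digest in current.items() if manifest.get(rel) != digest]
    removed = sorted(rel for rel in manifest if rel not in current)
    return changed, removed, current
```

With something like this, only the files in `changed` need to be re-extracted and re-indexed, entries in `removed` can be dropped from the knowledge base, and `new_manifest` is saved (for example as JSON) for the next run.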
See the detailed architecture diagram in the DocMason README, then download it and give it a try :) I think you will find it helps a lot in daily work. Would love to hear your feedback and issues on GitHub!