I ran into this building RAG over scientific literature. The standard approach (embed chunks, find top-k, generate) works fine for simple Q&A but falls apart when you need real research depth: multi-hop reasoning across papers, synthesizing conflicting results, tracing a finding back to the exact passage in a methods section. The problem wasn't the models; it was the retrieval layer.
Dewey treats documents, sections, and chunks as first-class API primitives. The section manifest (full heading hierarchy with titles and byte offsets) lets agents scan cheaply before committing to full chunk retrieval, the same way a researcher skims a table of contents before reading. The /research endpoint runs an agentic loop: at "exhaustive" depth it can traverse an entire corpus, querying iteratively, and returns a grounded answer with numbered inline citations that point to the exact source passages.
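To make the manifest-skim idea concrete, here is an illustrative sketch of the pattern (not the real Dewey SDK; the field names and helper functions are my assumptions): scan a cheap manifest of headings first, then fetch full chunks only for the sections that look relevant.

```python
# Illustrative sketch of the manifest-skim pattern, NOT the Dewey SDK.
# Field names (title, level, start, end) are assumptions for illustration.
from dataclasses import dataclass

@dataclass
class Section:
    title: str
    level: int   # heading depth (1 = top-level)
    start: int   # byte offset where the section starts
    end: int     # byte offset where the section ends

def skim(manifest, query_terms):
    """Select sections whose titles mention any query term,
    like a researcher scanning a table of contents before reading."""
    terms = [t.lower() for t in query_terms]
    return [s for s in manifest if any(t in s.title.lower() for t in terms)]

def fetch_chunks(document, sections):
    """Pull full text only for the byte ranges the skim selected."""
    return [document[s.start:s.end] for s in sections]

manifest = [
    Section("Introduction", 1, 0, 40),
    Section("Methods", 1, 40, 90),
    Section("Methods / Cell culture", 2, 40, 62),
    Section("Results", 1, 90, 140),
]
document = "x" * 140  # stand-in for the raw document bytes

hits = skim(manifest, ["methods"])
chunks = fetch_chunks(document, hits)
print([s.title for s in hits])  # only the matched sections get chunk retrieval
```

The point of the two-phase shape is cost: the manifest is tiny compared to the chunk store, so an agent can reject most of a corpus before paying for any full retrieval.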
Two ways in:
- REST API + TypeScript/Python SDKs for developers building research or document Q&A into their apps
- MCP server (@meetdewey/mcp on npm) for anyone using Claude, ChatGPT, or Cursor. Your document collections become tools without writing any code.
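For the MCP route, wiring the server into a client is typically a one-entry config change. A sketch for a Claude Desktop-style config, assuming the package follows the standard npx launch convention (the exact args are my assumption, not from the docs):

```json
{
  "mcpServers": {
    "dewey": {
      "command": "npx",
      "args": ["-y", "@meetdewey/mcp"]
    }
  }
}
```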
Bring your own OpenAI key and depth becomes a quality setting rather than a billing one. That includes AI image captioning, which makes figures and diagrams searchable alongside your text. No markup on generation.
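As a sketch of what a BYO-key research call might look like over plain HTTP (the URL, field names, and header names here are assumptions for illustration, not the documented API):

```python
# Hypothetical request shape for a /research call; every name below
# (base URL, "depth" field, header names) is an assumption, not the real API.
import json

payload = {
    "query": "Do reported binding affinities conflict across these papers?",
    "depth": "exhaustive",  # depth as a quality knob, paid for by your own key
}
headers = {
    "Authorization": "Bearer <DEWEY_API_KEY>",  # placeholder
    "X-OpenAI-Key": "<YOUR_OPENAI_KEY>",        # assumed BYO-key header
}
body = json.dumps(payload)
print(body)  # what would be POSTed to the (assumed) /research endpoint
```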
Built this solo. Happy to answer questions about the architecture, the retrieval design, or anything else. Curious whether others have found section-aware retrieval makes a meaningful difference vs. flat chunking in practice.
Free tier, no credit card required: https://meetdewey.com