I’m Ishan, Product Manager at Contextual AI.
We're excited to announce our document parser, which combines custom vision models, OCR, and vision-language models to deliver unmatched accuracy.
There are a lot of parsing solutions out there; here's what makes ours different:

1) Document hierarchy inference: Unlike traditional parsers that process documents as isolated pages, our parser infers a document's hierarchy and structure. This lets you attach metadata to each chunk describing its position in the document, so your agents can understand how different sections relate to each other and connect information across hundreds of pages (a sketch of what this metadata could look like follows this list).

2) Minimized hallucinations: Our multi-stage pipeline minimizes severe hallucinations, and it provides bounding boxes and confidence levels for table extraction so its output is easy to audit.

3) Superior handling of complex modalities: Technical diagrams, complex figures, and nested tables are processed accurately, so you can make use of all of your data.
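To make the hierarchy idea concrete, here is a minimal sketch of what one parsed chunk with hierarchy metadata might look like. The field names (parent_headings, bounding_box, confidence) are illustrative, not the actual response schema; see the /parse API reference under Documentation for the real format.

    # Hypothetical shape of one parsed chunk; field names are
    # illustrative, not the actual /parse response schema.
    chunk = {
        "text": "Net revenue increased 12% year over year...",
        "page": 47,
        "metadata": {
            # Path from the document root down to this chunk,
            # inferred from the document's hierarchy.
            "parent_headings": [
                "Part II",
                "Item 7. Management's Discussion and Analysis",
                "Results of Operations",
            ],
            # Table-extraction chunks also carry auditing info.
            "bounding_box": {"x0": 0.08, "y0": 0.31, "x1": 0.92, "y1": 0.55},
            "confidence": "high",
        },
    }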
In an end-to-end RAG evaluation on a dataset of SEC 10-Ks and 10-Qs (70+ documents spanning 6,500+ pages), we found that including document hierarchy metadata in chunks increased the equivalence score from 69.2% to 84.0%.
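One simple way to fold hierarchy metadata into chunks for retrieval (a sketch, not necessarily the scheme used in the evaluation above) is to prepend the heading path to each chunk's text before embedding:

    # Prepend the heading path so the embedding captures where
    # the chunk sits in the document. Assumes the hypothetical
    # chunk structure sketched earlier.
    def contextualize(chunk: dict) -> str:
        path = " > ".join(chunk["metadata"]["parent_headings"])
        return f"[{path}]\n{chunk['text']}"

    # e.g. "[Part II > Item 7. Management's Discussion and Analysis >
    #        Results of Operations]\nNet revenue increased 12%..."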
Getting started

If you want to give it a try, the first 500+ pages in our Standard mode (for complex documents that require VLMs and OCR) are free. Just create a Contextual AI account (https://app.contextual.ai/?signup=1) and visit the Components tab to use the Parse UI playground, or get an API key and call the API directly.
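Here's a minimal sketch of calling the REST API from Python with requests. The base URL, endpoint path, and field names below are assumptions; check the /parse API reference linked under Documentation for the exact contract.

    import requests

    API_KEY = "your-api-key"  # from your Contextual AI account
    BASE = "https://api.contextual.ai/v1"  # assumed base URL; see the API docs

    # Submit a document for parsing. The "raw_file" and "parse_mode"
    # names are assumptions; confirm them against the /parse reference.
    with open("10k.pdf", "rb") as f:
        resp = requests.post(
            f"{BASE}/parse",
            headers={"Authorization": f"Bearer {API_KEY}"},
            files={"raw_file": f},
            data={"parse_mode": "standard"},
        )
    resp.raise_for_status()
    print(resp.json())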
Documentation

1) /parse API: https://docs.contextual.ai/api-reference/parse/parse-file

2) Python SDK: https://github.com/ContextualAI/contextual-client-python/blo...

3) Code examples: https://github.com/ContextualAI/examples/blob/main/03-standa...

4) Blog post: https://contextual.ai/blog/document-parser-for-rag/
Happy to answer any questions about how our document parser works or how you might integrate it into your RAG systems!