I've spent the last several months working on a new approach to document data extraction, and I'm excited to share it today.
It’s called Ninjadoc AI, a platform where you can extract structured data from documents by asking questions.
The workflow is simple:
1. Build a schema visually: upload a document and ask questions such as "What's the total amount?" or "Who is the customer?" to define the fields you want to extract.
2. Use the API: you get a processor_id for that schema, then POST any similar document (e.g., another invoice) along with the processor_id to our REST API.
The key feature is that the API returns clean JSON with the extracted values and, for every value, the exact bounding box coordinates from the original document. This gives you verifiable proof of where the data came from.
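To make the API step concrete, here is a rough Python sketch of what a client could look like. Only processor_id, the POST, and the "JSON values plus bounding boxes" idea come from the description above. The endpoint URL, the multipart field names, the Bearer auth header, and the response shape (`fields`, `value`, `bbox`) are all assumptions for illustration, so check the real API docs before relying on any of them.

```python
import json
import urllib.request
import uuid

# Hypothetical endpoint: the post does not state the real URL.
API_URL = "https://api.example.com/v1/extract"


def build_request(document: bytes, filename: str, processor_id: str,
                  api_key: str) -> urllib.request.Request:
    """Build a multipart POST carrying the document and its processor_id.

    The field names ("processor_id", "file") and the Bearer token header
    are assumptions, not the documented API.
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="processor_id"\r\n\r\n'
        f"{processor_id}\r\n"
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=head + document + tail,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
        method="POST",
    )


def fields_with_boxes(response_body: str) -> dict:
    """Map each field name to (value, bounding box).

    Assumes a response shaped like
    {"fields": {name: {"value": ..., "bbox": ...}}}, which is a guess.
    """
    payload = json.loads(response_body)
    return {
        name: (field["value"], field["bbox"])
        for name, field in payload["fields"].items()
    }


if __name__ == "__main__":
    # Made-up response used only to exercise the parser; not real API output.
    sample = json.dumps({
        "fields": {
            "total_amount": {
                "value": "118.40",
                "bbox": {"page": 1, "x": 412, "y": 688, "width": 96, "height": 18},
            }
        }
    })
    for name, (value, bbox) in fields_with_boxes(sample).items():
        print(name, value, bbox)
```

Sending the request would then be `urllib.request.urlopen(build_request(...))`, with the response body passed to `fields_with_boxes`. Keeping the box next to each value is what lets a reviewer highlight the source region on the original page.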
This is designed to replace brittle template-based OCR that breaks with layout changes, and generic LLMs that don't provide verifiable coordinates.
It's built for developers who need to automate extraction from documents such as invoices, contracts, and IDs.
The site is live now at https://ninjadoc.ai. There are 1,250 free credits to start, which is enough to process a good number of documents.
I'm here to answer any questions and would love to hear your feedback!
You can also find it on Product Hunt: https://www.producthunt.com/products/ninjadoc-ai?launch=ninj...