
Show HN: An API to extract structured data from any document without training

https://ninjadoc.ai

2•dbvitapps•6mo ago

Hey HN,

I'm the founder of Ninjadoc AI. I've spent years working with document processing, and I've always been frustrated by the existing solutions for structured data extraction.

The core problem is that most tools force you into one of two bad options:

Template-based extractors: You define fixed regions or rules. These are incredibly brittle and break the moment a document layout changes slightly (e.g., a new invoice template from a vendor).

ML-based extractors: These require you to gather hundreds (sometimes thousands) of your own labeled documents to train a custom model for each document type. It's a slow, expensive, and data-intensive process.

I wanted a "zero-shot" solution that worked out of the box, so I built Ninjadoc AI.

Our approach is different. Instead of training, you use a tool to define your desired schema once. For example, you define fields like invoice_id, due_date, and line_items. The AI then uses this schema to understand the document's structure and context, allowing it to extract the correct data from any layout variation of that document type. It's layout-agnostic.
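To make the schema idea concrete, here is a minimal Python sketch of the invoice example. It is only an illustration, not Ninjadoc's actual schema format: the `Field` class, the type labels, and the `missing_fields` helper are all hypothetical names made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """One field in a hypothetical extraction schema."""
    name: str
    kind: str  # assumed type labels: "string", "date", "list"


# The three fields named in the post; the type labels are assumptions.
INVOICE_SCHEMA = (
    Field("invoice_id", "string"),
    Field("due_date", "date"),
    Field("line_items", "list"),
)


def missing_fields(schema, extracted):
    """Return the names of schema fields absent from an extracted record."""
    return [field.name for field in schema if field.name not in extracted]


if __name__ == "__main__":
    # A made-up extraction result that lacks line_items.
    record = {"invoice_id": "INV-001", "due_date": "2025-01-31"}
    print(missing_fields(INVOICE_SCHEMA, record))
```

The point is that the schema is defined once and is independent of layout, so the same check applies to every vendor's version of the document.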

A few key technical features:

It's a REST API: Simple to integrate, returns structured JSON.

Bounding Box Coordinates: For every piece of extracted data, the API returns its precise coordinates on the document. This is useful for building verification UIs or for record-keeping. To my knowledge, we're the only zero-shot tool that provides this.

Visual Schema Builder: No code is needed to define what you want to extract. You just upload one example document and map fields visually. Those rules then apply universally.

No Training/No Templates: It works immediately on your documents without any model fine-tuning or sample uploads.

The goal is to provide a powerful, developer-friendly API that skips the most painful parts of document data extraction.
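As a rough sketch of how the bounding boxes could feed a verification UI, the Python below parses a made-up JSON payload and converts each box to pixel coordinates for drawing an overlay. The response shape is an assumption for illustration only: the keys `fields`, `value`, `page`, and `bbox`, and the use of normalized 0 to 1 coordinates, are hypothetical and not taken from the Ninjadoc API docs.

```python
import json

# Hypothetical response body; the real API's field names and units may differ.
SAMPLE_RESPONSE = """
{
  "fields": {
    "invoice_id": {
      "value": "INV-001",
      "page": 1,
      "bbox": {"x": 0.5, "y": 0.25, "width": 0.25, "height": 0.5}
    }
  }
}
"""


def to_pixel_box(bbox, page_width, page_height):
    """Convert an assumed normalized box to (left, top, right, bottom) pixels."""
    left = round(bbox["x"] * page_width)
    top = round(bbox["y"] * page_height)
    right = round((bbox["x"] + bbox["width"]) * page_width)
    bottom = round((bbox["y"] + bbox["height"]) * page_height)
    return (left, top, right, bottom)


def overlay_boxes(response_text, page_width, page_height):
    """Map each extracted field name to its value and pixel rectangle."""
    fields = json.loads(response_text)["fields"]
    return {
        name: (item["value"], to_pixel_box(item["bbox"], page_width, page_height))
        for name, item in fields.items()
    }


if __name__ == "__main__":
    # Page size in pixels of the rendered image the overlay is drawn on.
    print(overlay_boxes(SAMPLE_RESPONSE, 1000, 800))
```

If the API returns absolute coordinates instead of normalized ones, the scaling step in `to_pixel_box` would simply be dropped.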

I'd be grateful for any feedback, especially on the API design and the overall developer experience.

You can try it out here: https://ninjadoc.ai

There's a free plan with 5,000 credits (no credit card required), which is enough to run a few hundred pages through it.

Thanks for checking it out!
