frontpage.

Two Billion Email Addresses Were Exposed

https://www.troyhunt.com/2-billion-email-addresses-were-exposed-and-we-indexed-them-all-in-have-i...
19•esnard•13m ago•2 comments

Kimi K2 Thinking, a SOTA open-source trillion-parameter reasoning model

https://moonshotai.github.io/Kimi-K2/thinking.html
393•nekofneko•5h ago•149 comments

Swift on FreeBSD Preview

https://forums.swift.org/t/swift-on-freebsd-preview/83064
116•glhaynes•2h ago•58 comments

ICC ditches Microsoft 365 for openDesk

https://www.binnenlandsbestuur.nl/digitaal/internationaal-strafhof-neemt-afscheid-van-microsoft-365
377•vincvinc•3h ago•113 comments

LLMs Encode How Difficult Problems Are

https://arxiv.org/abs/2510.18147
40•stansApprentice•2h ago•2 comments

Show HN: I scraped 3B Goodreads reviews to train a better recommendation model

https://book.sv
17•costco•1d ago•1 comment

Open Source Implementation of Apple's Private Compute Cloud

https://github.com/openpcc/openpcc
304•adam_gyroscope•1d ago•57 comments

The Parallel Search API

https://parallel.ai/blog/introducing-parallel-search
44•lukaslevert•3h ago•19 comments

What if hard work felt easier?

https://jeanhsu.substack.com/p/what-if-hard-work-felt-easier
36•kiyanwang•1w ago•20 comments

Show HN: TabPFN-2.5 – SOTA foundation model for tabular data

https://priorlabs.ai/technical-reports/tabpfn-2-5-model-report
41•onasta•2h ago•8 comments

I analyzed the lineups at the most popular nightclubs

https://dev.karltryggvason.com/how-i-analyzed-the-lineups-at-the-worlds-most-popular-nightclubs/
120•kalli•6h ago•61 comments

Auraphone: A simple app to collect people's info at events

https://andrewarrow.dev/2025/11/simple-app-collect-peoples-info-at-events/
6•fcpguru•5h ago•1 comment

Senior BizOps at Artie (San Francisco)

https://www.ycombinator.com/companies/artie/jobs/gqANVBc-senior-business-operations
1•tang8330•3h ago

Show HN: Dynamic code and feedback walkthroughs with your coding Agent in VSCode

https://www.intraview.ai/hn-demo
6•cyrusradfar•3h ago•0 comments

FBI tries to unmask owner of archive.is

https://www.heise.de/en/news/Archive-today-FBI-Demands-Data-from-Provider-Tucows-11066346.html
470•Projectiboga•4h ago•267 comments

Eating stinging nettles

https://rachel.blog/2018/04/29/eating-stinging-nettles/
135•rzk•8h ago•137 comments

Benchmarking the Most Reliable Document Parsing API

https://www.tensorlake.ai/blog/benchmarks
18•calavera•2h ago•13 comments

Springs and Bounces in Native CSS

https://www.joshwcomeau.com/animation/linear-timing-function/
43•Bogdanp•1w ago•4 comments

Mathematical exploration and discovery at scale

https://terrytao.wordpress.com/2025/11/05/mathematical-exploration-and-discovery-at-scale/
202•nabla9•11h ago•91 comments

UK outperforms US in creating unicorns from early stage VC investment

https://www.cityam.com/uk-outperforms-us-in-creating-unicorns-from-early-stage-vc-investment/
13•mmarian•33m ago•7 comments

Show HN: See chords as flags – Visual harmony of top composers on musescore

https://rawl.rocks/
92•vitaly-pavlenko•1d ago•24 comments

Supply chain attacks are exploiting our assumptions

https://blog.trailofbits.com/2025/09/24/supply-chain-attacks-are-exploiting-our-assumptions/
26•crescit_eundo•4h ago•15 comments

Cloudflare Tells U.S. Govt That Foreign Site Blocking Efforts Are Trade Barriers

https://torrentfreak.com/cloudflare-tells-u-s-govt-that-foreign-site-blocking-efforts-are-digital...
260•iamnothere•6h ago•158 comments

Show HN: qqqa – A fast, stateless LLM-powered assistant for your shell

https://github.com/matisojka/qqqa
95•iagooar•9h ago•72 comments

IKEA launches new smart home range with 21 Matter-compatible products

https://www.ikea.com/global/en/newsroom/retail/the-new-smart-home-from-ikea-matter-compatible-251...
228•lemoine0461•7h ago•170 comments

I may have found a way to spot U.S. at-sea strikes before they're announced

https://old.reddit.com/r/OSINT/comments/1opjjyv/i_may_have_found_a_way_to_spot_us_atsea_strikes/
225•hentrep•15h ago•314 comments

Black Hole Flare Is Biggest and Most Distant Seen

https://www.caltech.edu/about/news/black-hole-flare-is-biggest-and-most-distant-seen
4•gmays•1h ago•0 comments

How I am deeply integrating Emacs

https://joshblais.com/blog/how-i-am-deeply-integrating-emacs/
189•signa11•13h ago•126 comments

Pico-100BASE-TX: Bit-Banged 100 MBit/s Ethernet and UDP Framer for RP2040/RP2350

https://github.com/steve-m/Pico-100BASE-TX
69•_Microft•6d ago•12 comments

Phantom in the Light: The story of early spectroscopy

https://chrisdempewolf.com/posts/phantom-in-the-light/
8•dempedempe•1w ago•0 comments

Benchmarking the Most Reliable Document Parsing API

https://www.tensorlake.ai/blog/benchmarks
18•calavera•2h ago

Comments

serjester•1h ago
This is just a company advertisement, and not even a well-done one. They didn't benchmark any of the real leaders in the space (Reducto, Extend, etc.) and left Gemini out of the first two tests, presumably because it was the best performer (while also being multiple orders of magnitude cheaper).
JLO64•1h ago
Personally I use OpenAI models via the API for transcription of PDF files. Is there a big difference between them and Gemini models?
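
A rough sketch of that kind of per-page transcription with a vision model (the model name, prompt, and pdf2image rendering are assumptions, not the commenter's actual setup):

    import base64
    import io

    from openai import OpenAI
    from pdf2image import convert_from_path  # needs poppler installed

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def transcribe_pdf(path: str) -> str:
        pages = convert_from_path(path, dpi=200)  # render each page to an image
        chunks = []
        for image in pages:
            buf = io.BytesIO()
            image.save(buf, format="PNG")
            data_url = "data:image/png;base64," + base64.b64encode(buf.getvalue()).decode()
            resp = client.chat.completions.create(
                model="gpt-4o",  # placeholder; any vision-capable model
                messages=[{
                    "role": "user",
                    "content": [
                        {"type": "text", "text": "Transcribe this page to Markdown."},
                        {"type": "image_url", "image_url": {"url": data_url}},
                    ],
                }],
            )
            chunks.append(resp.choices[0].message.content)
        return "\n\n".join(chunks)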
diptanu•1h ago
Hey! I am the founder of Tensorlake. We benchmarked the models that our customers consider using in enterprises or regulated industries, where there is a big need for processing documents for various automation tasks. Benchmarking takes a lot of time, so we focused on the ones we get asked about.

On Gemini and other VLMs - we excluded these models because they don't do visual grounding, i.e. they don't provide page layouts or bounding boxes for the elements on each page. That is a table-stakes feature for the use cases customers are building with Tensorlake: it wouldn't be possible to build citations without bounding boxes.
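
To make the citation point concrete, here is an illustrative, made-up element schema showing how a bounding box lets you point an answer back at a page region; it is not Tensorlake's actual output format:

    from dataclasses import dataclass

    @dataclass
    class Element:
        page: int
        bbox: tuple[float, float, float, float]  # (x0, y0, x1, y1) in page coordinates
        text: str

    def cite(answer_span: str, elements: list[Element]) -> dict:
        """Map an extracted answer span back to the page region it came from."""
        for el in elements:
            if answer_span in el.text:
                return {"quote": answer_span, "page": el.page, "bbox": el.bbox}
        return {"quote": answer_span, "page": None, "bbox": None}

    elements = [Element(page=3, bbox=(72.0, 410.5, 540.0, 432.0),
                        text="Net revenue was $4.2M in FY2024.")]
    print(cite("$4.2M", elements))  # citation points at page 3 and that box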

On pricing - we are probably the only company offering pure on-demand pricing without any tiers. With Tensorlake, you can get back markdown from every page, summaries of figures, tables and charts, structured data, page classification, etc. - in one API call. This means we are running a bunch of different models under the hood. If you add up the token costs and the complexity of building an equivalent pipeline around Gemini plus other OCR/layout detection models, I bet the price you end up with won't be any cheaper than what we provide :) Plus doing this at scale is very complex - it requires a lot of sophisticated infrastructure, which is another source of cost behind modern document ingestion services.
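
Purely as an illustration of the "one call, many outputs" shape described above (an invented signature, not the real Tensorlake SDK):

    from typing import TypedDict

    class ParseResult(TypedDict):
        markdown: list[str]          # Markdown per page
        figure_summaries: list[str]  # natural-language summaries of figures/charts
        tables: list[dict]           # table structure plus cell text
        structured_data: dict        # schema-driven field extraction
        page_classes: list[str]      # e.g. ["cover", "terms", "signature"]

    def parse_document(path: str) -> ParseResult:
        """Stand-in for a single parse call that returns every output at once."""
        raise NotImplementedError("illustrative signature only")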

ianhawes•1h ago
I just tested a non-English document and it rendered English text. Does your model not support anything other than English?
diptanu•44m ago
It does; we have users in Europe and Asia using it with non-English languages. Can you please send me a message at diptanu at tensorlake dot ai? I'd love to see why it didn't work.
coderintherye•57m ago
Google's Vertex API for document processing absolutely does bounding boxes. In fact, some of the document processors out there are just wrappers around Google's product.
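
A rough sketch of pulling block-level bounding polygons out of Google Cloud Document AI (the project, location, and processor IDs are placeholders; check the current client docs for exact field names):

    from google.cloud import documentai

    client = documentai.DocumentProcessorServiceClient()
    name = client.processor_path("my-project", "us", "my-processor-id")

    with open("sample.pdf", "rb") as f:
        raw = documentai.RawDocument(content=f.read(), mime_type="application/pdf")

    result = client.process_document(
        request=documentai.ProcessRequest(name=name, raw_document=raw)
    )

    for page in result.document.pages:
        for block in page.blocks:
            # Vertices are normalized to [0, 1] relative to the page dimensions.
            vertices = block.layout.bounding_poly.normalized_vertices
            print([(v.x, v.y) for v in vertices])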
diptanu•45m ago
OP mentioned Gemini, not Google's Vertex OCR API, which has very different performance and accuracy characteristics from Gemini.
hotpaper75•1h ago
Thanks for mentioning them; indeed, their post seems to surface only a couple of names in the field, and maybe not the most relevant ones.
karakanb•1h ago
I have been recently looking into extracting a bunch of details from a set of legacy invoice PDFs and had a subpar experience. Gemini was the best among the ones that I tried, but even that missed quite a bit. I'll definitely give this a look.

It seems like such a crowded space, with many tools doing document extraction; I wonder if there's anything in particular pulling more attention into it?

recursive4•53m ago
Curious how it compares to https://github.com/datalab-to/chandra
diptanu•41m ago
We haven't tested Chandra yet, because it's very new. Under the hood Tensorlake is very similar to Marker - it's a pipeline-based OCR API: we do layout detection, text detection and recognition, table structure understanding, etc. We then use VLMs to enrich the results. Our models are much bigger than Marker's and thus take a little longer to parse documents; we optimized for accuracy. We will have a faster API soon.
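
As a generic outline (not Tensorlake's actual code), a pipeline-based parser of the kind described above is structured roughly like this, with each stage left as a placeholder:

    def detect_layout(page_image):
        """Return regions (text block, table, figure, ...) with bounding boxes."""
        ...

    def recognize_text(page_image, region):
        """Run text detection + recognition (OCR) within one layout region."""
        ...

    def parse_table_structure(page_image, region):
        """Recover rows, columns, and cell spans for a table region."""
        ...

    def enrich_with_vlm(region, content):
        """Ask a VLM for extras such as a figure or chart summary."""
        ...

    def parse_page(page_image):
        results = []
        for region in detect_layout(page_image):
            if region.kind == "table":
                content = parse_table_structure(page_image, region)
            else:
                content = recognize_text(page_image, region)
            results.append(enrich_with_vlm(region, content))
        return results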
kissgyorgy•28m ago
I just tried it out: docling finished the same document in 20s (with pretty good results), while in Tensorlake it has been pending for 10 minutes. I won't even wait for the results.
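
For reference, the docling side of that comparison is roughly the documented quickstart (the file name and timing wrapper here are illustrative):

    import time

    from docling.document_converter import DocumentConverter

    start = time.perf_counter()
    converter = DocumentConverter()
    result = converter.convert("same-document.pdf")  # local path or URL
    markdown = result.document.export_to_markdown()
    print(f"finished in {time.perf_counter() - start:.1f}s, {len(markdown)} chars of Markdown")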
goldenjm•14m ago
This would be more helpful if it included DeepSeek-OCR, PaddleOCR-VL and MinerU 2.5. In general, I've found that OmniDocBench is a reliable benchmark, perhaps surprisingly because it is made by the same team as MinerU. They updated their benchmark table recently: https://github.com/opendatalab/OmniDocBench#end-to-end-evalu.... There are some other models that score above DeepSeek-OCR as well that I'm not as familiar with.