I’ve also heard very good things about these two in particular:
- LightOnOCR-2-1B: https://huggingface.co/lightonai/LightOnOCR-2-1B
- PaddleOCR-VL-1.5: https://huggingface.co/PaddlePaddle/PaddleOCR-VL-1.5
The OCR leaderboards I’ve seen leave a lot to be desired.
With the rapid release of so many of these models, I wish there were a better way to know which ones are actually the best.
I also feel like most/all of these models don’t handle charts, other than to maybe include a link to a cropped image. It would be nice for the OCR model to also convert charts into markdown tables, but this is obviously challenging.
I remember that one topping the leaderboards for years; it's usually the one I grab for OCR needs on reputation alone.
aliljet•1h ago
And here's the kicker. I can't afford mistakes. Missing a single character or misinterpreting it could be catastrophic. 4 units vacant? 10 days to respond? Signature missing? Incredibly critical things. I can't find an eval that gives me confidence around this.
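For details like these, a generic leaderboard score won't help; what would is a field-level eval on your own documents. A minimal sketch of that idea, assuming a hypothetical `extract_fields()` stand-in for your OCR + extraction pipeline and made-up field names (`vacant_units`, `response_days`, `signature_present`):

```python
# Hedged sketch: a field-level OCR eval. extract_fields() and the field
# names are hypothetical placeholders; the scoring logic is the point.

def extract_fields(document_text):
    # Placeholder for a real OCR + extraction pipeline.
    return {"vacant_units": 4, "response_days": 10, "signature_present": True}

def score_critical_fields(predicted, ground_truth):
    """Exact match per field; any miss on a critical field fails the doc."""
    results = {name: predicted.get(name) == value
               for name, value in ground_truth.items()}
    return results, all(results.values())

truth = {"vacant_units": 4, "response_days": 10, "signature_present": True}
per_field, passed = score_critical_fields(extract_fields("..."), truth)
print(per_field, passed)
```

Run over a few dozen hand-labeled documents, this gives you the per-field failure rate a leaderboard never will.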
coder543•58m ago
But, as others said, if you can't afford mistakes, then you're going to need a human in the loop to take responsibility.
yieldcrv•17m ago
We analyze 200-page contracts, no problem.
I think you're doing it wrong or in an antiquated way (at least until context window sizes improve).
Are you doing this programmatically at all or are you doing something closer to dropping a contract into a chat window?
We use a main agent to classify the pages, and we build subagents that are familiar with the page classifications and are fed page ranges. Each gets its own full context window and prompt.
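The pattern described above can be sketched roughly like this. `classify_page()` and `run_subagent()` are hypothetical stubs standing in for LLM calls; the structure (classify, group contiguous same-class pages into ranges, hand each range to a specialist subagent) is what matters:

```python
# Hedged sketch of the classify-then-delegate pattern, with stubbed model calls.
from itertools import groupby

def classify_page(page_text):
    # Stub: a real classifier would be an LLM call returning a label.
    return "definitions" if "means" in page_text else "obligations"

def run_subagent(label, pages):
    # Stub: a real subagent gets a label-specific prompt, a fresh context
    # window, and only the pages in its range.
    return f"{label}: analyzed {len(pages)} page(s)"

def analyze_contract(pages):
    labeled = [(classify_page(p), p) for p in pages]
    # Group contiguous pages with the same classification into one range,
    # then dispatch each range to its own subagent.
    return [run_subagent(label, [p for _, p in group])
            for label, group in groupby(labeled, key=lambda t: t[0])]

contract = ['"Tenant" means the lessee.',
            "Tenant shall pay rent.",
            "Landlord shall repair."]
print(analyze_contract(contract))
# -> ['definitions: analyzed 1 page(s)', 'obligations: analyzed 2 page(s)']
```

Because each subagent only sees its own page range, no single context window ever has to hold the whole 200-page contract.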