> Can someone help the folks at Mistral find more weak baselines to add here, since they can't stomach comparing with SoTA?
> (in case y'all wanna fix it: Chandra, dots.ocr, olmOCR, MinerU, Monkey OCR, and PaddleOCR are a good start)
Its failure modes are also vastly different. VLM-based extraction can misread entire sentences or miss whole paragraphs; Sonnet 3 had that issue. Classical computer-vision models, by contrast, tend to make in-word typos.
It seems like the EU in general should be heavily invested in Mistral's development, but it doesn't seem like it is.
I don't know... this sort of area isn't nearly as sexy as video production or coding (etc.), but reaching a better-than-human performance level seems like it should be easier for these kinds of workloads.
Maybe. I think it will be to our benefit, when the bubble pops, that we are not heavily invested; there's no harm in investing a little.
Until then, they seem to be able to keep enough talent in the EU to train reasonably good models. The kernel is there, which seems like the attainable goal.
Are they? IIRC their best model is still worse than the gpt-oss-120B?
1. Use native PDF parsing if the model supports it
2. Use this Mistral OCR model (we updated to this version yesterday)
3. UNLESS you override the "engine" param to use an alternate. We support a JS-based (non-LLM) parser as well [0] (rough sketch below)
So yes, in practice a lot of OCR jobs go to Mistral, but not all of them.
Would love to hear requests for other parsers if folks have them!
[0] https://openrouter.ai/docs/guides/overview/multimodal/pdfs#p...
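For anyone who wants to try the override, it looks roughly like this in Python (a minimal sketch based on the docs in [0]; the model slug is just an example, and the exact plugin shape may have changed since):

```python
import base64
import requests

# Read and base64-encode the PDF so it can be sent inline.
with open("example.pdf", "rb") as f:
    pdf_b64 = base64.b64encode(f.read()).decode()

resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": "Bearer <OPENROUTER_API_KEY>"},
    json={
        "model": "mistralai/mistral-small",  # example slug only
        # The file-parser plugin picks the PDF engine: "pdf-text" is the
        # JS-based (non-LLM) parser; "mistral-ocr" and "native" are the
        # other options per the linked docs.
        "plugins": [{"id": "file-parser", "pdf": {"engine": "pdf-text"}}],
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": "Extract the text from this PDF."},
                {"type": "file", "file": {
                    "filename": "example.pdf",
                    "file_data": f"data:application/pdf;base64,{pdf_b64}",
                }},
            ],
        }],
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```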
- table entries hallucinated
- tables messed up (tables merged, rows dropped)
- some text passages not parsed at all
If you are doing something serious, I would not use it.
Regular Gemini Thinking can actually get 70-80% of the documents correct, apart from lots of mistakes on given names. ChatGPT understands maybe 50-60%.
This Mistral model butchered the whole text; literally not a word was usable, to the point that I think I must be doing something wrong.
The test document: https://files.fm/u/3hduyg65a5
I don't know how they can make this claim with a 79% accuracy rate. For any serious use case, that is an unacceptable number.
I work with scientific journals, and issues like 2.9±0.5 being misread as 29±0.5 are something we regularly run into. It means we can never fully trust automated processes and require human verification at every step.
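To illustrate why (a hypothetical sketch, not our actual pipeline): a dropped decimal point inflates the value-to-uncertainty ratio about 10x, so a crude automated check can at least route suspicious numbers to a human reviewer:

```python
import re

# Match "value ± uncertainty" pairs, e.g. "2.9±0.5" or "29 ± 0.5".
PAIR = re.compile(r"(\d+(?:\.\d+)?)\s*±\s*(\d+(?:\.\d+)?)")

def suspicious_measurements(text: str, max_ratio: float = 20.0):
    """Flag pairs where value/uncertainty is unusually large.

    A dropped decimal point (2.9±0.5 -> 29±0.5) inflates the ratio
    ~10x. The 20.0 threshold is arbitrary and purely illustrative;
    precise legitimate measurements would also trip it.
    """
    flagged = []
    for value, err in PAIR.findall(text):
        if float(err) > 0 and float(value) / float(err) > max_ratio:
            flagged.append(f"{value}±{err}")
    return flagged

print(suspicious_measurements("mass loss was 29±0.5 g over 2.9±0.5 h"))
# -> ['29±0.5']  (ratio 58, likely a dropped decimal point)
```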
What matters is whether this is better than the competition/alternatives. Of course nobody is just going to take the output as-is; if you do, that's your problem.
pzo•1h ago
- PaddleOCR-VL
- olmOCR-2
- Chandra
- dots.ocr
I kind of miss that there aren't many leaderboards or arenas for OCR and CV models and the providers hosting them. They're neglected on both Artificial Analysis and OpenRouter.
pzo•1h ago
E.g., with Gemini 3.0 Flash it might seem that pricing increased only slightly compared to Gemini 2.5 Flash, until you test it and see that an image that used to cost 258 input tokens per 384x384 tile now costs around 3x as many.
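To make the arithmetic concrete (the per-token prices below are placeholders, not real Gemini pricing; the token counts follow the comment above):

```python
# Illustrative only: prices are made-up placeholders, not actual
# Gemini rates; token counts follow the comment above.
old_tokens_per_tile = 258        # 384x384 image tile, old model
new_tokens_per_tile = 3 * 258    # ~3x more tokens per tile (observed)

old_price_per_mtok = 0.30        # hypothetical $/1M input tokens
new_price_per_mtok = 0.35        # "only slightly" higher per-token price

old_cost = old_tokens_per_tile * old_price_per_mtok / 1e6
new_cost = new_tokens_per_tile * new_price_per_mtok / 1e6
print(f"per-tile cost: ${old_cost:.7f} -> ${new_cost:.7f} "
      f"({new_cost / old_cost:.1f}x)")
# ~3.5x per image, despite only ~1.2x on the per-token price
```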
culi•31m ago
https://www.ocrarena.ai/leaderboard
It hasn't been updated with this Mistral model yet, but so far Gemini seems to top the leaderboard.
andai•5m ago
It took an hour and a half to install 12 gigabytes of PyTorch dependencies that can't even run on my device, and then it told me it had some sort of versioning conflict. (I think I was supposed to use uv, but I had run out of steam by that point.)
Maybe I should have asked Claude to install it for me. I gave Claude root on a $3 VPS, and it seems to enjoy the sysadmin stuff a lot more than I do...
Incidentally, I had a similar experience installing Open WebUI... it installed 12 GB of PyTorch crap. I rage-quit, deleted the whole thing, and replicated the functionality I actually needed in 100 lines of HTML... Too bad I can't do that with OCR ;)