Current 80/20-rule-ignoring AI dogma in a nutshell.
And I assume the multimodal tools still use OCR for text extraction, or am I missing something?
My understanding is that they're still doing OCR+NLP, just differently than traditional approaches.
I think that's more because of the current state of the industry: a lot of those models are internal, locked behind a paywall, or annoying to use. I don't want to waste effort signing up for a 4-week trial of X service just to perform a one-off task.
Unfortunately, this post didn't really elucidate anything or dig into an interesting topic in this space.
I'm not expecting a research paper, but it would be great to get some stats, graphs, examples, and meat on the bones. I opened this expecting actual examples of problems in OCR and NLP, along with a demonstration of how X multimodal model solves them.
Gemini is nice, but it would be better to have a pipeline that runs locally on a Mac with a reasonable amount of unified memory or on a Framework AMD board.
behnamoh•5h ago
WesleyLivesay•5h ago
OtherShrezzing•5h ago
thaeli•5h ago