That doesn't hold for any of the GPU-based solutions, last time I checked.
Using Surya gets you significantly better results and makes most of the work detailed in the article unnecessary.
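For the curious, here is a rough sketch of Surya's Python API (pip install surya-ocr). The API has changed between releases, so the names below are approximate and worth checking against the README for your installed version; the file name is illustrative:

    # Hedged sketch of Surya OCR from Python; details may vary by version.
    from PIL import Image
    from surya.detection import DetectionPredictor
    from surya.recognition import RecognitionPredictor

    image = Image.open("page.png")
    det_predictor = DetectionPredictor()   # finds text lines
    rec_predictor = RecognitionPredictor() # transcribes them
    predictions = rec_predictor([image], det_predictor=det_predictor)
    for line in predictions[0].text_lines:
        print(line.text)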
VLM (vision language model) hallucination is a blocker for my use case.
Otherwise I'd say just use your operating system's OCR API. Both Windows and macOS have excellent APIs for this.
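As an illustration, macOS's Vision framework is reachable from Python through pyobjc. A minimal sketch, assuming pyobjc-framework-Vision is installed (file name illustrative, error handling minimal):

    # Hedged sketch: macOS Vision OCR from Python via pyobjc.
    import Vision
    from Foundation import NSURL

    def ocr_image(path):
        url = NSURL.fileURLWithPath_(path)
        handler = Vision.VNImageRequestHandler.alloc().initWithURL_options_(url, {})
        request = Vision.VNRecognizeTextRequest.alloc().init()
        request.setRecognitionLevel_(Vision.VNRequestTextRecognitionLevelAccurate)
        success, error = handler.performRequests_error_([request], None)
        if not success:
            raise RuntimeError(str(error))
        # Each observation's top candidate is one recognized line of text.
        return [obs.topCandidates_(1)[0].string() for obs in request.results()]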
Perhaps the specific idea is to harvest coding textbooks as training data for LLMs?
I can also imagine plenty of YouTube tutorials where the presenter types the code live... seems fairly useful
Why not use Ollama-OCR?
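A wrapper like that mostly just prompts a local vision model. A hedged sketch using the ollama Python client (pip install ollama); the model and file names are illustrative:

    # Sketch: OCR-style transcription with a local vision model via ollama.
    import ollama

    response = ollama.chat(
        model="llama3.2-vision",  # any vision-capable local model
        messages=[{
            "role": "user",
            "content": "Transcribe the code in this image exactly, preserving whitespace.",
            "images": ["screenshot.png"],  # illustrative path
        }],
    )
    print(response["message"]["content"])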
Is it, though? If the important parts of the code are new, does it matter that other parts are older or derived from older code? (Of course, I think this whole line of thought is pointless; what matters is not age, but how well it works, and tesseract generally does seem to work.)
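For reference, trying tesseract is a couple of lines via the pytesseract wrapper (pip install pytesseract pillow, with the tesseract binary on PATH); the file name is illustrative:

    # Minimal tesseract sketch via pytesseract.
    from PIL import Image
    import pytesseract

    text = pytesseract.image_to_string(Image.open("code_screenshot.png"))
    print(text)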
Stop accepting PDFs and force things to use APIs ...
https://www.softwareheritage.org/wp-content/uploads/2019/07/...
Even with these examples that seems like a very narrow use case.
It fills me with a deep sadness that we created deterministic machines and then, through laziness, exploit every opportunity to "contaminate" them with sloppy practices that make them produce output with the same fuzzy inaccuracy as human brains.
Old-man-yells-at-neural-networks take: We're entering a "The Machine Stops" era where nobody is going to know how to formulate basic algorithms.
"We need to add some numbers. Let's point a camera at the input, OCR it, then feed it to an LLM that 'knows math'. Then we don't have to figure out an algorithm to add numbers."
I wish compute "cost" more so people would be forced to actually make efficient use of hardware. Sadly, I think it'll take mass societal and infrastructure collapse for that to happen. Until it does, though, let the excess compute flow freely!