That doesn't hold for any of the GPU-based solutions, last time I checked.
Using Surya gets you significantly better results and makes most of the work detailed in the article unnecessary.
VLLM hallucination is a blocker for my use case.
Otherwise I'd say just use your operating system's OCR API. Both Windows and macOS have excellent APIs for this.
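If you want to script the macOS side, the Vision framework's text recognition is reachable from Python through the pyobjc bindings. A minimal sketch, assuming pyobjc-framework-Vision is installed and "screenshot.png" stands in for your image:

    # Minimal sketch: macOS built-in OCR (Vision framework) driven via pyobjc.
    # Assumes: pip install pyobjc-framework-Vision; "screenshot.png" is a placeholder path.
    import Vision
    from Foundation import NSURL

    url = NSURL.fileURLWithPath_("screenshot.png")
    handler = Vision.VNImageRequestHandler.alloc().initWithURL_options_(url, {})
    request = Vision.VNRecognizeTextRequest.alloc().init()
    request.setRecognitionLevel_(Vision.VNRequestTextRecognitionLevelAccurate)

    success, error = handler.performRequests_error_([request], None)
    if success:
        for observation in request.results():
            # Each observation carries ranked candidate strings; take the top one.
            print(observation.topCandidates_(1)[0].string())

Windows has a rough equivalent in Windows.Media.Ocr, though I haven't driven that one from Python myself.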
Their training code and data are closed source. It's barely even open-weight, and only the inference code is open source.
Perhaps the specific idea is to harvest coding textbooks as training data for LLMs?
I can also imagine plenty of YouTube tutorials that type the code live... seems fairly useful
This is a nightmare for endpoint protection. Imagine rogue employees snapping pics of your proprietary codebase and then using this to reassemble it.
Why not use Ollama-OCR?
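If the goal is just OCR through a locally served model, the plain ollama Python client gets you most of the way without the wrapper. A rough sketch, assuming a vision-capable model like llama3.2-vision has already been pulled and "code.png" is a placeholder:

    # Rough sketch: OCR-style transcription via a local vision model served by Ollama.
    # Assumes `ollama pull llama3.2-vision` was run; "code.png" is a placeholder path.
    import ollama

    response = ollama.chat(
        model="llama3.2-vision",
        messages=[{
            "role": "user",
            "content": "Transcribe the code in this image exactly, preserving indentation.",
            "images": ["code.png"],  # file path; the client handles the encoding
        }],
    )
    print(response["message"]["content"])

Same hallucination caveats as any other VLM approach, of course.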
Is it, though? If the important parts of the code are new, does it matter that other parts are older or derived from older code? (Of course, I think this whole line of thought is pointless; what matters is not age, but how well it works, and tesseract generally does seem to work.)
Maybe try OpenAI GPT-4o or Google's Document AI https://cloud.google.com/document-ai
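For the GPT-4o route, a plain chat completions call with the image attached is enough to evaluate it. A sketch, assuming OPENAI_API_KEY is set and "page.png" is a placeholder:

    # Sketch: transcription with GPT-4o through the OpenAI chat completions API.
    # Assumes OPENAI_API_KEY is set in the environment; "page.png" is a placeholder path.
    import base64
    from openai import OpenAI

    client = OpenAI()

    with open("page.png", "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "Transcribe the text in this image verbatim."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            ],
        }],
    )
    print(response.choices[0].message.content)

Document AI has its own Python client and handles PDFs natively, but it's a heavier setup.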
NeXTStep was real UNIX, but macOS is not.
BTW, I was taught to program in C by one of the original core Unix team members, and I worked for DEC long before I could have discussed Tesseract OCR with people who didn't. Keep those ignorant downvotes comin'
Stop accepting PDFs and force things to use APIs ...
lelag•1mo ago
https://www.softwareheritage.org/wp-content/uploads/2019/07/...
dewey•1mo ago
Even with these examples, that seems like a very narrow use case.
EvanAnderson•1mo ago
It fills me with a deep sadness that we created deterministic machines and then, through laziness, exploit every opportunity to "contaminate" them with sloppy practices that make them produce output with the same fuzzy inaccuracy as human brains.
Old-man-yells-at-neural-networks take: We're entering a "The Machine Stops" era where nobody is going to know how to formulate basic algorithms.
"We need to add some numbers. Let's point a camera at the input, OCR it, then feed it to an LLM that 'knows math'. Then we don't have to figure out an algorithm to add numbers."
I wish compute "cost" more so people would be forced to actually make efficient use of hardware. Sadly, I think it'll take mass societal and infrastructure collapse for that to happen. Until it does, though, let the excess compute flow freely!