Ask HN: Recommendations for self hostable OCR to extract code from images

3•vivzkestrel•3w ago

- Requirements

- You are not paying per inference; you can self-host the model

- It can run inside AWS EC2

- It has very high levels of accuracy for extracting code from images

- What are some of the most accurate OCR models out there that can extract code from images?

Comments

vivzkestrel•3w ago

- As you know, most models are trained on PDFs, receipts, normal text, etc.

- This, however, doesn't work well for structured text like code

- What are some state-of-the-art self-hostable OCR models capable of extracting code from images with very high accuracy?

- I have tried Tesseract so far and it is not very good at this. Even if you are not familiar with any other model, perhaps you can suggest a Tesseract pipeline that I can follow to improve the accuracy of the extraction

- Currently, my pipeline looks like this:

- for every input image, check if the image is light text on dark background or dark text on light background

- As you know, Tesseract is trained mostly on dark text on a light background, so I invert images that have a dark background before processing them with Tesseract

- Are there other steps you think I need to include?
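For anyone following along, the polarity check and inversion described above can be sketched roughly as below. This is a minimal sketch, not my actual code: it assumes the image has already been decoded into rows of 8-bit grayscale values (0 = black, 255 = white), and the function names and the 128 threshold are made up for illustration. The last helper only builds a Tesseract command line; `--psm 6` (treat the image as one uniform block of text) and `preserve_interword_spaces=1` are existing Tesseract options that may help keep code indentation, but whether they improve accuracy here is untested.

```python
from statistics import median

# Hypothetical cutoff: a median gray level below this counts as "dark background".
DARK_THRESHOLD = 128


def is_dark_background(gray_rows):
    """Guess polarity: background pixels dominate, so the median reflects them."""
    pixels = [p for row in gray_rows for p in row]
    return median(pixels) < DARK_THRESHOLD


def invert(gray_rows):
    """Flip every 8-bit gray value so light text on dark becomes dark on light."""
    return [[255 - p for p in row] for row in gray_rows]


def normalize_polarity(gray_rows):
    """Return the image as dark text on a light background."""
    return invert(gray_rows) if is_dark_background(gray_rows) else gray_rows


def tesseract_command(image_path, output_base):
    """Build (but do not run) a Tesseract invocation for a block of code."""
    return [
        "tesseract", image_path, output_base,
        "--psm", "6",                           # one uniform block of text
        "-c", "preserve_interword_spaces=1",    # try to keep runs of spaces
    ]
```

The median is used instead of the mean so that a small amount of bright text does not pull a dark screenshot over the threshold.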

treetalker•3w ago

Not sure about running it in AWS, but this works well even on Intel Macs:

https://github.com/LESIM-Co-Ltd/CoreOCR

There are other similar wrappers for macOS Vision framework; just search on GitHub.

vivzkestrel•3w ago

- At a quick look, it seems to have a CLI

- My primary use case is to invoke it from a server (e.g. Python or Express) to perform batch recognition on images submitted via API endpoints

- Any idea how long it takes to OCR a 1280x720 PNG image? Thank you for sharing that, btw
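To make the use case concrete, this is roughly how I would call a CLI OCR tool from a Python server for a batch of images. It is only a sketch under assumptions: it assumes the tool takes the image path as its last argument and prints the recognized text to stdout, which I have not verified for CoreOCR. `run_ocr`, `run_batch` and `DEMO_COMMAND` are hypothetical names, and `DEMO_COMMAND` is a stand-in that just echoes the path so the sketch runs without any OCR tool installed.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

# Stand-in for a real OCR CLI (assumption): prints "ocr:<path>" to stdout.
DEMO_COMMAND = [sys.executable, "-c", "import sys; print('ocr:' + sys.argv[1])"]


def run_ocr(command_prefix, image_path, timeout_s=60):
    """Run the CLI on one image and return whatever it wrote to stdout."""
    result = subprocess.run(
        [*command_prefix, image_path],
        capture_output=True,
        text=True,
        timeout=timeout_s,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout


def run_batch(command_prefix, image_paths, workers=4):
    """OCR several images concurrently; returns {path: recognized_text}."""
    paths = list(image_paths)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        texts = pool.map(lambda p: run_ocr(command_prefix, p), paths)
        return dict(zip(paths, texts))
```

Threads are enough here because each worker just waits on a subprocess; the API handler would call `run_batch` with the real command in place of `DEMO_COMMAND`.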

treetalker•3w ago

It is definitely run from the CLI.

I don't know exactly, but it's pretty fast. If I understand correctly, it uses the same functionality as the OCR in Apple Notes. (And for that reason Xcode needs to be installed.)
