frontpage.

Show HN: I'm 75, building an OSS Virtual Protest Protocol for digital activism

https://github.com/voice-of-japan/Virtual-Protest-Protocol/blob/main/README.md
4•sakanakana00•12m ago•0 comments

Show HN: I built Divvy to split restaurant bills from a photo

https://divvyai.app/
3•pieterdy•15m ago•0 comments

Show HN: Look Ma, No Linux: Shell, App Installer, Vi, Cc on ESP32-S3 / BreezyBox

https://github.com/valdanylchuk/breezydemo
235•isitcontent•15h ago•25 comments

Show HN: I spent 4 years building a UI design tool with only the features I use

https://vecti.com
332•vecti•17h ago•145 comments

Show HN: If you lose your memory, how to regain access to your computer?

https://eljojo.github.io/rememory/
293•eljojo•17h ago•182 comments

Show HN: R3forth, a ColorForth-inspired language with a tiny VM

https://github.com/phreda4/r3
73•phreda4•14h ago•14 comments

Show HN: Smooth CLI – Token-efficient browser for AI agents

https://docs.smooth.sh/cli/overview
91•antves•1d ago•66 comments

Show HN: I Hacked My Family's Meal Planning with an App

https://mealjar.app
2•melvinzammit•2h ago•0 comments

Show HN: ARM64 Android Dev Kit

https://github.com/denuoweb/ARM64-ADK
17•denuoweb•1d ago•2 comments

Show HN: I built a free UCP checker – see if AI agents can find your store

https://ucphub.ai/ucp-store-check/
2•vladeta•2h ago•1 comment

Show HN: BioTradingArena – Benchmark for LLMs to predict biotech stock movements

https://www.biotradingarena.com/hn
25•dchu17•19h ago•12 comments

Show HN: Slack CLI for Agents

https://github.com/stablyai/agent-slack
47•nwparker•1d ago•11 comments

Show HN: Artifact Keeper – Open-Source Artifactory/Nexus Alternative in Rust

https://github.com/artifact-keeper
151•bsgeraci•1d ago•63 comments

Show HN: Compile-Time Vibe Coding

https://github.com/Michael-JB/vibecode
10•michaelchicory•4h ago•1 comment

Show HN: Gigacode – Use OpenCode's UI with Claude Code/Codex/Amp

https://github.com/rivet-dev/sandbox-agent/tree/main/gigacode
17•NathanFlurry•23h ago•9 comments

Show HN: Slop News – HN front page now, but it's all slop

https://dosaygo-studio.github.io/hn-front-page-2035/slop-news
13•keepamovin•5h ago•5 comments

Show HN: Horizons – OSS agent execution engine

https://github.com/synth-laboratories/Horizons
23•JoshPurtell•1d ago•5 comments

Show HN: Daily-updated database of malicious browser extensions

https://github.com/toborrm9/malicious_extension_sentry
14•toborrm9•20h ago•7 comments

Show HN: Micropolis/SimCity Clone in Emacs Lisp

https://github.com/vkazanov/elcity
172•vkazanov•2d ago•49 comments

Show HN: Fitspire – a simple 5-minute workout app for busy people (iOS)

https://apps.apple.com/us/app/fitspire-5-minute-workout/id6758784938
2•devavinoth12•8h ago•0 comments

Show HN: I built a RAG engine to search Singaporean laws

https://github.com/adityaprasad-sudo/Explore-Singapore
4•ambitious_potat•8h ago•4 comments

Show HN: Sem – Semantic diffs and patches for Git

https://ataraxy-labs.github.io/sem/
2•rs545837•9h ago•1 comment

Show HN: Falcon's Eye (isometric NetHack) running in the browser via WebAssembly

https://rahuljaguste.github.io/Nethack_Falcons_Eye/
4•rahuljaguste•14h ago•1 comment

Show HN: Local task classifier and dispatcher on RTX 3080

https://github.com/resilientworkflowsentinel/resilient-workflow-sentinel
25•Shubham_Amb•1d ago•2 comments

Show HN: FastLog: 1.4 GB/s text file analyzer with AVX2 SIMD

https://github.com/AGDNoob/FastLog
5•AGDNoob•11h ago•1 comment

Show HN: A password system with no database, no sync, and nothing to breach

https://bastion-enclave.vercel.app
12•KevinChasse•20h ago•16 comments

Show HN: Gohpts tproxy with arp spoofing and sniffing got a new update

https://github.com/shadowy-pycoder/go-http-proxy-to-socks
2•shadowy-pycoder•11h ago•0 comments

Show HN: GitClaw – An AI assistant that runs in GitHub Actions

https://github.com/SawyerHood/gitclaw
9•sawyerjhood•20h ago•0 comments

Show HN: I built a directory of $1M+ in free credits for startups

https://startupperks.directory
4•osmansiddique•12h ago•0 comments

Show HN: A Kubernetes Operator to Validate Jupyter Notebooks in MLOps

https://github.com/tosin2013/jupyter-notebook-validator-operator
2•takinosh•12h ago•0 comments

Show HN: Ocrbase – pdf → .md/.json document OCR and structured extraction API

https://github.com/majcheradam/ocrbase
99•adammajcher•2w ago

Comments

mechazawa•2w ago
Is only Bun supported, or also regular Node?
adammajcher•2w ago
It's Bun-first because of performance.
mechazawa•2w ago
Performance for a tool like this isn't really a huge priority, IMHO. Libraries should prioritize compatibility over performance unless performance is the stated goal.
hersko•2w ago
I have a flow where I extract text from a PDF with pdf-parse and then feed that to an AI for data extraction. If that fails, I convert it to a PNG and send the image for data extraction. This works very well and is presumably far cheaper, since I'm generally sending text to the model instead of relying on images. Isn't just sending the images for OCR significantly more expensive?
mimim1mi•2w ago
By definition, OCR means optical character recognition. What kind of extraction methodology will work depends on the contents of the PDF. Often the available PDFs are just scans of printed documents or handwritten notes. If machine-readable text is available, your approach is great.
trollbridge•2w ago
I always render an image and OCR that so I don't get odd problems from invisible text; it also avoids being affected by anything inserted for SEO.
saaaaaam•2w ago
There was an interesting discussion on here a couple of months back about images vs text, driven by this article: https://www.seangoedecke.com/text-tokens-as-image-tokens/

Discussion is here: https://news.ycombinator.com/item?id=45652952

unrahul•2w ago
I have seen this flow in what people at some startups call "agentic OCR": it's essentially a coded control flow that first tries pdf-parse or a similar inexpensive approach, and if the result falls below a quality threshold, falls back to screenshot-to-text extraction.
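
Roughly, that flow looks like the sketch below. This is just an illustration, not code from any of the tools mentioned: pdf-parse is the real npm library for the cheap text path, while ocrImage() and the character-per-page threshold are made-up placeholders for whatever image-OCR call and quality heuristic you use.

    // Cheap text extraction first, image OCR only as a fallback.
    import pdf from "pdf-parse";

    // Hypothetical placeholder: wrap your page-rendering + vision/OCR call here.
    declare function ocrImage(buffer: Buffer): Promise<string>;

    const MIN_CHARS_PER_PAGE = 200; // made-up quality threshold; tune per corpus

    async function extractText(buffer: Buffer): Promise<string> {
      try {
        const parsed = await pdf(buffer);
        const charsPerPage = parsed.text.length / Math.max(parsed.numpages, 1);
        if (charsPerPage >= MIN_CHARS_PER_PAGE) {
          return parsed.text; // the embedded text layer was good enough
        }
      } catch {
        // parse failure: fall through to the image path
      }
      return ocrImage(buffer); // expensive path only when the cheap one fails
    }
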
sgc•2w ago
How does this compare to dots.ocr? I got fantastic results when I tested dots.

https://github.com/rednote-hilab/dots.ocr

mjrpes•2w ago
Ocrbase is CUDA-only, while dots.ocr uses vLLM, so it should support ROCm/AMD cards?
actionfromafar•2w ago
How about CPU?
jasonni•2w ago
dots.ocr requires a considerable amount of computational resources. If you have a Mac with an ARM CPU (M series), you can try my dots.ocr runner (https://github.com/jason-ni/app.dots.ocr.runner).

There is also a pipeline solution with multiple small task-specific models that can run on CPU only: https://github.com/RapidAI/RapidOCR

sgc•2w ago
Jason, your runner looks interesting. I am using Debian Linux on my laptop with an Intel CPU and an NVIDIA GPU (proprietary NVIDIA CUDA drivers). Should I be able to get it working? What is your speed per page at this point? Thank you.
v3ss0n•2w ago
How is this better than Surya/Marker or Kreuzberg? https://github.com/kreuzberg-dev/kreuzberg
jadbox•2w ago
Sounds like someone needs to run their own test cases and report back on which solution does a better job...
kspacewalk2•2w ago
Let me fire up Claude Code.
sixtyj•2w ago
Let me fire up Tesseract.

https://github.com/tesseract-ocr

Jimmc414•2w ago
I fought with Tesseract for quite a while. It's good if high accuracy doesn't matter. For transcribing a book from clean, consistent, non-skewed data it's fine, and an LLM might even be able to clean it up. But for legal or accounting data from hand-scanned documents, the error rate made it untenable. Even clean, scanned documents of the same category have all sorts of density and skew anomalies that get misinterpreted. You'll pull your hair out trying to account for edge cases and never get the results you need, even with numerous adjustments and model retraining on errors.

Flash 2.5 or 3 with thinking gave the best results.

sixtyj•2w ago
Thanks. I was surprised that Tesseract recognized poorly scanned magazines; with some Python library I was able to transcribe a two-column layout with almost no errors.

Tesseract is a cheap solution as it doesn’t touch any LLM.

For invoices, Gemini Flash is really good, for sure, and you receive "sorted" data as well. So definitely thumbs up. I use it for transcribing difficult magazine layouts.

I think that for legally problematic usage like this, since companies don't like to share financial data with Google, it is better to use a local model.

Ollama and Hugging Face have a lot of them.

v3ss0n•2w ago
Surya is a lot better at that.
sync•2w ago
This is essentially a (vibe-coded?) wrapper around PaddleOCR: https://github.com/PaddlePaddle/PaddleOCR

The "guts" are here: https://github.com/majcheradam/ocrbase/blob/7706ef79493c47e8...

Oras•2w ago
Claude is included in the contributors, so the OP didn’t hide it
Tiberium•2w ago
At this point it feels like HN is becoming more like Reddit; most people upvote before actually checking the repo.
M4R5H4LL•2w ago
Most production software is wrappers around existing libraries. The relevant question is whether this wrapper adds operational or usability value, not whether it reimplements OCR. If there are architectural or reliability concerns, it’d be more useful to call those out directly.
tuwtuwtuwtuw•2w ago
Sure. The self-host guide tells me to enter my GitHub secret, in plain text, in an env file. But it doesn't tell me why I should do that.

Do people actually store their secrets in plain text on the file system in production environments? Just seems a bit wild to me.

adammajcher•2w ago
Well, you can use a secrets manager as well.
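
For example, instead of a plain-text .env entry, the token can be pulled at startup from something like AWS Secrets Manager. A minimal sketch of that pattern (the secret name is made up, and this isn't something the repo ships):

    import {
      SecretsManagerClient,
      GetSecretValueCommand,
    } from "@aws-sdk/client-secrets-manager";

    // Fetch the GitHub token at startup instead of reading it from a .env file.
    async function loadGithubToken(): Promise<string> {
      const client = new SecretsManagerClient({});
      const res = await client.send(
        new GetSecretValueCommand({ SecretId: "ocrbase/github-token" }) // hypothetical secret name
      );
      if (!res.SecretString) throw new Error("secret has no string value");
      return res.SecretString;
    }
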
constantinum•2w ago
What matters most is how well OCR and structured data extraction tools handle documents with high variation at production scale. In real workflows like accounting, every invoice, purchase order, or contract can look different. The extraction system must still work reliably across these variations with minimal ongoing tweaks.

Equally important is how easily you can build a human-in-the-loop review layer on top of the tool. This is needed not only to improve accuracy, but also for compliance—especially in regulated industries like insurance.

Other tools in this space:

LLMWhisperer/Unstract (AGPL)

Reducto

Extend AI

LlamaParse

Docling

cess11•2w ago
Why is 12GB+ VRAM a requirement? The OCR model looks kind of small (https://huggingface.co/PaddlePaddle/PaddleOCR-VL/tree/main), so I'm assuming the extra VRAM is needed for some processing afterwards.
adammajcher•2w ago
fixed
cess11•2w ago
OK, thanks, so it runs on a couple GB of CUDA?
binalpatel•2w ago
This is admittedly dated, but even back in December 2023 GPT-4 with its Vision preview was able to do structured extraction very reliably, and I'd imagine Gemini 3 Flash is much better than back then.

https://binal.pub/2023/12/structured-ocr-with-gpt-vision/

Back-of-the-napkin math (which I could be messing up completely), but I think you could process a 100-page PDF for ~$0.50 or less using Gemini 3 Flash?

> 560 input tokens per page * 100 pages = 56,000 tokens = $0.028 input ($0.50/M input tokens)
> ~1,000 output tokens per page * 100 pages = 100,000 tokens = $0.30 output ($3/M output tokens)

(https://ai.google.dev/gemini-api/docs/gemini-3#media_resolut...)
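
The same estimate as a quick calculation (all of the per-page token counts and prices are the assumptions above, not measured values):

    // Back-of-the-napkin cost for a 100-page PDF with Gemini 3 Flash.
    const pages = 100;
    const inputTokensPerPage = 560;   // assumed image tokens per page
    const outputTokensPerPage = 1000; // assumed output tokens per page
    const inputUsdPerMTok = 0.5;      // assumed $ per 1M input tokens
    const outputUsdPerMTok = 3.0;     // assumed $ per 1M output tokens

    const inputUsd = (pages * inputTokensPerPage / 1e6) * inputUsdPerMTok;    // ~$0.028
    const outputUsd = (pages * outputTokensPerPage / 1e6) * outputUsdPerMTok; // ~$0.30
    console.log({ inputUsd, outputUsd, totalUsd: inputUsd + outputUsd });     // ~$0.33 total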

adammajcher•2w ago
Sure, for some small projects I recommend my friends use Gemini 3 Flash. Ocrbase is aimed more at scale and self-hosting: fixed infra cost, high throughput, and no data leaving your environment. At large volumes, that tradeoff starts to matter more than per-100-page pricing.
fmirkowski•2w ago
Having worked with PaddleOCR, Tesseract, and many other OCR tools before, this is still one of the best and smoothest OCR experiences I've ever had; deployed in minutes.
prats226•2w ago
Instead of markdown -> LLM to get JSON, you can just train a slightly bigger model that you can constrain-decode to give JSON right away. https://huggingface.co/nanonets/Nanonets-OCR2-3B

We recently published a cookbook for constrained decoding here: https://nanonets.com/cookbooks/structured-llm-outputs/
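
For the markdown -> LLM step this replaces, the same schema-constrained idea is also available from hosted APIs. A sketch using OpenAI-style structured outputs (this is not the Nanonets constrained-decoding setup above, just an illustration of forcing output to match a JSON schema; the model name and fields are placeholders):

    import OpenAI from "openai";

    const client = new OpenAI();

    // Ask the model for JSON that must conform to a schema, instead of free-form markdown.
    const completion = await client.chat.completions.create({
      model: "gpt-4o-mini", // placeholder model
      messages: [
        { role: "user", content: "Extract the invoice fields from this OCR text: ..." },
      ],
      response_format: {
        type: "json_schema",
        json_schema: {
          name: "invoice",
          strict: true,
          schema: {
            type: "object",
            properties: {
              vendor: { type: "string" },
              total: { type: "number" },
            },
            required: ["vendor", "total"],
            additionalProperties: false,
          },
        },
      },
    });

    const invoice = JSON.parse(completion.choices[0].message.content ?? "{}");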

woocash99•2w ago
Awesome idea!
woocash99•2w ago
Very useful!