Show HN: OCR Arena – A playground for OCR models

https://www.ocrarena.ai/battle
216•kbyatnal•2mo ago
I built OCR Arena as a free playground for the community to compare leading foundation VLMs and open-source OCR models side-by-side.

Upload any doc, measure accuracy, and (optionally) vote for the models on a public leaderboard.

It currently has Gemini 3, dots.ocr, DeepSeek, GPT5, olmOCR 2, Qwen, and a few others. If there's any others you'd like included, let me know!

Comments

dang•2mo ago
[under-the-rug stub]

[see https://news.ycombinator.com/item?id=45988611 for explanation]

ylhert•2mo ago
We've got like 10 LLM arenas but nothing for OCR yet, really hope this takes off!
profburial•2mo ago
This is a killer idea!
athoscouto•2mo ago
Nice! Would love to see Azure Document Intelligence on this
arathis•2mo ago
Claude would be good!
kbyatnal•2mo ago
Claude coming shortly (in the next ~1 hour)
rubikscubeguy•2mo ago
claude is live now!
ianhawes•2mo ago
Please add Chandra by Datalab
fzysingularity•2mo ago
FYI one of the models on the battle was pretty slow to load. Are these also being rated on latency or just quality?
andrewlu0•2mo ago
ideally we want people to rate based on quality - but i imagine some of the results are biased rn based on loading time
hdjrudni•2mo ago
That's an easy fix if you wait for the slowest one and pop them both in at the same time, no?
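
A minimal sketch of that idea, assuming each model call is an async function: `asyncio.gather` only returns once every task has finished, so neither output is shown before the other. `run_model` and the delays are hypothetical stand-ins, not OCR Arena's actual code.

```python
import asyncio

async def run_model(name: str, delay: float) -> tuple[str, str]:
    # Hypothetical stand-in for a real OCR call; `delay` simulates latency.
    await asyncio.sleep(delay)
    return name, f"text from {name}"

async def run_battle(models: dict[str, float]) -> list[tuple[str, str]]:
    # gather() waits for all tasks, so the fast model's output is held
    # back until the slowest model is done, then both are revealed together.
    tasks = [run_model(name, delay) for name, delay in models.items()]
    return await asyncio.gather(*tasks)

results = asyncio.run(run_battle({"model_a": 0.01, "model_b": 0.05}))
for name, text in results:
    print(name, "->", text)
```

The trade-off is that every battle now takes as long as its slowest model, but voters can no longer tell which side was faster.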
kbyatnal•2mo ago
Ultimately, there’s some intersection of accuracy x cost x speed that’s ideal, which can be different per use case. We’ll surface all of those metrics shortly so that you can pick the best model for the job along those axes.
zzleeper•2mo ago
Love this! Would have liked to see something like textract for a pre-LLM benchmark (but of course that's expensive), and also a distinction between handwritten text and printed one.

But still, this is incredibly useful!

krashidov•2mo ago
I would be curious to see how Sonnet does. Their models are pretty solid when it comes to PDFs
kbyatnal•2mo ago
Sonnet/Opus is being added shortly!
rubikscubeguy•2mo ago
sonnet and opus are live now :)
ArcaneMoose•2mo ago
I've been really impressed with this model specifically because of how insanely cheap it is: https://replicate.com/ibm-granite/granite-vision-3.3-2b

I didn't expect IBM to be making relevant AI models but this thing is priced at $1 per 4,000,000 output tokens... I'm using it to transcribe handwritten input text and it works very well and super fast.

irjustin•2mo ago
Thanks for this! Will test this model out because we do a lot of in between steps to get around the output token limits.

Super nice if it worked for our use case to simply get full output.

rubikscubeguy•2mo ago
I'm the dev who made this:) We are looking into adding granite!
nicman23•2mo ago
English only :( . it seems only 2 orders of magnitude larger models have support for ie greek :/
intalentive•2mo ago
IBM and Nvidia speech to text models are also SOTA (according to HF leaderboard) and relatively lightweight. Replicate hosts those too, although some (like Parakeet) run easily on consumer GPU.
codeddesign•2mo ago
Most of these are general LLMs and not specifically OCR models. Where is Google Vision, Mistral, Paddle, Nanonets, or Chandra??
kbyatnal•2mo ago
We wanted to keep the focus on (1) foundation VLMs and (2) open source OCR models.

We had Mistral previously but had to remove it because their hosted API for OCR was super unstable and returned a lot of garbage results unfortunately.

Paddle, Nanonets, and Chandra being added shortly!

timbmg•2mo ago
MistralOCR works stably for me when first uploading the file to their server and then running the OCR. I also had some issues before when giving a URL directly to the OCR API, not sure if you're doing that?
rubikscubeguy•2mo ago
nanonets is live now!
wener•2mo ago
Really hope there is a layout mode or ocr with bbox mode, I want to see the model restore the whole page.
rubikscubeguy•2mo ago
yeah, that would be a cool long term goal
cdrini•2mo ago
There have been such a large number of OCR tools popping up over the past ~year; we're sorely in need of some benchmarks to compare them. Would love to see support for normal OCR tools like tesseract, EasyOCR, Microsoft Azure, etc. I'm using these for some projects, and my experiments with VLMs for OCR have resulted in too much hallucination for me to switch. Benchmarks comparing across this aisle would be incredibly useful.
daemonologist•2mo ago
A limitation of this leaderboard approach that I want to point out is that while the large general-purpose LLMs can make greater leaps of inference (on handwriting and poor quality scans), and almost always produce better layouts and more coherent output, they can also sometimes be less correct. My experience is that they're more prone to skipping or transposing sections of text, or even hallucinating completely incorrect output, than the purpose-trained models. (A similar comparison can be made in turn to the character- or word-based OCR approaches like Tesseract, which are even less "intelligent" but also even less prone to those malbehaviors.)

Also, some of the models are prone to infinite loops and I suspect this is not being punished appropriately; the frontend seems to get into a bad state after around 50k characters, which prevents the user from selecting a winner. Probably would be beneficial to make sure every model has an output length limit.

Still, a really cool resource - I'm looking forward to more models being added.
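
One way the output length limit and a loop check could look, as a rough sketch: the 50,000-character cap mirrors the figure mentioned above, while `cap_output`, `looks_like_loop`, and their thresholds are hypothetical names and values chosen for illustration. The loop check is a heuristic and will also flag legitimate repetition such as a long row of dashes.

```python
def cap_output(text: str, max_chars: int = 50_000) -> tuple[str, bool]:
    """Truncate text to max_chars; the flag reports whether it was cut."""
    if len(text) <= max_chars:
        return text, False
    return text[:max_chars], True

def looks_like_loop(text: str, min_period: int = 8,
                    max_period: int = 200, min_repeats: int = 5) -> bool:
    """Heuristic: does the text end with one chunk repeated back to back?"""
    for period in range(min_period, max_period + 1):
        span = period * min_repeats
        if len(text) < span:
            break  # longer periods need even more text, so stop here
        # Compare the tail of the text against its last chunk repeated.
        if text[-span:] == text[-period:] * min_repeats:
            return True
    return False

print(looks_like_loop("intro " + "the same line again\n" * 10))
print(cap_output("x" * 10, max_chars=4))
```

A model whose output trips either check could be stopped early and counted as a loss rather than leaving the battle stuck.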

rubikscubeguy•2mo ago
Totally agree w/ your first point! For the looping, we just added a stop condition for now in battle mode, and you can still vote on the other model afterwards. A bit of a hard problem to solve. We will add more models!
hakunin•2mo ago
Would be great to compare these against Apple’s LiveText. This project now supports it: https://github.com/mkyt/OCRmyPDF-AppleOCR

I’ve had great results locally. Albeit you need macOS >=13 for this.

prodigycorp•2mo ago
This needs a "both are bad" button. There are some generations where I cannot rightfully say one beats the other.
deaux•2mo ago
I suggest you make explicit the assumption that this website is specifically about English text. Otherwise the leaderboard is pretty meaningless, with extreme differences in performance across other scripts - and potentially even languages such as Vietnamese or Czech which use Latin but have lots of accents.
hdjrudni•2mo ago
That's unfortunate because I have a bunch of photos with handwritten German on the back that I need to transcribe, and seeing as I can't read German I can't really do it by myself either.
deaux•2mo ago
I reckon performance on German will be similar to English, the only real difference is the umlauts and those are very consistent. Not sure how it will do on the ß.
nicman23•2mo ago
qwen 3.5 vl instruct on openrouter is damn cheap - and works quite well with non english stuff.

i have it verify some stamps which are quite messy and sometimes obscured and honestly some i could not even read.

maverwa•2mo ago
from my first tests it does fine with german, at least for the ghastly "handwritten" font the restaurant menu I used for the test uses.
rubikscubeguy•2mo ago
Hey! I'm the dev who made this:) I think that you are right, data will bias towards english because we have a dataset that people can use that is in english. But you can also upload non-english docs into the battle mode as well as the playground!
skissane•2mo ago
LMArena splits their leaderboard by language: maybe you should consider doing the same thing

I assume to do that you’d need another model to do language detection on the inputs and/or outputs; but a language detection model can be a lot cheaper than an OCR model or an LLM
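
As a very rough illustration of how cheap the bucketing step can be, here is a stdlib-only sketch that groups text by dominant Unicode script. This is an assumption-laden stand-in, not real language detection: it cannot tell German from English (both are Latin script), and `dominant_script` is a hypothetical helper name. A proper language detection model would replace it.

```python
import unicodedata
from collections import Counter

def dominant_script(text: str) -> str:
    """Return the most common script among the letters in text."""
    counts: Counter[str] = Counter()
    for ch in text:
        if ch.isalpha():
            # Unicode names start with the script, e.g. "GREEK SMALL LETTER ALPHA".
            name = unicodedata.name(ch, "")
            if name:
                counts[name.split()[0]] += 1
    return counts.most_common(1)[0][0] if counts else "UNKNOWN"

for sample in ["Hello world", "Καλημέρα κόσμε", "Straße 12", "12345"]:
    print(repr(sample), "->", dominant_script(sample))
```

Votes could then be tallied per bucket, so a model that is strong on Latin text but weak on Greek does not get a single blended score.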

ComputerGuru•2mo ago
Two suggestions:

UX on mobile isn’t great. It wasn’t obvious to me where the second model output was and I was thrown off even more so because the option to vote for model 1 output was presented without ever even seeing model two output.

Second suggestion would be to install a MathJax plugin so one can properly rate mathematical equations and formulas. Raw LaTeX is easy to misread and it makes comparing between LaTeX and Unicode outputs hard.

rubikscubeguy•2mo ago
Hey! Dev who made this here. I hear you on the mobile UX, it's on my docket of things to fix. Same with math plugin! Thanks for the suggestions.
mkolodny•2mo ago
This is super helpful :) Curious about Grok as well!
rubikscubeguy•2mo ago
Hello! Dev who made this here. Working on adding grok.
coulix•2mo ago
We need to see Landing.ai DPT-2, from my tests it's the best in terms of ability to extract structure from complex tables so far.
ajmurmann•2mo ago
Really like the idea. Unfortunately, my first upload is still spinning on one of the models about 5 minutes in. Clicking "Stop Battle" seems to do nothing either
rubikscubeguy•2mo ago
Hey, I'm the dev who built this! Looking into it. Wondering if it's because of load due to this post.
est•2mo ago
Offtopic, but what's the best OCR that can run offline on browsers with js/wasm with reasonable CPU/memory cost?

Working on a hobby project that interacts with user handwriting on <canvas>. Tried some CNN models for digits but had trouble with characters.

yorwba•2mo ago
If the text is written interactively on the canvas (as opposed to extraction from pixels) this task is known as "online handwriting recognition" ("online" because you can watch the text being formed incrementally, which makes it easier to e.g. distinguish individual strokes.)

I don't know what the state of the art is, but an old model for digitizer pens might not do so bad either.

tensor•2mo ago
Probably a wasm port of tesseract. E.g https://robertknight.github.io/tesseract-wasm/

Note that I haven't tried any of them, but tesseract is still likely the leading open source OCR that works with CPU.

aixpert•2mo ago
Opus is multimodal??
densekernel•2mo ago
Any plans to add Document Pre-trained transformer-2 (DPT-2) from https://landing.ai/?
tarruda•2mo ago
Interesting that the 8B of the Qwen3-VL family is in 9th place, above a few proprietary models. This thing can run locally with llama.cpp on modest hardware.
molf•2mo ago
What is needed to evaluate OCR for most business applications (above everything else) is accuracy.

Some results look plausible but are just plain wrong. That is worse than useless.

Example: the "Table" sample document contains chemical substances and their properties. How many numbers did the LLM output and associate correctly? That is all that matters. There is no "preference" aspect that is relevant until the data is correct. Nicely formatted incorrect data is still incorrect.

I reviewed the output from Qwen3-VL-8B on this document. It mixes up the rows, resulting in many values associated with the wrong substance. I presume using its output for any real purpose would be incredibly dangerous. This model should not be used for such a purpose. There is no winning aspect to it. Does another model produce worse results? Then both models should be avoided at all costs.

Are there models available that are accurate enough for this purpose? I don't know. It is very time consuming to evaluate. This particular table seems pretty legible. A real production grade OCR solution should probably need a 100% score on this example before it can be adopted. The output of such a table is not something humans are good at reviewing. It is difficult to spot errors. It either needs to be entirely correct, or the OCR has failed completely.

I am confident we'll reach a point where a mix of traditional OCR and LLM models can produce correct and usable output. I would welcome a benchmark where (objective) correctness is rated separately from the (subjective) output structure.

Edit: Just checked a few other models for errors on this example.

* GPT 5.1 is confused by the column labelled "C4" and mismatches the last 4 columns entirely. And almost all of the numbers in the last column are wrong.

* olmOCR 2 omits the single value in column "C4" from the table.

* Gemini 3 produces "1.001E-04" instead of "1.001E-11" as viscosity at T_max for Argon. Off by 7 orders of magnitude! There is zero ambiguity in the original table. On the second try it got it right. Which is interesting! I want to see this in a benchmark!

There might be more errors! I don't know, I'd like to see them!
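
A minimal sketch of what an objective score for such a table might look like, assuming a hand-checked ground truth keyed by (row, column). Only the Argon viscosity pair comes from the example above; the key names and the `cell_accuracy` helper are hypothetical.

```python
import math

def cell_accuracy(truth: dict, predicted: dict, rel_tol: float = 1e-9) -> float:
    """Fraction of ground-truth cells whose numeric value was reproduced."""
    total = correct = 0
    for key, expected in truth.items():
        total += 1
        try:
            value = float(predicted.get(key))
        except (TypeError, ValueError):
            continue  # a missing or non-numeric cell counts as wrong
        if math.isclose(value, float(expected), rel_tol=rel_tol):
            correct += 1
    return correct / total if total else 0.0

# The one cell cited above: expected 1.001E-11, model output 1.001E-04.
truth = {("Argon", "viscosity_at_T_max"): "1.001E-11"}
predicted = {("Argon", "viscosity_at_T_max"): "1.001E-04"}

print(cell_accuracy(truth, predicted))
print(round(math.log10(float("1.001E-04") / float("1.001E-11"))))
```

Scoring cells instead of whole strings means formatting differences do not matter, while a swapped row or a wrong exponent always does.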

fzysingularity•2mo ago
This is why arenas are generally a bad idea for assessing correctness in visual tasks.
poulpy123•2mo ago
I'm very impressed by the models, to the point I was wondering if they were really converting the PDF or just reading the content. I tried documents in French, English and Spanish, very heavy on graphics and with complex layouts (boardgame, flyer, book about Rust), and I wasn't expecting anything great. Some models in particular were showing symbols and smileys quite close to the original.

I noticed that some models resisted faking data better than others. In particular, for a sentence that was cut off in the document, GPT5 invented the end of the sentence while Opus properly showed it cut off.

I didn't try with my writing but in the playground there is one example and some models read it better than me.

I wish the output would show the confidence of the model on each part. I think it would help immensely.

Note that sometimes a model gets stuck in a loop, which prevents you from voting and from seeing which model is which.

deeptishukla22•2mo ago
Is there a way I can invoke this programmatically?
tethys•2mo ago
> If there's any others you'd like included, let me know!

Just this morning I came across HunyuanOCR which sounded very promising. https://huggingface.co/tencent/HunyuanOCR

timbmg•2mo ago
Would be great to add MistralOCR!
dahateb•2mo ago
I can second that, super cheap with 1$ per 1000 pages
gfody•2mo ago
plz add https://huggingface.co/spaces/lixin4ever/VideoLLaMA3-Image
lokl•2mo ago
Please compare with FineReader.
vdm•2mo ago
cool UI and lets anyone upload a doc. but lacks https://github.com/opendatalab/mineru
mpercy123•2mo ago
i don't think i'm your target audience but i found it interesting to see the side-by-side comparisons from images with text in. it's pretty cool to see how different models interpret photos, too. cool tool, must've been fun to make.