
GLM-OCR: Accurate × Fast × Comprehensive

https://github.com/zai-org/GLM-OCR
65•ms7892•4d ago

Comments

aliljet•1h ago
This is actually the thing I really desperately need. I'm routinely analyzing contracts that were faxed to me, scanned with monstrously poor resolution, wet signed, all kinds of shit. The big LLM providers choke on this raw input and I burn up the entire context window for 30 pages of text. Understandable evals of the quality of these OCR systems (which are moving wicked fast) would be helpful...

And here's the kicker. I can't afford mistakes. Missing a single character or misinterpreting it could be catastrophic. 4 units vacant? 10 days to respond? Signature missing? Incredibly critical things. I can't find an eval that gives me confidence around this.
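The comment asks for an eval that gives confidence on exactly these fields. Here is a minimal sketch of a do-it-yourself check, assuming you have hand-typed ground truth for a few of your own scanned pages. It computes character error rate (CER, edit distance divided by reference length). It also runs a separate verbatim check on a hypothetical list of critical strings, because an average hides the single-character errors that matter most here. The names `cer` and `missed_critical` and the sample sentence are illustrative, not from any OCR project.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when the characters match)
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edits needed, divided by the length of the ground truth."""
    if not reference:
        raise ValueError("reference transcription must be non-empty")
    return levenshtein(reference, hypothesis) / len(reference)


def missed_critical(reference: str, hypothesis: str, critical: list[str]) -> list[str]:
    """Critical strings present in the ground truth but not reproduced verbatim by the OCR."""
    return [s for s in critical if s in reference and s not in hypothesis]


ref = "Tenant has 10 days to respond. 4 units vacant."
hyp = "Tenant has 1O days to respond. 4 units vacant."   # letter O instead of zero
print(f"CER = {cer(ref, hyp):.3f}")                       # one edit over 46 characters
print(missed_critical(ref, hyp, ["10 days", "4 units"]))  # ['10 days']
```

A CER of about 2% looks fine on a leaderboard. Here, though, that one error sits in the deadline, which is why the critical-string check is reported separately.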

cinntaile•1h ago
Deciphering fax messages? What is this, the 90s?
daveguy•1h ago
If your needs are that sensitive, I doubt you'll find anything anytime soon that doesn't require a human in the loop. Even SOTA models only average 95% accuracy on messy inputs. If that's per-character accuracy (which is how OCR is generally measured), that's going to be 5+ errors per page of 100+ words. If you really can't afford mistakes, you have to treat the OCR as inaccurate. If you have key components like "days to respond" and "units vacant", you need to detect the presence of those specifically, biased toward false positives over false negatives, and have a human confirm the OCR against the source.
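One way to get the "bias in favor of false positives" described above is to fuzzy-match critical phrases with a deliberately loose threshold. This way, OCR-mangled versions still get routed to a human. The sketch below assumes a hypothetical phrase list and a 0.7 similarity cutoff, and it uses the standard library's `difflib.SequenceMatcher`. It is an illustration of the idea, not a tested production filter.

```python
import difflib
import re
from typing import Sequence

# Hypothetical phrases worth a human look; tune for your own documents.
CRITICAL_PATTERNS = ("days to respond", "units vacant", "signature")


def flag_for_review(ocr_text: str,
                    patterns: Sequence[str] = CRITICAL_PATTERNS,
                    threshold: float = 0.7) -> list[tuple[str, str]]:
    """Return (pattern, matched_window) pairs that look similar to a critical phrase.

    The threshold is deliberately loose: a false positive costs a human a glance,
    while a false negative can mean a missed deadline.
    """
    words = re.findall(r"\S+", ocr_text.lower())
    hits = []
    for pattern in patterns:
        target = pattern.lower()
        n = len(target.split())
        # Slide a window with the same word count as the pattern over the text.
        for i in range(len(words) - n + 1):
            window = " ".join(words[i:i + n])
            if difflib.SequenceMatcher(None, target, window).ratio() >= threshold:
                hits.append((pattern, window))
    return hits


text = "Tenant has 1O dais to respomd. Signatvre: ____"
for pattern, seen in flag_for_review(text):
    print(f"check '{pattern}' near: {seen}")
```

Lowering `threshold` trades more reviewer time for fewer misses. The right value depends on how noisy your scans are.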
coder543•58m ago
If you want OCR with the big LLM providers, you should probably be passing one page per request. Having the model focus on OCR for only a single page at a time seemed to help a lot in my anecdotal testing a few months ago. You can even pass all the pages in parallel in separate requests, and get the better quality response much faster too.

But, as others said, if you can't afford mistakes, then you're going to need a human in the loop to take responsibility.
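The one-page-per-request, all-pages-in-parallel approach described above can be sketched with a thread pool. This assumes the pages are already rendered to images. `ocr_page` is a hypothetical stand-in for whatever single-image call your provider exposes; no particular vendor API is implied.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence


def ocr_document(pages: Sequence[bytes],
                 ocr_page: Callable[[bytes], str],
                 max_workers: int = 8) -> list[str]:
    """OCR each page in its own request, running the requests concurrently.

    `ocr_page` is a placeholder for your provider's single-image call
    (one page image in, text out); it is not a real library function.
    """
    # Threads are enough here: each worker mostly waits on network I/O.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in page order, even if later pages finish first.
        return list(pool.map(ocr_page, pages))


def fake_ocr(image: bytes) -> str:
    """Stand-in for a real model call, for trying the plumbing locally."""
    return image.decode().upper()


print(ocr_document([b"page one", b"page two"], fake_ocr))  # ['PAGE ONE', 'PAGE TWO']
```

`max_workers` should probably respect your provider's rate limits. Eight is an arbitrary default.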

HPsquared•19m ago
You could maybe then do a second pass on the whole text (as plain text not OCR) to look for likely mistakes.
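The second pass suggested here would presumably use an LLM. A cheaper deterministic pre-filter (a swapped-in technique, not what the comment describes) can run first. It flags tokens that mix digits with letters OCR commonly confuses for digits, so those spots get extra scrutiny. The lookalike set below is an assumption.

```python
import re

# Letters that OCR commonly swaps with digits (assumed list: O/0, I/l/1, S/5, B/8).
DIGIT_LOOKALIKES = set("OoIlSB")


def suspicious_tokens(text: str) -> list[str]:
    """Tokens that mix digits with digit-lookalike letters, e.g. '1O' or '5OO'."""
    hits = []
    for token in re.findall(r"\w+", text):
        has_digit = any(c.isdigit() for c in token)
        has_lookalike = any(c in DIGIT_LOOKALIKES for c in token)
        if has_digit and has_lookalike:
            hits.append(token)
    return hits


print(suspicious_tokens("Respond within 1O days; 4 units vacant; fee $5OO."))
# ['1O', '5OO']
```

This misses errors that produce plausible text (a wrong but well-formed digit). It only narrows where a model or a human should look.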
chrsw•39m ago
I'm keeping my eye on progress in this area as well. I need to free engineering design data from tens of thousands of PDF pages and make them easily and quickly accessible to LLMs.
aliljet•35m ago
All of healthcare is crying. Trust me.
Imustaskforhelp•31m ago
I suppose tears of joy?
fragmede•6m ago
Of sadness because they're not allowed to use it yet.
yieldcrv•17m ago
> I burn up the entire context window for 30 pages of text

We analyze 200-page contracts, no problem.

I think you're doing it wrong, or in an antiquated way (until context window sizes improve).

Are you doing this programmatically at all, or are you doing something closer to dropping a contract into a chat window?

We use a main agent to classify the pages, and we build subagents that are familiar with the page classifications and are fed page ranges. They each have their own full context window and prompts.
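The comment doesn't show the implementation. Here is a minimal sketch of one piece of it, assuming the main agent returns one label per page. It collapses those labels into contiguous page ranges and groups them so that each specialised subagent receives all ranges of its own type. The label names and the functions `page_ranges` and `assignments` are hypothetical.

```python
from itertools import groupby


def page_ranges(labels: list[str]) -> list[tuple[str, int, int]]:
    """Collapse per-page labels into (label, first_page, last_page) runs, 1-indexed and inclusive.

    labels[i] is the classification the main agent assigned to page i + 1.
    """
    ranges = []
    page = 1
    for label, group in groupby(labels):  # groupby merges only *adjacent* equal labels
        count = sum(1 for _ in group)
        ranges.append((label, page, page + count - 1))
        page += count
    return ranges


def assignments(labels: list[str]) -> dict[str, list[tuple[int, int]]]:
    """Group runs by label so each subagent type gets all of its page ranges."""
    out: dict[str, list[tuple[int, int]]] = {}
    for label, first, last in page_ranges(labels):
        out.setdefault(label, []).append((first, last))
    return out


labels = ["cover", "terms", "terms", "exhibit", "terms"]
print(assignments(labels))
# {'cover': [(1, 1)], 'terms': [(2, 3), (5, 5)], 'exhibit': [(4, 4)]}
```

Keeping ranges contiguous lets each subagent read a section in order with its own fresh context. Whether to merge non-adjacent ranges into one call is a design choice the comment leaves open.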

coder543•1h ago
There are a bunch of new OCR models.

I’ve also heard very good things about these two in particular:

- LightOnOCR-2-1B: https://huggingface.co/lightonai/LightOnOCR-2-1B

- PaddleOCR-VL-1.5: https://huggingface.co/PaddlePaddle/PaddleOCR-VL-1.5

The OCR leaderboards I’ve seen leave a lot to be desired.

With the rapid release of so many of these models, I wish there were a better way to know which ones are actually the best.

I also feel like most/all of these models don’t handle charts, other than to maybe include a link to a cropped image. It would be nice for the OCR model to also convert charts into markdown tables, but this is obviously challenging.

StableAlkyne•13m ago
How do these compare to something like Tesseract?

I remember that one clearing the scoreboard for many years, and usually it's the one I grab for OCR needs due to its reputation.

rdos•31m ago
Is it possible for such a small model to outperform Gemini 3, or is this a case of benchmarks not reflecting reality? I would love to be hopeful, but so far an open-source model has never been better than a closed one, even when benchmarks said it was.
amluto•28m ago
Off the top of my head: for a lot of OCR tasks, it's kind of worse for the model to be smart. I don't want my OCR to make stuff up or answer questions; I want it to recognize what is actually on the page.
rdos•17m ago
Interesting. Won't stuff like entity extraction suffer, especially in multilingual use cases? My worry is that a smaller model might not realize some text is actually a person's name because it is very unusual.
alaanor•11m ago
There have been so many OCR models released in the past few months, all of them VLMs, and yet none of them handle Korean well. Every time I try one with a random screenshot (not an A4 document), it just fails at a "simple" task. Funnily enough, Qwen3 8B VL is the best model; it usually gets it right (although I couldn't get the bboxes quite right). Even funnier, whatever runs locally on an iPhone's CPU is insanely good, and so is Google's OCR API. I don't know why we don't get more of the traditional OCR stuff. PaddlePaddle v5 is the closest I could find. At this point, I feel like I might be doing something wrong with those VLMs.
