Claude 4

https://www.anthropic.com/news/claude-4
407•meetpateltech•58m ago•159 comments

That fractal that's been up on my wall for 12 years

https://chriskw.xyz/2025/05/21/Fractal/
106•chriskw•1h ago•10 comments

Mozilla to shutdown Pocket on July 8, 2025

https://support.mozilla.org/en-US/kb/future-of-pocket
58•phantomathkg•1h ago•24 comments

Launch HN: WorkDone (YC X25) – AI Audit of Medical Charts

29•digitaltzar•2h ago•19 comments

Improving performance of rav1d video decoder

https://ohadravid.github.io/posts/2025-05-rav1d-faster/
201•todsacerdoti•5h ago•63 comments

Fast Allocations in Ruby 3.5

https://railsatscale.com/2025-05-21-fast-allocations-in-ruby-3-5/
99•tekknolagi•3h ago•25 comments

Show HN: DockFlow – Switch between multiple macOS Dock layouts instantly

https://dockflow.appitstudio.com/
15•pugdogdev•32m ago•1 comments

Show HN: SQLite JavaScript - extend your database with JavaScript

https://github.com/sqliteai/sqlite-js
95•marcobambini•4h ago•32 comments

A South Korean grand master on the art of the perfect soy sauce

https://www.theguardian.com/world/2025/may/21/without-time-there-is-no-flavour-a-south-korean-grand-master-on-the-art-of-the-perfect-soy-sauce
54•n1b0m•1d ago•12 comments

Planetfall

https://somethingaboutmaps.wordpress.com/2025/05/20/planetfall/
233•milliams•8h ago•50 comments

Adventures in Symbolic Algebra with Model Context Protocol

https://www.stephendiehl.com/posts/computer_algebra_mcp/
58•freediver•3h ago•10 comments

Gemini Diffusion

https://simonwillison.net/2025/May/21/gemini-diffusion/
772•mdp2021•16h ago•203 comments

Social media platforms: what's wrong, and what's next

https://www.scottgoci.com/social-media-platforms-whats-wrong-and-whats-next/
33•eggbrain•2h ago•36 comments

The scientific “unit” we call the decibel

https://lcamtuf.substack.com/p/decibels-are-ridiculous
494•Ariarule•13h ago•390 comments

I Built My Own Audio Player

https://nexo.sh/posts/why-i-built-a-native-mp3-player-in-swiftui/
63•nexo-v1•3h ago•44 comments

Show HN: Whenish – Plan Group Events in iMessages

https://apps.apple.com/us/app/whenish/id6745035749
24•devgoth•2h ago•28 comments

Ice Theft in Antarctica

https://nautil.us/ice-theft-in-antarctica-1210083/
9•simonebrunozzi•1h ago•3 comments

Warning Signs Your App Authorization Is a Ticking Time Bomb

https://www.osohq.com/post/app-authorization-warning-signs
5•meghan•30m ago•1 comments

Benchmarking Crimes Meet Formal Verification

https://microkerneldude.org/2025/04/27/benchmarking-crimes-meet-formal-verification/
8•snvzz•3d ago•0 comments

Four years of sight reading practice

https://sandrock.co.za/carl/2025/05/four-years-of-sight-reading-pracice/
100•chthonicdaemon•3d ago•44 comments

MCP explained without hype or fluff

https://blog.nilenso.com/blog/2025/05/12/mcp-explained-without-hype-or-fluff/
61•captn3m0•1h ago•28 comments

Near-infrared spatiotemporal color vision enabled by upconversion contact lenses

https://www.cell.com/cell/fulltext/S0092-8674(25)00454-4
19•ArnoVW•2h ago•12 comments

Show HN: Curved Space Shader in Three.js (via 4D sphere projection)

https://github.com/bntre/CurvedSpaceShader
45•bntr•6h ago•16 comments

The Philosophy of Byung-Chul Han (2020)

https://newintrigue.com/2020/06/29/the-philosophy-of-byung-chul-han/
34•-__---____-ZXyw•4h ago•2 comments

Everything’s a bug (or an issue)

https://www.bozemanpass.com/everythings-a-bug-or-an-issue/
41•dboreham•3d ago•20 comments

U.S. Spy Agencies–One-Stop Shop to Buy Your Personal Data

https://theintercept.com/2025/05/22/intel-agencies-buying-data-portal-privacy/
80•LAsteNERD•2h ago•30 comments

Inigo Quilez: computer graphics, mathematics, shaders, fractals, demoscene

https://iquilezles.org/articles/
264•federicoponzi•4d ago•28 comments

Free-Threaded Python Library Compatibility Checker

https://ft-checker.com/
21•lifthrasiir•4h ago•4 comments

Strengths and limitations of diffusion language models

https://www.seangoedecke.com/limitations-of-text-diffusion-models/
40•rbanffy•7h ago•2 comments

Display any CSV file as a searchable, filterable, pretty HTML table

https://github.com/derekeder/csv-to-html-table
226•indigodaddy•17h ago•59 comments

How we made our OCR code more accurate

https://pieces.app/blog/how-we-made-our-optical-character-recognition-ocr-code-more-accurate
39•thunderbong•1d ago

Comments

camtarn•6h ago
Neat article, but I feel like I have no idea why they're doing this! Is transcribing code from images really such a big use case?
lelag•5h ago
Maybe they want to compile the Apollo Guidance Computer source code...

https://www.softwareheritage.org/wp-content/uploads/2019/07/...

ivanjermakov•2h ago
If it's not a joke, I think it was already digitized: https://github.com/chrislgarry/Apollo-11
dewey•5h ago
> To best support software engineers when they want to transcribe code from images, we fine-tuned our pre-processing pipeline to screenshots of code in IDEs, terminals, and online resources like YouTube videos and blog posts.

Even with these examples that seems like a very narrow use case.

FloatArtifact•5h ago
From an accessibility standpoint, yes. It lets you pattern-match where you are in the IDE without using an accessibility API.
SloopJon•4h ago
The product appears to be similar to Microsoft's embattled Recall feature. In order to remember your digital life it takes frequent screenshots.
gosub100•3h ago
I guess it would be excellent for evading security monitors to take unauthorized copies of your employer's codebase.
EvanAnderson•1h ago
It worries me that stuff like that becoming easier will lead to wacky data pipelines being normalized (pulling display output off systems and "scraping" it to get data, of dubious quality, versus just building a proper interface). The kind of crowd that likes "low code" tools like MSFT's "Power Automate" is going to love to make Rube Goldberg nightmares out of tools like this.

It fills me with a deep sadness that we created deterministic machines and then, through laziness, exploit every opportunity to "contaminate" them with sloppy practices that make them produce output with the same fuzzy inaccuracy as human brains.

Old man yells at neural networks take: We're entering a "The Machine Stops" era where nobody is going to know how to formulate basic algorithms.

"We need to add some numbers. Let's point a camera at the input, OCR it, then feed it to an LLM that 'knows math'. Then we don't have to figure out an algorithm to add numbers."

I wish compute "cost" more so people would be forced to actually make efficient use of hardware. Sadly, I think it'll take mass societal and infrastructure collapse for that to happen. Until it does, though, let the excess compute flow freely!

jocoda•1h ago
Asimov - The Feeling of Power.
abc-1•5h ago
Anything that mentions tesseract is about 10 years out of date at this point.
amelius•5h ago
Well, at least I can apt-get install tesseract.

That doesn't hold for any of the GPU-based solutions, last time I checked.

booder1•5h ago
5.5.0 was released in November last year. It's still a very active project as far as I can tell, and it runs on CPU. Even compared to the best open source GPU option it is still pretty good. VLMs work very differently and don't work as well for everything. Why is it out of date?
cbsmith•59m ago
I don't know that that is true: https://researchify.io/blog/comparing-pytesseract-paddleocr-...

Using Surya gets you significantly better results and makes almost all the work detailed in the article largely unnecessary.

krapht•4h ago
I just built a pipeline with tesseract last year. What's better that is open source and runnable locally?

VLM hallucination is a blocker for my use case.

stavros•4h ago
How is a hallucination worse than a Tesseract error?
jgalt212•4h ago
Hallucinations are hard to detect unless you are a subject-matter expert. I don't have direct experience with Tesseract error detection.
krapht•4h ago
Because the VLM doesn't know it hallucinated. When you get a Tesseract error you can flag the OCR job for manual review.
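
A rough illustration of the "flag the OCR job for manual review" idea: Tesseract reports a per-word confidence, which pytesseract exposes through `image_to_data`. This is only a minimal sketch, not krapht's actual pipeline. The helper names (`flag_low_confidence`, `needs_review`, `ocr_with_review_flag`) and the threshold of 60 are assumptions made up for the example.

```python
def flag_low_confidence(data, threshold=60.0):
    """Return (word, confidence) pairs whose confidence is below `threshold`.

    `data` is a dict shaped like the result of
    pytesseract.image_to_data(..., output_type=pytesseract.Output.DICT),
    i.e. parallel lists under the "text" and "conf" keys.
    The threshold value is an assumption; tune it on your own data.
    """
    flagged = []
    for text, conf in zip(data["text"], data["conf"]):
        conf = float(conf)  # confidences may arrive as strings or numbers
        if conf < 0 or not text.strip():
            continue  # -1 marks layout entries (blocks, lines), not words
        if conf < threshold:
            flagged.append((text, conf))
    return flagged


def needs_review(data, threshold=60.0):
    """True if at least one recognized word falls below the threshold."""
    return bool(flag_low_confidence(data, threshold))


def ocr_with_review_flag(image_path, threshold=60.0):
    """Run Tesseract on an image and return (text, needs_manual_review).

    Requires the third-party packages pytesseract and Pillow plus a
    Tesseract install; they are imported here so the helpers above
    can be used without them.
    """
    import pytesseract
    from PIL import Image

    data = pytesseract.image_to_data(
        Image.open(image_path), output_type=pytesseract.Output.DICT
    )
    words = [t for t in data["text"] if t.strip()]
    return " ".join(words), needs_review(data, threshold)
```

A job whose `needs_review` result is true would go to a human queue instead of straight into the output. A VLM gives no comparable per-word signal out of the box, which is the asymmetry being described.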
amelius•3h ago
It could hallucinate obscene language, something which is less likely with classic OCR.
gessha•3h ago
Latter is more likely to get debugged.
criddell•3h ago
If you are stuck with open source, then your options are limited.

Otherwise I'd say just use your operating system's OCR API. Both Windows and macOS have excellent APIs for this.

fxtentacle•4h ago
Quite simply, you’re completely wrong. Modern tesseract versions include a modern LSTM AI. It can very affordably be deployed on CPU, yet its performance is competitive with much more expensive large GPU-based models. Especially if you handle a high volume of scans, chances are that tesseract will have the best bang per buck.
nicman23•3h ago
I remember that you could not train it yourself on a font like you could in older versions. Is that still the case?
ianhawes•2h ago
My company probably spent close to 6 figures overall creating Tesseract 5 custom models for various languages. Surya beats them all and is open source (and quite a bit faster).
bluelightning2k•4h ago
I can't say I've ever wanted to transcribe code from an image. That seems super niche.

Perhaps the specific idea is to harvest coding textbooks as training data for LLMs?

eurekin•3h ago
I'm guessing to automatically scrape videos for future training rounds.
blharr•1h ago
Eh, imagine poor documentation where people take screenshots of steps and don't write them out.

I can also imagine plenty of YouTube tutorials that type the code live... seems fairly useful

vaxman•4h ago
Tesseract OCR was created by Digital (DEC) in 1985 (yes, 40 years ago, not four). Now go back and read the article and ROFL with me.
Onavo•3h ago
The original Tesseract OCR had no neural nets. It bears little resemblance to the modern version.
vaxman•3h ago
It's still 40.

Why not use Ollama-OCR?

krapht•1h ago
Because I benchmarked both on my dataset and found that Tesseract was better for my use-case?
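
For anyone wanting to run the same kind of comparison on their own data: the comment doesn't say which metric was used, so the following assumes character error rate (CER), a common choice for OCR. It's a minimal sketch; the function names are invented for the example, and the `(reference, hypothesis)` pairs would come from your own ground-truth transcriptions and each engine's output.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (row-by-row DP)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def character_error_rate(reference, hypothesis):
    """Edit distance divided by reference length (lower is better)."""
    if not reference:
        return 0.0 if not hypothesis else 1.0
    return edit_distance(reference, hypothesis) / len(reference)


def mean_cer(pairs):
    """Average CER over an iterable of (reference, hypothesis) pairs."""
    pairs = list(pairs)
    return sum(character_error_rate(r, h) for r, h in pairs) / len(pairs)
```

Run `mean_cer` once per engine over the same set of documents and compare the numbers. Which engine wins depends heavily on the dataset, which is the point of the comment above.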
rafram•1h ago
I’ve tested a bunch of vision models on particularly difficult documents (handwritten in a German script that’s no longer used), and I have yet to be impressed. They’re good at BSing to the point that you almost think they nailed it, until you realize that it’s mostly/all made-up text that doesn’t appear in the document.
yjftsjthsd-h•40m ago
> It's still 40.

Is it, though? If the important parts of the code are new, does it matter that other parts are older or derived from older code? (Of course, I think this whole line of thought is pointless; what matters is not age, but how well it works, and tesseract generally does seem to work.)

ivanjermakov•2h ago
What is this argument? Much software we use today was created in the 80s.
rafram•1h ago
Unix was created in 1971 and here we are still running processes and shells like it’s the 70s. Why not just have an LLM dream up the output?
sushid•1h ago
Making OCR more accurate for regular text (e.g. data extraction from documents) would be useful; not sure how useful code transcription is
bobosha•1h ago
Has anyone tried feeding the admittedly noisy OCR-ed text, at a document level, to an LLM to make sense of it? Presumably some of the less capable ones would be quite affordable and accurate at scale as well.
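
If you try this, one cheap safeguard against the made-up-text problem mentioned upthread is to reject any LLM "cleanup" that drifts too far from the raw OCR output. A minimal sketch using only the standard library; the LLM call itself is left out, and the function name and the 0.8 similarity threshold are assumptions made up for the example.

```python
import difflib


def guarded_correction(ocr_text, llm_text, min_similarity=0.8):
    """Return (text, accepted).

    Accept the LLM's cleaned-up text only if it stays close to the raw
    OCR output; otherwise fall back to the OCR text. Similarity is
    difflib's ratio in [0, 1]. The 0.8 default is an assumption.
    """
    ratio = difflib.SequenceMatcher(None, ocr_text, llm_text).ratio()
    if ratio >= min_similarity:
        return llm_text, True
    return ocr_text, False
```

This only catches wholesale invention. A small but wrong "fix" (a changed digit, a renamed variable) still passes, so it complements manual review rather than replacing it.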
lesuorac•58m ago
OCR is the biggest XY problem.

Stop accepting PDFs and force things to use APIs ...