Show HN: Ocrbase – pdf → .md/.json document OCR and structured extraction API

https://github.com/majcheradam/ocrbase
35•adammajcher•2h ago

Comments

mechazawa•1h ago
Is only Bun supported, or regular Node too?
hersko•1h ago
I have a flow where I extract text from a PDF with pdf-parse and then feed that to an AI model for data extraction. If that fails, I convert it to a PNG and send the image for data extraction instead. This works very well and is presumably far cheaper, since I'm usually sending text to the model rather than images. Isn't sending only images for OCR significantly more expensive?
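A minimal sketch of the text-first flow with an image fallback described in this comment, written in TypeScript. The four stage functions (`extractText`, `renderPageImages`, `extractFromText`, `extractFromImages`) are hypothetical placeholders passed in as parameters; they stand in for pdf-parse, a PDF rasterizer, and the model calls, and none of those libraries' real APIs are reproduced here.

```typescript
// Hypothetical result shape for a structured extraction.
type Extracted = Record<string, unknown>;

// Hypothetical stage signatures; in practice these would wrap pdf-parse,
// a PDF-to-PNG renderer, and whichever model API does the extraction.
interface Pipeline {
  extractText: (pdf: Uint8Array) => Promise<string>;
  renderPageImages: (pdf: Uint8Array) => Promise<Uint8Array[]>; // one PNG per page
  extractFromText: (text: string) => Promise<Extracted | null>;
  extractFromImages: (images: Uint8Array[]) => Promise<Extracted | null>;
}

// Try the cheaper text path first and fall back to images only when it fails.
async function extractWithFallback(
  pdf: Uint8Array,
  p: Pipeline,
): Promise<{ via: "text" | "image"; data: Extracted }> {
  try {
    const text = await p.extractText(pdf);
    // An empty text layer usually means a scanned page, so skip the text model.
    if (text.trim().length > 0) {
      const data = await p.extractFromText(text);
      if (data !== null) return { via: "text", data };
    }
  } catch {
    // Text extraction or the text-based model call threw; use the image path.
  }
  const images = await p.renderPageImages(pdf);
  const data = await p.extractFromImages(images);
  if (data === null) throw new Error("extraction failed on both text and image paths");
  return { via: "image", data };
}
```

Injecting the stages keeps the routing logic testable with stubs, and it makes it easy to count how often the image path is actually taken, which is the figure that decides whether the cost concern in the question matters for a given document set.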
mimim1mi•1h ago
By definition, OCR means optical character recognition. Which extraction method works depends on the contents of the PDF. Many PDFs are just scans of printed documents or handwritten notes. If machine-readable text is available, your approach is great.
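One way to act on this point, that the right method depends on whether machine-readable text exists, is to check each page's extracted text layer and send only the pages that look empty or garbled to OCR. This is a minimal TypeScript sketch under stated assumptions, not how Ocrbase or pdf-parse make that decision: the thresholds are hypothetical, the alphanumeric test assumes Latin-script text, and it cannot detect invisible or hidden text in a layer that otherwise looks fine.

```typescript
// Hypothetical thresholds; tune them on your own documents.
const MIN_CHARS_PER_PAGE = 50;
const MIN_ALNUM_RATIO = 0.6;

// Decide whether one page's embedded text layer looks machine-readable.
function pageHasUsableText(pageText: string): boolean {
  const visible = pageText.replace(/\s+/g, "");
  // Very little text usually means a scan or an image-only page.
  if (visible.length < MIN_CHARS_PER_PAGE) return false;
  // Assumes Latin-script text; a low ratio suggests garbled glyph mappings.
  const alnum = (visible.match(/[A-Za-z0-9]/g) ?? []).length;
  return alnum / visible.length >= MIN_ALNUM_RATIO;
}

// Route each page: keep the text layer where it looks usable, OCR the rest.
function planPages(pages: string[]): Array<"text" | "ocr"> {
  return pages.map((t) => (pageHasUsableText(t) ? "text" : "ocr"));
}
```

Routing per page rather than per document matters for mixed PDFs, where a born-digital cover letter is followed by scanned attachments.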
trollbridge•23m ago
I always render an image and OCR that, so I don't get odd problems from invisible text; it also avoids being affected by anything hidden in the text for SEO.
saaaaaam•20m ago
There was an interesting discussion on here a couple of months back about images vs text, driven by this article: https://www.seangoedecke.com/text-tokens-as-image-tokens/

Discussion is here: https://news.ycombinator.com/item?id=45652952

sgc•54m ago
How does this compare to dots.ocr? I got fantastic results when I tested dots.

https://github.com/rednote-hilab/dots.ocr

mjrpes•35m ago
Ocrbase is CUDA-only, while dots.ocr uses vLLM, so it should support ROCm/AMD cards?
v3ss0n•30m ago
How is this better than Surya/Marker or kreuzberg? https://github.com/kreuzberg-dev/kreuzberg

IP Addresses Through 2025

https://www.potaroo.net/ispcol/2026-01/addr2025.html
68•petercooper•1h ago•21 comments

Danish pension fund divesting US Treasuries

https://www.reuters.com/business/danish-pension-fund-divest-its-us-treasuries-2026-01-20/
91•mythical_39•38m ago•44 comments

The Zen of Reticulum

https://github.com/markqvist/Reticulum/blob/master/Zen%20of%20Reticulum.md
36•mikece•2h ago•16 comments

I'm addicted to being useful

https://www.seangoedecke.com/addicted-to-being-useful/
187•swah•5h ago•119 comments

Linux kernel framework for PCIe device emulation, in userspace

https://github.com/cakehonolulu/pciem
139•71bw•7h ago•45 comments

Running Claude Code dangerously (safely)

https://blog.emilburzo.com/2026/01/running-claude-code-dangerously-safely/
155•emilburzo•3h ago•123 comments

Level S4 solar radiation event

https://www.swpc.noaa.gov/news/g4-severe-geomagnetic-storm-levels-reached-19-jan-2026
531•WorldPeas•19h ago•174 comments

IP over Avian Carriers with Quality of Service (1999)

https://www.rfc-editor.org/rfc/rfc2549.html
31•mig4ng•4h ago•15 comments

Reticulum, a secure and anonymous mesh networking stack

https://github.com/markqvist/Reticulum
277•brogu•15h ago•64 comments

Apple testing new App Store design that blurs the line between ads and results

https://9to5mac.com/2026/01/16/iphone-apple-app-store-search-results-ads-new-design/
498•ksec•23h ago•400 comments

Channel3 (YC S25) Is Hiring

https://www.ycombinator.com/companies/channel3/jobs/3DIAYYY-backend-engineer
1•aschiff1•3h ago

Increasing the performance of WebAssembly Text Format parser by 350%

https://blog.gplane.win/posts/improve-wat-parser-perf.html
64•gplane•5d ago•28 comments

What came first: the CNAME or the A record?

https://blog.cloudflare.com/cname-a-record-order-dns-standards/
408•linolevan•22h ago•144 comments

Benchmarking a Baseline Fully-in-Place Functional Language Compiler [pdf]

https://trendsfp.github.io/papers/tfp26-paper-12.pdf
14•matt_d•4d ago•3 comments

Nanolang: A tiny experimental language designed to be targeted by coding LLMs

https://github.com/jordanhubbard/nanolang
192•Scramblejams•18h ago•150 comments

The Overcomplexity of the Shadcn Radio Button

https://paulmakeswebsites.com/writing/shadcn-radio-button/
426•dbushell•8h ago•239 comments

The coming industrialisation of exploit generation with LLMs

https://sean.heelan.io/2026/01/18/on-the-coming-industrialisation-of-exploit-generation-with-llms/
198•long•1d ago•127 comments

3D printing my laptop ergonomic setup

https://www.ntietz.com/blog/3d-printing-my-laptop-ergonomic-setup/
114•kurinikku•16h ago•34 comments

Scaling long-running autonomous coding

https://simonwillison.net/2026/Jan/19/scaling-long-running-autonomous-coding/
141•srameshc•15h ago•65 comments

Notes on Apple's Nano Texture (2025)

https://jon.bo/posts/nano-texture/
218•dsr12•21h ago•118 comments

x86 prefixes and escape opcodes flowchart

https://soc.me/interfaces/x86-prefixes-and-escape-opcodes-flowchart.html
81•gaul•12h ago•31 comments

Prediction markets are ushering in a world in which news becomes about gambling

https://www.theatlantic.com/technology/2026/01/america-polymarket-disaster/685662/
373•krustyburger•1d ago•368 comments

Giving university exams in the age of chatbots

https://ploum.net/2026-01-19-exam-with-chatbots.html
186•ploum•8h ago•135 comments

Nuudel: Non-Tracking Appointment Tool

https://nuudel.digitalcourage.de/
10•doener•4d ago•3 comments

Targeted Bets: An alternative approach to the job hunt

https://www.seanmuirhead.com/blog/targeted-bets
101•seany62•18h ago•87 comments

British redcoat's lost memoir reveals realities of life as a disabled veteran

https://phys.org/news/2026-01-british-redcoat-lost-memoir-reveals.html
109•wglb•4d ago•113 comments

Europe could 'weaponize' $10T of US assets over Greenland

https://www.bloomberg.com/news/articles/2026-01-19/-weaponizing-10-trillion-of-us-assets-is-tough...
25•saubeidl•1h ago•8 comments

Kiss Launcher – fast launcher for Android

https://kisslauncher.com/
19•ifh-hn•5h ago•9 comments

Porsche sold more electrified cars in Europe in 2025 than pure gas-powered cars

https://newsroom.porsche.com/en/2026/company/porsche-deliveries-2025-41516.html
411•m463•14h ago•585 comments