The Napster (2000)

https://time.com/archive/6954963/meet-the-napster/
1•thomassmith65•8m ago•0 comments

A Great Way to Snub the World (1981)

https://time.com/archive/6697378/living-a-great-way-to-snub-the-world/
1•thomassmith65•8m ago•0 comments

UniFi OS Server for MSPs

https://blog.ui.com/article/introducing-unifi-os-server
1•doener•11m ago•0 comments

Before Sebald Was Great

https://www.thenation.com/article/culture/wg-sebald-silent-catastrophes/
1•Caiero•12m ago•0 comments

Poet, Artist, Tantric Christian

https://thecritic.co.uk/poet-artist-tantric-christian/
1•lermontov•19m ago•0 comments

What Happened to AltaVista? The Rise and Fall of a Search Pioneer

https://em360tech.com/tech-articles/what-happened-altavista-rise-and-fall-search-pioneer
3•CharlesW•21m ago•1 comments

Show HN: Freezewell – A Private Egg Freezing Tracker (Offline App)

https://onionwave7.gumroad.com/l/xddyzd
1•kian_sage•21m ago•0 comments

The Louder the Monkey, the Smaller Its Balls, Study Finds

https://www.vice.com/en/article/the-louder-the-monkey-the-smaller-its-balls-study-finds-42361364663309/
2•CharlesW•22m ago•2 comments

Show HN: MindSafe Journal – An Offline Mental Health Privacy Journal

https://onionwave7.gumroad.com/l/MindSafe
1•kian_sage•23m ago•0 comments

Calibre-Web-Automated

https://github.com/crocodilestick/Calibre-Web-Automated
1•ValentineC•27m ago•0 comments

Double Pendulums are (not) Chaotic

https://www.youtube.com/watch?v=dtjb2OhEQcU
1•leidenfrost•38m ago•0 comments

Op-ed: Donor Organs Are Too Rare. We Need a New Definition of Death

https://www.nytimes.com/2025/07/30/opinion/organ-donors-death-definition.html
3•johntfella•41m ago•0 comments

Universal Orlando EV buses caught fire

https://wdwnt.com/2025/07/breaking-two-new-electric-universal-epic-universe-buses-destroyed-by-fire-just-outside-park/
1•burnt-resistor•44m ago•1 comments

Why cold feels good: Scientists uncover the chill pathway

https://www.sciencedaily.com/releases/2025/07/250730030354.htm
2•freedomben•44m ago•0 comments

Fed up with both traditional and AI search

1•zyruh•46m ago•4 comments

A record-breaking baby has been born from an embryo that's over 30 years old

https://www.technologyreview.com/2025/07/29/1120769/exclusive-record-breaking-baby-born-embryo-over-30-years-old/
1•gscott•50m ago•0 comments

ChatGPT Confessions gone? They are not

https://www.digitaldigging.org/p/chatgpt-confessions-gone-they-are
2•tzury•54m ago•1 comments

Why open-source AI became an American National Priority

https://venturebeat.com/ai/why-open-source-ai-became-an-american-national-priority/
1•briggiesmallz•56m ago•0 comments

Separated men are nearly 5x more likely to take their lives than married men

https://medicalxpress.com/news/2025-07-men.html
3•PaulHoule•57m ago•0 comments

Windows 10 10: How Microsoft led developers round in circles

https://www.theregister.com/2025/08/01/windows_10_dev_comment/
3•RachelF•1h ago•0 comments

Speak, Don't Type

https://www.typeless.com
1•lhuser123•1h ago•0 comments

Hashcat v7.0.0 Released

https://github.com/hashcat/hashcat/releases/tag/v7.0.0
1•GalaxySnail•1h ago•0 comments

ChatGPT scrubbed today nearly 50k shared conversations from Google

https://twitter.com/henkvaness/status/1951252284953763844/photo/1
2•taytus•1h ago•0 comments

It's not you, it's their bullshit

https://brilliantcrank.com/its-not-you-its-their-bullshit/
2•donutshop•1h ago•0 comments

The Emacs dumper dispute (2016)

https://lwn.net/Articles/707615/
1•aragonite•1h ago•0 comments

The Great Crime Paradox

https://www.ft.com/content/7488fe4c-5e1d-4b2b-adab-f42ad5273fc9
1•paulpauper•1h ago•0 comments

What does it mean for AI to be sovereign–and does that come before AGI?

1•trendinghotai•1h ago•1 comments

Therac-25

https://en.wikipedia.org/wiki/Therac-25
6•aragonite•1h ago•1 comments

Robert Wilson has died

https://www.theartnewspaper.com/2025/08/01/robert-wilson-playwright-director-artist-obituary
16•paulpauper•1h ago•2 comments

Lots of thriving life, 30k feet deep

https://www.washingtonpost.com/climate-environment/2025/07/30/deep-sea-discovery-pacific-ocean/
1•paulpauper•1h ago•1 comments

Show HN: An API to extract structured data from any document without training

https://ninjadoc.ai
2•dbvitapps•14h ago
Hey HN,

I'm the founder of Ninjadoc AI. I've spent years working with document processing, and I've always been frustrated by the existing solutions for structured data extraction.

The core problem is that most tools force you into one of two bad options:

- Template-based extractors: You define fixed regions or rules. These are incredibly brittle and break the moment a document layout changes slightly (e.g., a new invoice template from a vendor).
- ML-based extractors: These require you to gather hundreds (sometimes thousands) of your own labeled documents to train a custom model for each document type. It's a slow, expensive, and data-intensive process.

I wanted a "zero-shot" solution that worked out of the box, so I built Ninjadoc AI.

Our approach is different. Instead of training, you use a tool to define your desired schema once. For example, you define fields like invoice_id, due_date, and line_items. The AI then uses this schema to understand the document's structure and context, allowing it to extract the correct data from any layout variation of that document type. It's layout-agnostic.
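To make the schema idea concrete, here is a minimal Python sketch of what such a schema could look like as plain data, plus a check that an extracted record covers every field. This is not Ninjadoc's actual schema format (the post says schemas are built visually): the `Field` class, the `kind` labels, and `missing_fields` are all hypothetical names chosen for illustration. Only the field names `invoice_id`, `due_date`, and `line_items` come from the post.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """One schema entry. Hypothetical shape, not the product's real format."""
    name: str
    kind: str  # illustrative labels only: "string", "date", or "list"


# Field names taken from the post; the kinds are assumptions.
INVOICE_SCHEMA = (
    Field("invoice_id", "string"),
    Field("due_date", "date"),
    Field("line_items", "list"),
)


def missing_fields(schema, record):
    """Return the names of schema fields that are absent or empty in record."""
    return [f.name for f in schema if record.get(f.name) in (None, "", [])]


# Two records, as if extracted from two different layouts of the same
# document type. The schema, not the layout, decides what counts as complete.
complete = {
    "invoice_id": "INV-1042",
    "due_date": "2025-08-15",
    "line_items": [{"description": "Widget", "amount": "19.99"}],
}
partial = {"invoice_id": "INV-1043"}

print(missing_fields(INVOICE_SCHEMA, complete))
print(missing_fields(INVOICE_SCHEMA, partial))
```

The point of the sketch is that the same schema is applied to every layout, so a downstream check like this does not need to know which vendor template a document came from.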

A few key technical features:

- It's a REST API: Simple to integrate, returns structured JSON.
- Bounding Box Coordinates: For every piece of extracted data, the API returns its precise coordinates on the document. This is useful for building verification UIs or for record-keeping. To my knowledge, we're the only zero-shot tool that provides this.
- Visual Schema Builder: No code is needed to define what you want to extract. You just upload one example document and map fields visually. Those rules then apply universally.
- No Training/No Templates: It works immediately on your documents without any model fine-tuning or sample uploads.

The goal is to provide a powerful, developer-friendly API that skips the most painful parts of document data extraction.
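As a rough illustration of how the bounding boxes could feed a verification UI, the Python sketch below parses a made-up JSON response and converts each box to pixel coordinates for drawing an overlay. Everything about the response shape is an assumption: the `fields`, `page`, and `bbox` keys and the use of coordinates normalized to the range 0 to 1 are hypothetical, and the real API may differ. No network call is made; the sample payload is inline.

```python
import json

# Hypothetical response body. The real API's JSON layout is not shown in
# the post, so this structure is an assumption for illustration only.
SAMPLE_RESPONSE = """
{
  "fields": [
    {"name": "invoice_id", "value": "INV-1042", "page": 1,
     "bbox": {"x": 0.5, "y": 0.125, "width": 0.25, "height": 0.0625}},
    {"name": "due_date", "value": "2025-08-15", "page": 1,
     "bbox": {"x": 0.25, "y": 0.5, "width": 0.125, "height": 0.0625}}
  ]
}
"""


def to_pixel_boxes(response_text, page_width, page_height):
    """Map each extracted field to a (left, top, right, bottom) pixel box.

    Assumes bbox values are fractions of the page size (an assumption,
    not documented behavior).
    """
    boxes = {}
    for field in json.loads(response_text)["fields"]:
        bbox = field["bbox"]
        left = round(bbox["x"] * page_width)
        top = round(bbox["y"] * page_height)
        right = round((bbox["x"] + bbox["width"]) * page_width)
        bottom = round((bbox["y"] + bbox["height"]) * page_height)
        boxes[field["name"]] = (left, top, right, bottom)
    return boxes


# A page rendered at 1000 x 1600 pixels.
pixel_boxes = to_pixel_boxes(SAMPLE_RESPONSE, 1000, 1600)
for name, box in pixel_boxes.items():
    print(name, box)
```

A reviewer-facing UI could draw these rectangles over the rendered page so that a person can confirm each extracted value against its source location.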

I'd be grateful for any feedback, especially on the API design and the overall developer experience.

You can try it out here: https://ninjadoc.ai

There's a free plan with 5,000 credits (no credit card required), which is enough to run a few hundred pages through it.

Thanks for checking it out!