Show HN: Undatas.io – A pay-on-accept document parsing API

https://undatas.io/
1•jojogh•1d ago
Hey HN, Alex here, founder of undatas.io.

Our journey started from a place of deep frustration with RAG (Retrieval-Augmented Generation). I was helping companies build internal knowledge bases on their own data, and the promise was huge. But in practice, the results were often mediocre. Important information was frequently missed during retrieval, and we kept hitting dead ends.

After endless debugging, we realized the problem wasn't the LLM; it was classic "garbage in, garbage out." We traced the retrieval failures back to the very first step: document parsing.

Whether we used open-source libraries or expensive paid APIs, the story was the same. Precision was lost. Key phrases, critical numbers, and entire table rows would just vanish during the parsing process. We spent countless hours manually comparing the original PDFs to the parsed output to find what went wrong. It was a soul-crushing, time-consuming nightmare.

The biggest pain points were:

1. Complex Tables: Most tools collapsed when faced with real-world documents. Borderless tables, cells merged across rows and columns, or tables containing handwritten notes were consistently mangled.

2. Lack of a Feedback Loop: When the parser got something wrong, there was no easy way to manually annotate and correct it. You were stuck with the bad output.

I got so frustrated that I decided to build the tool I wished I had: a parsing engine that is obsessed with precision and makes the entire data extraction process transparent. That’s what undatas.io is. And today, we're launching our API.

We built our API around a simple principle: you only pay for results you actually accept.

To solve the transparency problem, every piece of extracted data in the JSON response includes its positional coordinates (bbox). This lets you build your own "glass box" validator that maps each extracted value directly back to its location in the source document, so the data prep stage for RAG becomes transparent.
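
To make the "glass box" idea concrete, here is a minimal sketch of what such a validator could look like. It is illustrative only: the response shape (`elements`, `text`, `page`, `bbox` as `[x0, y0, x1, y1]` in page units) and every function name are assumptions made for this example, not the documented undatas.io schema, so check the docs for the real field names.

```python
from dataclasses import dataclass


@dataclass
class Element:
    """One extracted item. Field names are hypothetical, not the real schema."""
    text: str
    page: int
    bbox: tuple  # assumed layout: (x0, y0, x1, y1) in page units


def parse_elements(response: dict) -> list:
    """Turn an (assumed) JSON response into Element objects."""
    return [
        Element(e["text"], e["page"], tuple(e["bbox"]))
        for e in response["elements"]
    ]


def to_pixels(bbox, page_size, image_size):
    """Scale a bbox from page units to the pixel grid of a rendered page image,
    so the box can be drawn on top of the original document."""
    x0, y0, x1, y1 = bbox
    sx = image_size[0] / page_size[0]
    sy = image_size[1] / page_size[1]
    return (round(x0 * sx), round(y0 * sy), round(x1 * sx), round(y1 * sy))


def find_suspect(elements, page_size):
    """Return elements whose bbox is empty, inverted, or outside the page.
    These are the ones worth checking by eye before accepting a result."""
    width, height = page_size
    suspects = []
    for e in elements:
        x0, y0, x1, y1 = e.bbox
        inside = 0 <= x0 < x1 <= width and 0 <= y0 < y1 <= height
        if not inside:
            suspects.append(e)
    return suspects


# Hypothetical response for a US Letter page (612 x 792 points).
sample = {
    "elements": [
        {"text": "Total: 1,204.50", "page": 1, "bbox": [72, 700, 300, 720]},
        {"text": "Q3", "page": 1, "bbox": [580, 100, 640, 120]},
    ]
}

page_size = (612, 792)
elements = parse_elements(sample)
suspects = find_suspect(elements, page_size)
pixels = to_pixels(elements[0].bbox, page_size, (1224, 1584))
```

In this sample the second element extends past the right edge of the page, so it is the only one flagged. The pixel conversion is what you would feed to whatever drawing library you use to overlay boxes on a rendered page.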

Our goal is to build the best and most trustworthy parsing tool for developers. We're just getting started and would be grateful for your feedback.

You can check out the docs and try it out here: https://doc.undatas.io/

I’ll be here all day to answer any questions. Let me know what you think.

Show HN: Tally, but with Lead Attribution

https://informs.io/
1•hankor•5m ago•0 comments

What if you could invest in a person?

https://www.figma.com/deck/cL0Oo6YFNrmNH9jqGDBVH2/RootNet?node-id=1-82&t=SbFLkJf1lxJFINDu-1&scali...
1•koopuluri•7m ago•0 comments

Marathon Fusion claims it can turn mercury into gold while creating clean energy

https://phys.org/news/2025-07-marathon-fusion-mercury-gold-energy.html
1•gurjeet•8m ago•0 comments

Show HN: CLI constraints as types via parser combinators in TypeScript

https://optique.dev/why
1•dahlia•9m ago•0 comments

Staying ahead in the age of AI: a leadership guide [pdf]

https://cdn.openai.com/pdf/ae250928-4029-4f26-9e23-afac1fcee14c/staying-ahead-in-the-age-of-ai.pdf
1•OJFord•12m ago•0 comments

China Weighs Curbs on Stock Speculation to Foster Steady Gains

https://www.bloomberg.com/news/articles/2025-09-04/china-weighs-curbs-on-stock-speculation-to-fos...
1•theconomist•15m ago•0 comments

Sweeteners could accelerate cognitive decline

https://www.neurology.org/doi/10.1212/WNL.0000000000214023
2•mounram•24m ago•0 comments

Consumer-pgmq – Dead letter queue new feature

1•tiagorosadacost•33m ago•0 comments

Bypass Paywalls Clean (private) is restricted for violating Mozilla policies

https://addons.mozilla.org/en-US/firefox/blocked-addon/magnolia@12.34/4.0.8.3/
4•linksbro•34m ago•1 comments

Sheaf theoretic formulation for consciousness (2017)

https://pubmed.ncbi.nlm.nih.gov/28887144/
1•kelseyfrog•36m ago•0 comments

Linux Kernel SMB 0-Day Vulnerability CVE-2025-37899 Uncovered Using ChatGPT O3

https://www.upwind.io/feed/linux-kernel-smb-0-day-vulnerability-cve-2025-37899-uncovered-using-ch...
2•todsacerdoti•39m ago•0 comments

Simple but Powerful Pratt Parsing

https://matklad.github.io/2020/04/13/simple-but-powerful-pratt-parsing.html
1•thunderbong•45m ago•0 comments

Why Rewriting Emacs Is Hard

https://kyo.iroiro.party/en/posts/why-rewriting-emacs-is-hard/
2•signa11•55m ago•1 comments

Lumo: The least open 'open' AI assistant

https://osai-index.eu/news/lumo-proton-least-open/
3•mac-attack•58m ago•0 comments

Man Found Dead at Burning Man, Prompting Homicide Investigation

https://www.nytimes.com/2025/08/31/us/burning-man-festival-dead.html
1•gnabgib•1h ago•0 comments

The Rust Innovation Lab

https://rustfoundation.org/rust-innovation-lab/
1•pabs3•1h ago•0 comments

Disintegration Fingerprinting: A low-cost, easy tool to identify fake medicines

https://www.medrxiv.org/content/10.1101/2025.08.15.25333621v1
1•nativeit•1h ago•0 comments

Brainstorm -OR- Green Needle [video]

https://www.youtube.com/watch?v=1okD66RmktA
1•baxtr•1h ago•0 comments

100M CROWPOWER and no horses on the moon

https://taylor.town/crowpower
3•jbrr•1h ago•0 comments

Why DOGE's Luke Farritor Followed Elon Musk to DC

https://www.bloomberg.com/features/2025-luke-farritor-doge/
3•nxobject•1h ago•0 comments

AI Backed Sports Betting Analysis

https://www.aicalledit.com
2•rk3000•1h ago•0 comments

The Suicide State

https://brooklynrail.org/2025/09/field-notes/the-suicide-state/
3•mackeye•1h ago•0 comments

Anthropic, Meta, and Snap are paying up to 350k+ base for a DevRel

https://www.devreljob.com/
3•npmipg•1h ago•1 comments

DaCe AD: Unifying High-Performance Automatic Differentiation for ML and SciComp

https://arxiv.org/abs/2509.02197
2•matt_d•1h ago•0 comments

Show HN: Mock PSP API – simulate payments and webhooks

2•d_sai•1h ago•0 comments

One mother for two species via obligate cross-species cloning in ants

https://www.nature.com/articles/s41586-025-09425-w
1•vagabund•1h ago•0 comments

Motion Canvas

https://motioncanvas.io/
2•cyanf•1h ago•0 comments

Commentary: Prepare to say a frond farewell to Los Angeles' palm trees

https://www.latimes.com/california/story/2025-08-10/prepare-to-say-farewell-to-los-angeles-palm-t...
1•PaulHoule•1h ago•1 comments

Show HN: V0.dev-like version selector for Nano Banana image editor

https://edit0.com
2•Justin3go•1h ago•0 comments

Venezuela's president thinks American spies can't hack Huawei phones

https://techcrunch.com/2025/09/03/venezuelas-president-thinks-american-spies-cant-hack-huawei-pho...
4•rguiscard•1h ago•4 comments