Show HN: Undatas.io – A pay-on-accept document parsing API

https://undatas.io/
1•jojogh•5mo ago
Hey HN, Alex here, founder of undatas.io.

This started from deep frustration with RAG (Retrieval-Augmented Generation). I was helping companies build internal knowledge bases on their own data, and the promise was huge. In practice, the results were often mediocre: important information was frequently missed during retrieval, and we kept hitting dead ends.

After endless debugging, we realized the problem wasn't the LLM; it was classic "garbage in, garbage out." We traced the retrieval failures back to the very first step: document parsing.

Whether we used open-source libraries or expensive paid APIs, the story was the same. Precision was lost. Key phrases, critical numbers, and entire table rows would just vanish during the parsing process. We spent countless hours manually comparing the original PDFs to the parsed output to find what went wrong. It was a soul-crushing, time-consuming nightmare.

The biggest pain points were:

1. Complex Tables: Most tools collapsed when faced with real-world documents. Borderless tables, cells merged across rows and columns, or tables containing handwritten notes were consistently mangled.

2. Lack of a Feedback Loop: When the parser got something wrong, there was no easy way to manually annotate and correct it. You were stuck with the bad output.

I got so frustrated that I decided to build the tool I wished I had: a parsing engine that is obsessed with precision and makes the entire data extraction process transparent. That's what undatas.io is. And today, we're launching our API.

We built our API around a simple principle: you only pay for results you actually accept.

To solve the transparency problem, every piece of extracted data in the JSON response includes its positional coordinates (bbox). This allows you to build your own "glass box" validator that maps each extracted value back to its exact location in the source document, so the data prep stage for RAG can be inspected and verified instead of trusted blindly.
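As a rough illustration of the "glass box" idea, here is a minimal Python sketch of such a validator. It is not the actual undatas.io response schema: the `elements`, `text`, `page`, and `bbox` field names, the `(x0, y0, x1, y1)` coordinate order, and the helper names are all assumptions made for this example, so check the docs for the real format. The source words would come from whatever PDF text-layer extractor you already use.

```python
from dataclasses import dataclass

# Assumed bbox layout: (x0, y0, x1, y1) in page units, origin at top-left.
BBox = tuple[float, float, float, float]


@dataclass
class Item:
    text: str
    page: int
    bbox: BBox


def parse_items(response: dict) -> list[Item]:
    """Turn a (hypothetical) JSON response into typed items."""
    return [
        Item(e["text"], e["page"], tuple(e["bbox"]))
        for e in response["elements"]
    ]


def bbox_problems(item: Item, page_width: float, page_height: float) -> list[str]:
    """Return a list of geometric problems with an item's bbox."""
    x0, y0, x1, y1 = item.bbox
    problems = []
    if x0 >= x1 or y0 >= y1:
        problems.append("empty or inverted box")
    if x0 < 0 or y0 < 0 or x1 > page_width or y1 > page_height:
        problems.append("box outside page")
    return problems


def source_text(bbox: BBox, words: list[tuple[str, BBox]]) -> str:
    """Join the source words whose center falls inside bbox, in reading order."""
    x0, y0, x1, y1 = bbox
    inside = []
    for text, (wx0, wy0, wx1, wy1) in words:
        cx, cy = (wx0 + wx1) / 2, (wy0 + wy1) / 2
        if x0 <= cx <= x1 and y0 <= cy <= y1:
            inside.append((wy0, wx0, text))
    # Sort top-to-bottom, then left-to-right.
    return " ".join(text for _, _, text in sorted(inside))


def mismatches(items: list[Item], words_by_page: dict[int, list[tuple[str, BBox]]]) -> list[Item]:
    """Items whose parsed text differs from the source text under their bbox."""
    return [
        item for item in items
        if source_text(item.bbox, words_by_page.get(item.page, [])) != item.text
    ]
```

Anything returned by `mismatches` is a candidate for manual review; a real validator would probably normalize whitespace and tolerate small OCR differences rather than require exact string equality.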

Our goal is to build the best and most trustworthy parsing tool for developers. We're just getting started and would be grateful for your feedback.

You can check out the docs and try it out here: https://doc.undatas.io/

I’ll be here all day to answer any questions. Let me know what you think.
