Launch HN: Parsewise (YC P25) – Reason Across Documents with an API

14•gergelycsegzi•1h ago
Hi all, it’s Greg and Max, founders of Parsewise.

Parsewise transforms a bucket of unstructured data into schema-compliant data, retaining lineage for values resolved across documents. Imagine giving Claude a bunch of files and asking for a CSV or JSON output. If you have tried this, you know both the system limitations (number of files, type of inputs, cost, latency) and the human-facing challenge of having no way to validate the results quickly. We solve both. We help tech teams simplify their unstructured data ETL, and loop in business experts for the definitions and for instant validation.

Here is a video with a few use cases: https://www.youtube.com/watch?v=dbRllnnh47w

Parsewise in the words of someone coming to us: “I need to extract information from insurance policy PDFs, phone calls that have been transcribed, emails, etc. I am NOT looking for something that would just extract data point by data point, page by page into a structured well-defined schema but more something more agentic that can understand that information might be across documents and that it should reason over what to extract.”

We started the company based on a decade of experience (and pain) in complex data transformation and data analysis / synthesis. Greg built classical ETL pipelines and implemented AI workflows at Palantir. At Bain, Max did highly complex data analysis in the financial sector, similar to the work many of our customers do.

Parsewise works by taking in a bucket of data (think hundreds or thousands of PDFs, Excel files, etc.) and outputting schema-compliant data where every single value is traceable down to word-level citations across multiple documents in the bucket. We provide API customers with ways to show the lineage in their own applications, or they can use our platform for internal operations. At the core of the data processing we have self-improving agent definitions. They define the acceptable sources, the logic for resolving or combining values, and the rule for highlighting uncertainty to the end user.
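To make the idea of a resolved value with lineage concrete, here is a minimal sketch in Python. It is not the Parsewise API: the names `Citation`, `ResolvedValue`, and `resolve`, the word-index fields, and the majority-vote rule are all assumptions chosen for illustration. It only shows the general shape of a value that carries its citations and is flagged as uncertain when sources disagree.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Citation:
    """Hypothetical word-level pointer into one document of the bucket."""
    document: str    # file name within the bucket
    page: int
    word_start: int  # index of the first cited word on the page
    word_end: int    # index one past the last cited word


@dataclass
class ResolvedValue:
    """Hypothetical output cell: a value plus the citations that support it."""
    field_name: str
    value: Optional[str]
    citations: List[Citation] = field(default_factory=list)
    uncertain: bool = False


def resolve(field_name: str,
            candidates: List[Tuple[str, Citation]]) -> ResolvedValue:
    """Pick the most frequently cited value (an assumed resolution rule).

    The result is flagged as uncertain when no candidate was found or when
    the sources disagree, so a reviewer knows where to look first.
    """
    if not candidates:
        return ResolvedValue(field_name, None, [], uncertain=True)
    counts = Counter(value for value, _ in candidates)
    winner, _ = counts.most_common(1)[0]
    supporting = [cite for value, cite in candidates if value == winner]
    return ResolvedValue(field_name, winner, supporting,
                         uncertain=len(counts) > 1)


candidates = [
    ("2025-01-01", Citation("policy.pdf", 3, 10, 11)),
    ("2025-01-01", Citation("email.txt", 1, 42, 43)),
    ("2025-02-01", Citation("call_transcript.txt", 1, 5, 6)),
]
result = resolve("policy_start_date", candidates)
print(result.value, len(result.citations), result.uncertain)
```

In this toy run, two sources agree and one disagrees, so the value is kept with its two supporting citations and still flagged for review. A real resolution rule would presumably be richer than a majority vote.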

The underlying tech is model and cloud agnostic and can be deployed in private networks. We have seen the best results with Gemini models for visual reasoning, achieving SOTA (beating Claude Fable) on the strongest grounded reasoning benchmark we have found (Databricks OfficeQA). Notably, we focused more on the “human harness” than on the model harness, leaning into the actual friction we saw in uptake, which is around verifiability. That means optimizing the time and clicks required to trust the outcomes. We use vLLMs for parsing, and then we use small models for efficient large-scale exhaustive search. Unlike RAG, we do not sample; instead, we exhaustively find all relevant values for a given query. We use larger models for decision making around resolutions and flagging inconsistencies to users.
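The difference between sampling and exhaustive search can be illustrated with a toy sketch. Everything here is an assumption for illustration: `keyword_score` is a stand-in for whatever relevance model is actually used, and `top_k_retrieve` and `exhaustive_scan` are hypothetical names, not part of any Parsewise or RAG library.

```python
from typing import List

chunks = [
    "Annual premium: 1200 EUR",
    "The premium was revised to 1350 EUR in March",
    "Customer called about claim status",
    "Email: please confirm the premium of 1350 EUR",
]


def keyword_score(chunk: str, keyword: str = "premium") -> int:
    """Toy relevance score: how often the keyword occurs in the chunk."""
    return chunk.lower().count(keyword)


def top_k_retrieve(items: List[str], k: int) -> List[str]:
    """RAG-style sampling: keep only the k best-scoring chunks."""
    return sorted(items, key=keyword_score, reverse=True)[:k]


def exhaustive_scan(items: List[str]) -> List[str]:
    """Exhaustive search: keep every chunk the scorer marks as relevant."""
    return [item for item in items if keyword_score(item) > 0]


sampled = top_k_retrieve(chunks, k=2)
complete = exhaustive_scan(chunks)
print(len(sampled), len(complete))
```

With `k=2`, the sampled result drops the email that confirms the revised premium, while the exhaustive scan keeps all three mentions. The trade-off is cost: the scan touches every chunk, which is why a small, cheap model for this step is plausible.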

This exhaustiveness and explicit value sourcing is unique to our platform, and it goes beyond the first step of data parsing that many existing providers cover.

We would love to welcome builders and tinkerers to try Parsewise on their complex document challenges. We have a ton of ideas on how we can expand the product and make it better, and we would appreciate feedback and ideas from the community!

Comments

gergelycsegzi•1h ago
Ah probably should add a link to our website: https://www.parsewise.ai/api
stevesimmons•26m ago
"retaining lineage"
gergelycsegzi•16m ago
"That is a great catch!"
gnerd00•18m ago
> implemented AI workflows at Palantir

you show this in the first paragraph, before many other details

> We would love to welcome builders and tinkerers

Love? really .. cognitive dissonance here.. I read this as " we are security state friendly so we can get that big security state funding" plus "people who work for free like love, so we say that word"

coupled with the free-riding of VC capital on decades of open work, I just can not, not say this

gergelycsegzi•9m ago
I learnt a lot at Palantir, though I always worked on the commercial side, so no ties to the security state (for better or worse). (Also, side note: we are working towards enabling frontier performance with smaller open models, which allows our customers to protect their data. https://www.parsewise.ai/officeqa-sota )

And I do get genuine joy from helping our users, so love it is:)

gorgmah•10m ago
I recently worked on an internal tool to achieve this kind of thing, mostly plugging Mistral OCR into Gemini to extract structured data from documents. We then perform automated diffs too.

There seems to be an insane amount of competition in the "Intelligent Document Processing" market, for instance parseur, whose founder is often on HN himself.

What do you think sets you apart from competition like:
1) Mistral Document AI: depending on the model, it looks way cheaper than yours. OCR model pricing ranges from 0.001 to 0.004 EUR / page, and they have structured output wired into the OCR API if needed (things then get fed to one of their LLMs). It is also EU-based and GDPR-ready.
2) parseur / rossum / docsumo / nanonets (which is YC 2017)?
