
Launch HN: Pulse (YC S24) – Production-grade unstructured document extraction

29•sidmanchkanti21•3h ago
Hi HN, we’re Sid and Ritvik, co-founders of Pulse (https://www.runpulse.com/). Pulse is a document extraction system to create LLM-ready text using hybrid VLM + OCR models.

Here’s a demo video: https://video.runpulse.com/video/pulse-platform-walkthrough-....

Later in this post, you'll find links to before-and-after examples on particularly tricky cases. Check those out to see what Pulse can really do!

Modern vision language models are great at producing plausible text, but that makes them risky for OCR and data ingestion. Plausibility isn't good enough when you need accuracy.

When we started working on document extraction, we assumed the same thing many teams do: foundation models are improving quickly, multi-modal systems appear to read documents well, what’s not to like? And indeed, for small or clean inputs, those assumptions mostly give good results. However, limitations show up once you begin processing real documents in volume. Long PDFs, dense tables, mixed layouts, low-fidelity scans, and financial or operational data expose errors that are subtle, hard to detect, and expensive to correct. Outputs look reasonable even though they contain small but important mistakes, especially in tables and numeric fields.

Running into those challenges set us to work. We ran controlled evaluations on complex documents, fine-tuned vision models, and built labeled datasets where ground truth actually matters. There have been many nights where our team stayed up hand-annotating pages, drawing bounding boxes around tables, labeling charts point by point, or debating whether a number was unreadable or simply poorly scanned. That process shaped our intuition far more than benchmarks did.

One thing became clear quickly. The core challenge is not extraction itself, but confidence. Vision language models embed document images into high-dimensional representations optimized for semantic understanding, not precise transcription. That process is inherently lossy. When uncertainty appears, models tend to resolve it using learned priors instead of surfacing ambiguity. This behavior can be helpful in consumer settings. In production pipelines, it creates verification problems that do not scale well. Pulse grew out of our attempts to address this gap through system design rather than prompting alone.

Instead of treating document understanding as a single generative step, our system separates layout analysis from language modeling. Documents are normalized into structured representations that preserve hierarchy and tables before schema mapping occurs. Extraction is constrained by schemas defined ahead of time, and extracted values are tied back to source locations so uncertainty can be inspected rather than guessed away. In practice, this results in a hybrid approach that combines traditional computer vision techniques, layout models, and vision language models, because no single approach handles these cases reliably on its own.
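To make that last point concrete, here is a minimal sketch of the provenance contract (illustrative Python, with names and thresholds made up for this post; this is not our production code):

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class ExtractedField:
        name: str             # schema field, e.g. "net_revenue"
        value: Optional[str]  # transcribed value; None if unreadable
        page: int             # page number in the source document
        bbox: tuple           # (x0, y0, x1, y1) in page coordinates
        confidence: float     # model confidence in [0, 1]

    def route(field: ExtractedField, threshold: float = 0.9) -> str:
        # Low-confidence or missing values are surfaced for human review
        # rather than silently resolved by the model's learned priors.
        if field.value is None or field.confidence < threshold:
            return "needs_review"
        return "accepted"

The data structure matters less than the contract: no value enters the output schema without a pointer back to the region of the page it came from.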

We are intentionally sharing a few documents that reflect the types of inputs that motivated this work. These are representative of cases where we saw generic OCR or VLM-based pipelines struggle.

Here is a financial 10K: https://platform.runpulse.com/dashboard/examples/example1

Here is a newspaper: https://platform.runpulse.com/dashboard/examples/example2

Here is a rent roll: https://platform.runpulse.com/dashboard/examples/example3

Pulse is not perfect, particularly on highly degraded scans or uncommon handwriting, and we’re working on improvements. However, our goal is not to eliminate errors entirely, but to make them visible, auditable, and easier to reason about.

Pulse is available via usage-based access to the API and platform. You can sign up to try it at https://platform.runpulse.com/login. API docs are at https://docs.runpulse.com/introduction.
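If you want a feel for the shape of the integration, a call to a usage-based extraction API looks roughly like the sketch below (the endpoint URL and parameters here are placeholders for illustration, not our actual interface; the docs above have the real details):

    import requests

    # Placeholder endpoint and fields for illustration only; see
    # https://docs.runpulse.com/introduction for the actual API.
    with open("report.pdf", "rb") as f:
        resp = requests.post(
            "https://api.example.com/v1/extract",
            headers={"Authorization": "Bearer YOUR_API_KEY"},
            files={"file": f},
        )
    resp.raise_for_status()
    print(resp.json())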

We’d love to hear how others here evaluate correctness for document extraction, which failure modes you have seen in practice, and what signals you rely on to decide whether an output can be trusted.
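As a starting point for that discussion, one simple signal is field-level exact match against hand-labeled ground truth, with numeric values normalized so formatting differences don't count as errors. A toy sketch (ours for this post, not our full evaluation harness):

    def normalize(s):
        # "1,284.30" and "1284.3" should compare equal; leave
        # non-numeric strings (and missing values) unchanged.
        try:
            return str(float(str(s).replace(",", "").replace("$", "")))
        except ValueError:
            return s

    def field_accuracy(pred: dict, truth: dict) -> float:
        # Fraction of ground-truth fields extracted exactly right.
        hits = sum(normalize(pred.get(k)) == normalize(v)
                   for k, v in truth.items())
        return hits / len(truth)

    truth = {"net_revenue": "1,284.3", "cost_of_sales": "912.0"}
    pred  = {"net_revenue": "1284.30", "cost_of_sales": "921.0"}
    print(field_accuracy(pred, truth))  # 0.5 -- transposed digits caught

Exact match is blunt, but for tables and numeric fields that bluntness is the point: a plausible-looking transposition should score as a failure, not a near-miss.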

We will be around to answer questions and are happy to run additional documents if people want to share examples. Put links in the comments and we’ll plug them in and get back to you.

Looking forward to your comments!

Comments

sidcool•2h ago
Congrats on launching. Seems very interesting.
asdev•2h ago
How is this different from Extend (also YC)?
ritvikpandey21•1h ago
we're more focused on the core extraction layer itself rather than workflow tooling. we train our own vision models for layout detection, ocr, and table parsing from scratch. the key thing for us is determinism and auditability, so outputs are reproducible run over run, which matters a lot for regulated enterprises.
aryan1silver•2h ago
looks really cool, congrats on the launch! are you guys using something similar to docling [https://github.com/docling-project/docling]?
rtaylorgarlock•1h ago
Has docling improved? I had a bit of a nightmare integrating a docling pipeline earlier this year. The docs said it was VLM-ready, which I spent lots of hours finding out was not true, only to then find a relevant GitHub issue that would've saved me a ton of hours :/ Allegedly fixed, but wow, that burned me big time.
ritvikpandey21•1h ago
our team has tested docling pretty extensively, works well for simpler text-heavy docs without complex layouts, but the moment you introduce tables or multi-column stuff it doesn't maintain layout well.
throw03172019•2h ago
Congrats on launch! We have been using this for a new feature we are building in our SaaS app. Its results were better than Datalab in our tests, especially in the handwriting category.
ritvikpandey21•1h ago
thanks! appreciate the kind words
vikp•1h ago
Hi, I'm a founder of Datalab. I'm not trying to take away from the launch (congrats), just wanted to respond to the specific feedback.

I'm glad you found a solution that worked for you, but this is pretty surprising to hear - our new model, chandra, saturates handwriting-heavy benchmarks like this one - https://www.datalab.to/blog/saturating-the-olmocr-benchmark - and our production models are more performant than the OSS ones.

Did you test some time ago? We've made a bunch of updates in the last couple of months. Happy to issue some credits if you ever want to try again - vik@datalab.to.

throw03172019•47m ago
Thanks, Vik. Happy to try the model again. Is a BAA available?
sidmanchkanti21•1h ago
Thanks for testing! Glad the results work well for you
mikert89•2h ago
AI models will do all this natively
ritvikpandey21•1h ago
we disagree! we've found llms by themselves aren't enough and suffer from pretty big failure modes like hallucination and inferring text rather than pure transcription. we wrote a blog about this [1]. the right approach so far seems to be a hybrid workflow that uses very specific parts of the language model architecture.

[1] https://www.runpulse.com/blog/why-llms-suck-at-ocr

mritchie712•1h ago
> Why LLMs Suck at OCR

I paste screenshots into claude code everyday and it's incredible. As in, I can't believe how good it is. I send a screenshot of console logs, a UI and some HTML elements and it just "gets it".

So saying they "Suck" makes me not take your opinion seriously.

mikert89•1h ago
they need to convince customers it's what they need
ritvikpandey21•48m ago
yeah models are definitely improving, but we've found even the latest ones still hallucinate and infer text rather than doing pure transcription. we carry out very rigorous benchmarks against all of the frontier models. we think the differentiation is in accuracy on truly messy docs (nested tables, degraded scans, handwriting) and being able to deploy on-prem/vpc for regulated industries.
mikert89•1h ago
one or two more model releases, and raw documents passed to claude will beat whatever prompt voodoo you guys are cooking
holler•39m ago
Having worked in the space, I have real doubts about that. Right now Claude and other top models already do a decent job at e.g. "generate OCR from this document". But as mentioned, there are serious failure modes; it's non-deterministic, and it's especially cost-prohibitive at scale.
serjester•23m ago
This is a hand-wavy article that dismisses VLMs without acknowledging the real-world performance everyone is seeing. I think it'd be far more useful if you published an eval.
throw03172019•52m ago
This is like saying AI models can generate images. But a hyper-focused model or platform for image generation will do better (for now)
canadiantim•1h ago
Can you increase correctness by giving examples to the model? And key terms or nouns expected?
lajr•1h ago
Hey, congratulations on the launch. Just noticed a discrepancy in the financial 10K example:

There is a section near the start where there are 4 options: Large accelerated filer, Non-accelerated filer, Accelerated filer, or Smaller reporting company.

Of these options, "Large accelerated filer" is checked in the PDF, but "Non-accelerated filer" is checked in the Markdown.

ritvikpandey21•51m ago
thanks for the flag! we've pointed this out to the team and will be pushing an update here shortly
think4coffee•1h ago
Congrats on the launch! You mention that you're SOTA on benchmarks. Can you share your research, or share which benchmark you used?
ritvikpandey21•46m ago
thanks! we benchmark against all the major players (azure doc intelligence, aws textract, google doc ai, frontier llms, etc). we have some public news coming out soon on this front, but we have a very rigorous dataset using both public and synthetic data focusing on the hardest problems in the space (handwriting, tables, etc).
scottydelta•1h ago
AI models will eventually do this natively. This is one of the ways for models to continue to get better, by doing better OCR and by doing better context extraction.

I am already seeing this trend in the recent releases of the native models (such as Opus 4.5, Gemini 3, and especially Gemini 3 flash).

It's only going to get better from here.

Another thing to note: there are over five startups in the YC portfolio right now doing the same thing and going after a similar/overlapping target market, if I remember correctly.

ritvikpandey21•49m ago
yeah models are definitely improving, but we've found even the latest ones still hallucinate and infer text rather than doing pure transcription. we carry out very rigorous benchmarks against all of the frontier models. we think the differentiation is in accuracy on truly messy docs (nested tables, degraded scans, handwriting) and being able to deploy on-prem/vpc for regulated industries.
dang•35m ago
> happy to run additional documents if people want to share examples

I've got one! The pdf of this out-of-print book is terrible: https://archive.org/details/oneononeconversa0000simo. The text is unreadably faint, and the underlying text layer is full of errors, so copy-paste is almost useless. Can your software extract usable text?

(I'll email you a copy of the pdf for convenience since the internet archive's copy is behind their notorious lending wall)
