frontpage.

Ask HN: What word games do you play every day?

1•gogo61•1m ago•0 comments

Show HN: Paper Arena – A social trading feed where only AI agents can post

https://paperinvest.io/arena
1•andrenorman•2m ago•0 comments

TOSTracker – The AI Training Asymmetry

https://tostracker.app/analysis/ai-training
1•tldrthelaw•6m ago•0 comments

The Devil Inside GitHub

https://blog.melashri.net/micro/github-devil/
2•elashri•7m ago•0 comments

Show HN: Distill – Migrate LLM agents from expensive to cheap models

https://github.com/ricardomoratomateos/distill
1•ricardomorato•7m ago•0 comments

Show HN: Sigma Runtime – Maintaining 100% Fact Integrity over 120 LLM Cycles

https://github.com/sigmastratum/documentation/tree/main/sigma-runtime/SR-053
1•teugent•7m ago•0 comments

Make a local open-source AI chatbot with access to Fedora documentation

https://fedoramagazine.org/how-to-make-a-local-open-source-ai-chatbot-who-has-access-to-fedora-do...
1•jadedtuna•8m ago•0 comments

Introduce the Vouch/Denouncement Contribution Model by Mitchellh

https://github.com/ghostty-org/ghostty/pull/10559
1•samtrack2019•9m ago•0 comments

Software Factories and the Agentic Moment

https://factory.strongdm.ai/
1•mellosouls•9m ago•1 comments

The Neuroscience Behind Nutrition for Developers and Founders

https://comuniq.xyz/post?t=797
1•01-_-•9m ago•0 comments

Bang bang he murdered math {the musical } (2024)

https://taylor.town/bang-bang
1•surprisetalk•9m ago•0 comments

A Night Without the Nerds – Claude Opus 4.6, Field-Tested

https://konfuzio.com/en/a-night-without-the-nerds-claude-opus-4-6-in-the-field-test/
1•konfuzio•12m ago•0 comments

Could ionospheric disturbances influence earthquakes?

https://www.kyoto-u.ac.jp/en/research-news/2026-02-06-0
2•geox•13m ago•1 comments

SpaceX's next astronaut launch for NASA is officially on for Feb. 11 as FAA clea

https://www.space.com/space-exploration/launches-spacecraft/spacexs-next-astronaut-launch-for-nas...
1•bookmtn•14m ago•0 comments

Show HN: One-click AI employee with its own cloud desktop

https://cloudbot-ai.com
2•fainir•17m ago•0 comments

Show HN: Poddley – Search podcasts by who's speaking

https://poddley.com
1•onesandofgrain•18m ago•0 comments

Same Surface, Different Weight

https://www.robpanico.com/articles/display/?entry_short=same-surface-different-weight
1•retrocog•20m ago•0 comments

The Rise of Spec Driven Development

https://www.dbreunig.com/2026/02/06/the-rise-of-spec-driven-development.html
2•Brajeshwar•24m ago•0 comments

The first good Raspberry Pi Laptop

https://www.jeffgeerling.com/blog/2026/the-first-good-raspberry-pi-laptop/
3•Brajeshwar•24m ago•0 comments

Seas to Rise Around the World – But Not in Greenland

https://e360.yale.edu/digest/greenland-sea-levels-fall
2•Brajeshwar•24m ago•0 comments

Will Future Generations Think We're Gross?

https://chillphysicsenjoyer.substack.com/p/will-future-generations-think-were
1•crescit_eundo•28m ago•1 comments

State Department will delete Xitter posts from before Trump returned to office

https://www.npr.org/2026/02/07/nx-s1-5704785/state-department-trump-posts-x
2•righthand•31m ago•1 comments

Show HN: Verifiable server roundtrip demo for a decision interruption system

https://github.com/veeduzyl-hue/decision-assistant-roundtrip-demo
1•veeduzyl•32m ago•0 comments

Impl Rust – Avro IDL Tool in Rust via Antlr

https://www.youtube.com/watch?v=vmKvw73V394
1•todsacerdoti•32m ago•0 comments

Stories from 25 Years of Software Development

https://susam.net/twenty-five-years-of-computing.html
3•vinhnx•33m ago•0 comments

minikeyvalue

https://github.com/commaai/minikeyvalue/tree/prod
3•tosh•37m ago•0 comments

Neomacs: GPU-accelerated Emacs with inline video, WebKit, and terminal via wgpu

https://github.com/eval-exec/neomacs
1•evalexec•42m ago•0 comments

Show HN: Moli P2P – An ephemeral, serverless image gallery (Rust and WebRTC)

https://moli-green.is/
2•ShinyaKoyano•46m ago•1 comments

How I grow my X presence?

https://www.reddit.com/r/GrowthHacking/s/UEc8pAl61b
2•m00dy•48m ago•0 comments

What's the cost of the most expensive Super Bowl ad slot?

https://ballparkguess.com/?id=5b98b1d3-5887-47b9-8a92-43be2ced674b
1•bkls•48m ago•0 comments

Launch HN: Extend (YC W23) – Turn your messiest documents into data

https://www.extend.ai/
61•kbyatnal•4mo ago
Hey HN! We’re Kushal and Eli, co-founders of Extend (https://www.extend.ai/). Extend is a toolkit for AI teams to ingest any kind of messy document (e.g. PDFs, images, Excel files) and build incredible products.

We built Extend to handle the hardest documents that break most pipelines. You can see some examples here in our demo (no signup required): https://dashboard.extend.ai/demo

I know you're probably thinking “not another document API startup”. Unfortunately, the problem just isn’t solved yet!

I’ve personally spent months struggling to build reliable document pipelines at a previous job. The long tail of edge cases is endless — massive tables split across pages, 100pg+ files, messy handwriting, scribbled signatures, checkboxes represented in 10 different formats, multiple file types… the list just keeps going. After seeing countless other teams during our time in YC run into these same issues, we started building Extend.

We initially launched with a set of APIs for engineers to parse, classify, split, and extract documents. That started to take off, and soon we were deployed in production at companies building everything from medical agents to real-time bank account onboarding to mortgage automation. Over time, we’ve worked closely with these teams and seen first-hand how large the gap is between raw OCR/model outputs and a production-ready pipeline (LLMs and VLMs aren’t magic).

Unlike other solutions in the space, we're specifically focused on three core areas: (1) the computer vision layer, (2) LLM context engineering, and (3) the surrounding product tooling. The combination of all three is what we think it takes to hit 99% accuracy and maintain it at scale.

For instance, to parse messy handwriting, we built an agentic OCR correction layer which uses a VLM to review and make edits to low confidence OCR errors. To tackle multi-page tabular data, we built a semantic chunking engine which can detect the optimal boundaries within a document so models can excel with smaller context inputs.
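To make the OCR-correction idea concrete, here's a rough sketch of the general pattern (illustrative only: the confidence threshold, the VLM call, and the data shapes below are simplified stand-ins, not our production code):

    # Illustrative sketch only, not production code. Assumes a generic OCR
    # engine that returns word-level confidences, a page image with a
    # PIL-style crop(), and a VLM with a simple image+prompt interface.
    from dataclasses import dataclass

    @dataclass
    class OcrWord:
        text: str
        confidence: float          # 0.0-1.0 from the OCR engine
        bbox: tuple                # (x0, y0, x1, y1) on the page image

    LOW_CONFIDENCE = 0.80          # hypothetical threshold

    def correct_low_confidence(words, page_image, vlm):
        """Ask a VLM to re-read only the spans the OCR engine was unsure about."""
        corrected = []
        for word in words:
            if word.confidence >= LOW_CONFIDENCE:
                corrected.append(word)
                continue
            crop = page_image.crop(word.bbox)   # send just the doubtful region
            suggestion = vlm.read_text(         # hypothetical VLM call
                image=crop,
                prompt=(f"The OCR engine read this as '{word.text}'. "
                        "Return the exact text in the image, nothing else."),
            )
            corrected.append(OcrWord(suggestion.strip(), 1.0, word.bbox))
        return corrected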

We also shipped a prompt optimization agent to automate the endless prompt engineering whack-a-mole teams spend time on. It’s built as a background agent to replicate the best prompter on your team, and runs in a loop with access to a set of tools (view files, run evals, analyze results, and update schemas).
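The loop itself is conceptually simple; a simplified sketch looks something like the code below (the tool names and the acceptance rule are illustrative assumptions, not the agent's actual internals):

    # Simplified sketch of a prompt-optimization loop; tool names and the
    # acceptance rule are illustrative placeholders.
    def run_evals(prompt, eval_set):
        """Score a prompt against labeled documents (fraction of cases passed)."""
        return sum(case.run(prompt) for case in eval_set) / len(eval_set)

    def optimize_prompt(agent, eval_set, initial_prompt, max_iterations=10):
        best_prompt = initial_prompt
        best_score = run_evals(best_prompt, eval_set)
        for _ in range(max_iterations):
            # The agent inspects failing examples and proposes a revision,
            # analogous to the "view files / run evals / analyze results /
            # update schemas" tools described above.
            failures = [case for case in eval_set if not case.run(best_prompt)]
            candidate = agent.propose_revision(best_prompt, failures)  # hypothetical
            score = run_evals(candidate, eval_set)
            if score > best_score:          # keep a change only if evals improve
                best_prompt, best_score = candidate, score
        return best_prompt, best_score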

The most surprising part of this whole experience has been seeing how many crazy PDF formats are out there! We've run into everything from supermarket inventory magazines and pesticide labels to construction blueprints and satellite manufacturing plans.

Everything described above is live today. You can see it in action here (no signup): https://dashboard.extend.ai/demo. To upload your own files, you can log in and do so (we’re adding free usage credits to all accounts that sign up today).

We’re excited to be sharing with HN! We’d love to hear about your experiences building document pipelines. Please try it out, and share any and all feedback with us (e.g. hard documents that didn’t work, feature requests).

Comments

FabioFleitas•4mo ago
We've been using Extend for over a year and have been super happy with the product and accuracy of the data extraction
kbyatnal•4mo ago
thank you Fabio!
nextworddev•4mo ago
Just how many IDP / document processing “AI” startups are out there?
kbyatnal•4mo ago
There's definitely no shortage of options. OCR has been around for decades at this point, and legacy IDP solutions really proliferated in the last ~10 years.

The world today is quite different though. In the last 24 months, the "TAM" for document processing has expanded by multiple orders of magnitude. In the next 10 years, trillions of pages of documents will be ingested across all verticals.

Previous generations of tools were always limited to the same set of structured/semi-structured documents (e.g. tax forms). Today, engineering teams are ingesting truly the wild west of documents, from 500pg mortgage packages to extremely messy healthcare forms. All of those legacy providers fall apart when tackling these types of actual unstructured docs.

We work with hundreds of customers now, and I'd estimate 90% of the use cases we tackle weren't technically solvable until ~12 months ago. So it's nearly all greenfield work, and very rarely replacing an existing vendor or solution already in place.

All that to say, the market is absolutely huge. I do suspect we'll see a plateau in new entrants though (and probably some consolidation of current ones). With how fast the AI space moves, it's nearly impossible to compete if you enter a market just a few months too late.

nextworddev•4mo ago
fully aware that OCR and IDP have been around, but the “AI native” versions are pretty saturated too
kbyatnal•4mo ago
There are certainly a lot of tools that focus on individual parts of the problem (e.g. the OCR layer, or workflows on top), but very few that solve the problem end-to-end with enough flexibility for AI teams that want a lot of control over the experience.

For example, we expose options for AI teams to control how chunking works, whether to enable a bounding box citation model, and whether a VLM should correct handwriting errors.

For most customers we speak with, the evaluation is actually between Extend and building it in-house (and we have a pretty good win rate here).

nextworddev•4mo ago
Not sure about that. There's LlamaIndex plus many other document orchestration frameworks.
airstrike•4mo ago
Congrats on the launch! It looks really cool.

> Unlike other solutions in the space, we're specifically focused on three core areas: (1) the computer vision layer, (2) LLM context engineering, and (3) the surrounding product tooling.

I assume the goal is to continue to serve this via an API? That would be immensely helpful to teams building other products around these capabilities.

kbyatnal•4mo ago
thanks! Yup that's correct, we offer a set of APIs for handling documents: parsing, classification, splitting, and extraction.

We've seen customers integrate these in a few interesting ways so far:

1. Agents (exposing these APIs as tools in certain cases, or into a vector DB for RAG)

2. Real-time experiences in their product (e.g. we power all of Brex's user-facing document upload flows)

3. Embedded in internal tooling for back-office automation

Our customers are already requesting new APIs and capabilities for all the other problems they run into with documents (e.g. fintech customers want fraud detection, healthcare users need form filling). Some of these we'll be rolling out soon!
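For a rough picture of pattern (1), an integration usually looks something like the sketch below. Note that the endpoint path, payload shape, and env var here are illustrative placeholders rather than the exact API surface; the docs have the real details.

    # Illustrative only: endpoint path, payload, and env var are placeholders,
    # not the exact API surface.
    import os
    import requests

    API_BASE = "https://api.extend.ai"     # assumed base URL for illustration
    HEADERS = {"Authorization": f"Bearer {os.environ['EXTEND_API_KEY']}"}

    def extract_document_tool(file_url: str, schema: dict) -> dict:
        """Agent-callable tool: extract structured fields from one document."""
        response = requests.post(
            f"{API_BASE}/extract",          # hypothetical endpoint
            headers=HEADERS,
            json={"file_url": file_url, "schema": schema},
            timeout=120,
        )
        response.raise_for_status()
        return response.json()

    # An agent framework would then register this function as a tool, e.g.
    # tools = [{"name": "extract_document", "fn": extract_document_tool}]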

wunderlust•4mo ago
For some reason "turn your messiest data into documents" makes more sense.
airstrike•4mo ago
Seconded. It's unstructured data that becomes structured.
xpe•4mo ago
To what degree does this product (or others) look at a collection of documents and offer various possible schemas to choose from? This seems like not just a "hard for AI" problem but a "hard for humans" problem. In other words, even a high-quality AI with lots of "thinking" time isn't enough. It isn't just about reasoning through a problem -- there will be a lot of judgment calls, ones that require significant context and domain knowledge. This to me seems like an area where human-in-the-loop really matters.
nibab•4mo ago
At ng3n.ai I've been using datalab.to for document processing. Currently it's mostly for conversion to markdown and some extraction.

ng3n is more of a grid-like workflow solution on top of documents. It's a user-facing application geared towards non-technical users that have processing needs.

If there are all these new problems that became solvable, what exactly are they?

I'd be interested in replacing datalab with Extend, but I'm not sure what avenues that opens for ng3n. Would be very curious to learn!

kbyatnal•4mo ago
thanks! Datalab is great, I've met Vik a few times and their team has done some impressive work. We can also support the conversion-to-markdown use case, and might be a better fit depending on your needs. Feel free to create an account to try it out!
FitchApps•4mo ago
Very cool. Are there any checks for accuracy / data verification? How accurate is your solution when it comes to messy table parsing or handwriting?
kbyatnal•4mo ago
thanks!

A lot of customers choose us for our handwriting, checkbox, and table performance. To handle complex handwriting, we've built an agentic OCR correction layer which uses a VLM to review and make edits to low confidence OCR errors.

Tables are a tricky beast, and the long tail of edge cases here is immense. A few things we've found to be really impactful are (1) semantic chunking that detects table boundaries (so a table that spans multiple pages doesn't get chopped in half) and (2) table-to-HTML conversion (in addition to markdown). Markdown is great at representing most simple tables, but can't represent cases where you have e.g. nested cells.
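A tiny example of why the HTML output matters (generic illustration, not our output format): a header cell that spans two columns has no equivalent in a pipe-style markdown table, so the grouping is silently lost.

    # Generic illustration, not our output format: HTML can express a merged
    # header cell (colspan), while a markdown pipe table cannot.
    table_html = """
    <table>
      <tr><th rowspan="2">Item</th><th colspan="2">Q1 totals</th></tr>
      <tr><th>Units</th><th>Revenue</th></tr>
      <tr><td>Widgets</td><td>120</td><td>$4,800</td></tr>
    </table>
    """

    # The closest markdown rendering flattens the header, losing the fact
    # that "Q1 totals" groups the Units and Revenue columns:
    table_md = """
    | Item    | Units | Revenue |
    |---------|-------|---------|
    | Widgets | 120   | $4,800  |
    """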

You can see examples of both in our demo! https://dashboard.extend.ai/demo

Accuracy and data verification are challenging. We have a set of internal benchmarks we use, which gets us pretty far, but that's not always representative of specific customer situations. That's why one of the earliest things we built was an evaluation product, so that customers can easily measure performance on their exact docs and use cases. We recently added support for LLM-as-a-judge and semantic similarity checks, which have been really impactful for measuring accuracy before going live.
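As a rough sketch of what those checks can look like at the field level (the names, threshold, and model interfaces are illustrative placeholders, not our exact implementation):

    # Illustrative field-level checks: exact match, embedding similarity, and
    # LLM-as-a-judge. Names, threshold, and interfaces are placeholders.
    def exact_match(predicted: str, expected: str) -> bool:
        return predicted.strip().lower() == expected.strip().lower()

    def semantic_match(predicted, expected, embed, threshold=0.9):
        """Compare cosine similarity of embeddings (embed() is assumed)."""
        a, b = embed(predicted), embed(expected)
        dot = sum(x * y for x, y in zip(a, b))
        norm = (sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5)
        return dot / norm >= threshold

    def llm_judge(predicted, expected, field, llm):
        """Ask a model whether the prediction is acceptable for this field."""
        verdict = llm.complete(
            f"Field: {field}\nExpected: {expected}\nPredicted: {predicted}\n"
            "Answer YES if the prediction conveys the same value, otherwise NO."
        )
        return verdict.strip().upper().startswith("YES")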

aaa29292•4mo ago
on the pricing page, what in the world is performance optimized vs cost optimized???

https://docs.extend.ai/2025-04-21/product/general/how-credit...

Are those just different SLAs or different APIs or what?

aaa29292•4mo ago
How different are the extraction qualities, any benchmarks or other info you can share?
kbyatnal•4mo ago
It's very dependent on the use case. That's why we offer a native evals experience in the product, so you can directly measure the % accuracy diffs between the two modes for your exact docs.

As a rule of thumb, light processing mode is great for (1) most classification tasks, (2) splitting on smaller docs, (3) extraction on simpler documents, or (4) latency sensitive use cases.

serjester•4mo ago
This is the most confusing pricing page I’ve ever seen - different options have different credit usage and different cost per credit? How many degrees of freedom do you really need to represent API cost?
cle•4mo ago
> How many degrees of freedom do you really need to represent API cost?

The amount that your users care about.

At a large enough scale, users will care about the cost differences between extraction and classification (very different!) and finding the right spot on the accuracy-latency curve for their use case.

kbyatnal•4mo ago
Exactly correct! We've had users migrate over from other providers because our granular pricing enabled new use cases that weren't feasible to do before.

One interesting thing we've learned is that most production pipelines end up using a combination of the two (e.g. cheap classification and splitting, paired with performance extraction).
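In pseudocode, that pattern looks roughly like the sketch below (the client methods and mode names are placeholders for illustration, not our actual SDK):

    # Placeholder client methods and mode names, for illustration only.
    def process_packet(client, file_url, schemas):
        # Step 1: split the packet and classify each sub-document cheaply.
        sections = client.split(file_url, mode="light")
        results = []
        for section in sections:
            doc_type = client.classify(section, mode="light")
            schema = schemas.get(doc_type)
            if schema is None:
                continue                      # skip types we don't extract
            # Step 2: pay for the higher-accuracy mode only where it matters.
            fields = client.extract(section, schema=schema, mode="performance")
            results.append({"type": doc_type, "fields": fields})
        return results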

kbyatnal•4mo ago
Feedback heard. Pricing is hard, and we've iterated on this multiple times so far.

Our goal is to provide customers with as much transparency & flexibility as possible. Our pricing has 2 axes:

- the complexity of the task

- performance processing vs cost-optimized processing

Complexity matters because e.g. classification is much easier than extraction, and as such it should be cheaper. That unlocks a wide range of use cases, such as tagging and filtering pipelines.

Toggles for performance are also important because not all use cases are created equal. Just as it's valuable to have a choice between cheaper and best-in-class foundation models, the same applies to document tasks.

For certain use cases, you might be willing to take a slight hit to accuracy in exchange for better costs and latency. To support this, we offer a "light" processing mode (with significantly lower prices) that uses smaller models, fewer VLMs, and more heuristics under the hood.

For other use cases, you simply want the highest accuracy possible. Our "performance" processing mode is a great fit for that, which enables layout models, signature detection, handwriting VLMs, and the most performant foundation models.

In fact, most pipelines we've seen in production end up combining the two (cheap classification and splitting, paired with performance extraction).

Without this level of granularity, we'd either be overcharging certain customers or undercharging others. I definitely understand how this is confusing though; we'll work on making our docs better!

kbyatnal•4mo ago
good question!

Our goal is to provide customers with as much flexibility as possible. For certain use cases, you might be willing to take a slight hit to accuracy in exchange for better costs and latency. To support this, we offer a "light" processing mode (with significantly lower prices) that uses smaller models, fewer VLMs, and more heuristics under the hood.

For other use cases, you simply want the highest accuracy possible. Our "performance" processing mode is a great fit for that, which enables layout models, signature detection, handwriting VLMs, and the most performant foundation models.

We back this up with a native evals experience in the product, so you can directly measure the % accuracy difference between the two modes for your exact use case.

asdev•4mo ago
Have you run your pipeline against an open benchmark like https://github.com/opendatalab/OmniDocBench?
pratikshelar871•4mo ago
$300+ for a starter plan targeted at startups seems like a missed opportunity; startups might see it as a high barrier to trying the product. You're solving a good problem, but the pricing seems too high.
nextworddev•4mo ago
I highly recommend that companies keep it simple and use n8n with Gemini for OCR. You'll save money and get 90%+ of the functionality of products like this.
constantinum•4mo ago
Other players:

1. Trellis (YC W24)
2. Roe AI (YC W24)
3. Omni AI (YC W24)
4. Reductor (YC W24)

Other players(extended):

1. Unstract: Open-source ETL for documents (https://github.com/Zipstack/unstract)
2. Datalab: Makers of Surya/Marker
3. Unstructured.io

scottydelta•3mo ago
You forgot Nanonets, which even has its own open-source model on Hugging Face.
prats226•3mo ago
Here is link to open source model: https://huggingface.co/nanonets/Nanonets-OCR-s

And hosted model: https://docstrange.nanonets.com/

scottydelta•3mo ago
What service do you use to get notifications about nanonets mentions on HN?
prats226•3mo ago
https://mention.com/en/
arvind_k•3mo ago
At Zipphy, I worked on solving similar problems in on-prem environments — building an OCR + NLP + CV pipeline to generate spatial layouts and classify documents at scale.

One persistent challenge was generalizing across “wild” PDFs, especially multi-page tables.

Your mention of agentic OCR correction and semantic chunking really caught my attention. I’m curious — how did you architect those to stay consistent across diverse layouts without relying on massive rule sets?