Leak reveals Grok might soon edit your spreadsheets

https://techcrunch.com/2025/06/23/leak-reveals-grok-might-soon-edit-your-spreadsheets/
1•mfiguiere•1m ago•0 comments

Ford Will Keep Battery Factory Even If Republicans Ax Tax Break

https://www.nytimes.com/2025/06/23/business/ford-battery-factory-electric-vehicles.html
2•doener•1m ago•0 comments

Teaching an Emulator How to Talk

https://mrcat.au/blog/how_to_devices/
1•gsky•1m ago•0 comments

Beyond the editor: Bringing AI to the rest of your dev workflow

https://trunk.io/blog/beyond-the-editor-bringing-ai-to-the-rest-of-your-dev-workflow
2•draward•4m ago•0 comments

2025 Iberia Blackout Report

https://media.licdn.com/dms/document/media/v2/D4D1FAQGcyyYYrelkNg/feedshare-document-pdf-analyzed/B4DZeBtlohGsAk-/0/1750227910090?e=1750896000&v=beta&t=uEftse3BPsTjdLQ3DmjoVkadhUGqf7-MfYj_6UnSS28
1•leymed•4m ago•0 comments

Howdy – Windows Hello style facial authentication for Linux

https://github.com/boltgolt/howdy
1•LorenDB•4m ago•0 comments

Waiting Is Risky

https://www.bryanbraun.com/2025/06/21/waiting-is-risky/
1•LorenDB•5m ago•0 comments

Ionos Submits Expression of Interest for AI Gigafactory

https://www.ionos-group.com/investor-relations/publications/announcements/ionos-submits-expression-of-interest-for-ai-gigafactory.html
1•doener•6m ago•0 comments

Is growth operating still a good business model?

1•fabiansolu•6m ago•0 comments

X blocked a paid user for no reason for "5-7 days" or forever

https://substack.com/home/post/p-166662797
5•antonkar•9m ago•1 comments

Writing shaders in TypeScript, the solution to a major WebGPU limitation?

https://bsky.app/profile/iwoplaza.bsky.social/post/3lrsuzeq5zk2x
1•iwoplaza•10m ago•1 comments

Software is evolving backwards [video]

https://www.youtube.com/watch?v=oXtvAQ-e0iE
2•glth•10m ago•0 comments

The Future Isn't Horizontal: AI's Vertical Revolution

https://knowledge.insead.edu/strategy/future-isnt-horizontal-ais-vertical-revolution
1•fittingopposite•11m ago•0 comments

Stow: Package Manager When You Can't Use Your Package Manager

https://theartofmachinery.com/2021/08/08/stow_as_package_manager.html
2•LorenDB•11m ago•0 comments

I Rebuilt DevinAI's DeepWiki

https://www.deepgraph.co/trending
2•aracena•12m ago•0 comments

Ask HN: How does AI overcome the "essential complexity" as in No Silver Bullet?

1•hintymad•13m ago•0 comments

Into the Unwritten Dawn

https://dayafter.substack.com/p/into-the-unwritten-dawn
1•shmval•14m ago•0 comments

Microsoft adds Steam games to its Xbox PC app on Windows

https://www.theverge.com/news/690967/microsoft-xbox-app-windows-steam-games-aggregated-library-support-beta
2•DocFeind•14m ago•0 comments

Show HN: Nodehaus – Custom AI Models Without the Technical Overhead

https://nodehaus.io
1•neutronsoup•19m ago•0 comments

Apple Research unearthed forgotten AI technique and using it to generate images

https://9to5mac.com/2025/06/23/apple-ai-image-model-research-tarflow-starflow/
2•celias•19m ago•1 comments

Judge denies creating "mass surveillance program" harming all ChatGPT users

https://arstechnica.com/tech-policy/2025/06/judge-rejects-claim-that-forcing-openai-to-keep-chatgpt-logs-is-mass-surveillance/
4•merksittich•21m ago•1 comments

True Costs of Misinformation – The Global Spread of Misinformation Laws

https://ijoc.org/index.php/ijoc/article/view/21937
2•gnabgib•22m ago•0 comments

Microsoft Sets New 60-Day Limit for System Restore Points in Windows 11 Update

https://www.extremetech.com/computing/microsoft-sets-new-60-day-limit-for-system-restore-points-in-windows-11
2•burnt-resistor•22m ago•0 comments

Sparc3d: High-Resolution 3D Model Generation

https://sparc3d.org/
2•gregzeng95•23m ago•0 comments

Brazil and China megarailway raises deforestation warnings in the Amazon

https://news.mongabay.com/2025/06/brazil-china-megarailway-raises-deforestation-warnings-in-the-amazon/
3•PaulHoule•25m ago•0 comments

Elon Musk's Lawyers Claim He 'Does Not Use a Computer'

https://www.wired.com/story/elon-musk-computer-sam-altman/
8•thm•26m ago•1 comments

Why do animals have such different lifespans? [video]

https://www.youtube.com/watch?v=7m8QlSPP7t0
2•gmays•27m ago•0 comments

Nocative: Creators to live with joy Please create an account to view the juice

https://nocative.com
1•penpendian•28m ago•1 comments

Ingrid: Cross-platform crossword puzzle construction app

https://ingrid.cx/
1•celaleddin•28m ago•0 comments

Calculating the Fibonacci numbers on GPU – simons blog

https://veitner.bearblog.dev/calculating-the-fibonacci-numbers-on-gpu/
1•rbanffy•28m ago•0 comments

Launch HN: Reducto Studio (YC W24) – Build accurate document pipelines, fast

28•adit_a•3h ago
Hi HN! We’re Adit and Raunak, co-founders of Reducto (YC W24, https://reducto.ai). Reducto turns unstructured documents (e.g., PDFs, scans, spreadsheets) into structured data. This data can then be used for retrieval, passed into LLMs, or used elsewhere downstream.

We started Reducto when we realized that so many of today’s AI applications require good quality data. Everyone knows that good inputs lead to better outputs, but 80% of the world’s data is still trapped inside of things like messy PDFs and spreadsheets. Raunak and I launched a really early MVP of parsing and extracting from unstructured documents, and were lucky to have a lot of interest from technical teams when they realized that the accuracy was something they hadn’t seen before.

We started by just releasing an API for engineers to build with, but over time we realized that an accurate API was only part of the puzzle. Our customers wanted to be able to easily set up multi-step pipelines, evaluate and iterate on performance within their use case, and work with non-engineering teammates who were also involved in the real-world document processing flow.

That’s why we’re launching Reducto Studio, a web platform that sits on top of our APIs for users to build and iterate on end-to-end document pipelines.

With Studio, you can:

- Drop an entire file set and get per-field and per-document accuracy scores against your eval data.

- Auto-generate and continuously optimize extraction schemas to hit production-grade quality fast.

- Save every run, iterate on parse/extract configs, and compare results side-by-side.
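
To make the first bullet concrete, here is a minimal sketch of what per-field and per-document accuracy against an eval set could look like. It is not Reducto's scoring method: the function name `score_extractions`, the exact-match comparison, and the sample invoice data are all assumptions made for illustration.

```python
from collections import defaultdict


def score_extractions(predictions, ground_truth):
    """Exact-match scoring (an assumed metric, not Reducto's).

    Both arguments map document id -> {field name: value}.
    Returns (per-field accuracy, per-document accuracy).
    """
    field_hits = defaultdict(int)
    field_totals = defaultdict(int)
    doc_scores = {}

    for doc_id, expected in ground_truth.items():
        predicted = predictions.get(doc_id, {})
        correct = 0
        for field, want in expected.items():
            field_totals[field] += 1
            if predicted.get(field) == want:
                field_hits[field] += 1
                correct += 1
        # A document with no labeled fields counts as fully correct.
        doc_scores[doc_id] = correct / len(expected) if expected else 1.0

    field_scores = {f: field_hits[f] / field_totals[f] for f in field_totals}
    return field_scores, doc_scores


# Hypothetical eval data: two invoices, one with a transposed total.
ground_truth = {
    "inv-1": {"vendor": "Acme", "total": "120.00"},
    "inv-2": {"vendor": "Globex", "total": "75.50"},
}
predictions = {
    "inv-1": {"vendor": "Acme", "total": "120.00"},
    "inv-2": {"vendor": "Globex", "total": "57.50"},
}

field_scores, doc_scores = score_extractions(predictions, ground_truth)
print(field_scores)  # {'vendor': 1.0, 'total': 0.5}
print(doc_scores)    # {'inv-1': 1.0, 'inv-2': 0.5}
```

Exact match is the simplest possible choice; a real eval would likely need normalization (dates, currency formats) or fuzzy comparison per field type.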

You can see some examples here (https://studio.reducto.ai) or you can watch this walkthrough: https://www.loom.com/share/b243551741c642c6a594c00353fcecb3.

If you’d like to upload your own document you can log in and do so as well - we don’t make you book a demo or put a payment down to try it.

Thanks for reading and checking it out! This is only the first step for Studio, so we’d love feedback on anything: UX rough edges (we know they’re there!), features that would make evaluations better for you, hard documents you’ve had trouble with, or anything else about wrangling with unstructured data.

Comments

omaerkhan•2h ago
FYI - https://links.reducto.ai/studio doesn't seem to be working... ERR_TOO_MANY_REDIRECTS
adit_a•2h ago
Fixed! Sorry about that
TimMeade•2h ago
Still not working here
adit_a•1h ago
The direct loom link isn't working for you? Are you seeing the same redirects error?
weego•2h ago
I'm not a product fit, but I would like to take a moment to praise the detailed beauty of the design work on the site.

Everything from the typography and layout to the line work stands out, down to how texture ties in the gradients of the large, fashionable logotype at the bottom of the footer.

Was it in house, or an agency? I'd love to see more of whoever's work it was.

adit_a•2h ago
Thank you! We worked with Airfoil for the website :)
esafak•2h ago
https://www.airfoil.studio/ presumably
raunakchowdhuri•1h ago
yep!
iyn•1h ago
Agreed — came here to say exactly that. I like that this is not yet another tailwind template (nothing wrong with them, I use them all the time) but something with its own identity. I especially love the illustrations/icons. Well done!
skadamat•2h ago
Congrats on the launch! How do you guys compare with Datalab with regards to accuracy?

https://www.datalab.to/

gbertb•1h ago
I want to know this, too. Lots of these companies are doing the same thing, but leave out benchmarks that include Marker.
jackienotchan•1h ago
I saw your recent $24M series A and was kind of surprised to only see you launching now, congrats!

YC seems to fund quite a few document extraction companies, even within the same batch:

- Pulse (YC W24): https://www.ycombinator.com/companies/pulse-3

- OmniAI (YC W24): https://www.ycombinator.com/companies/omniai

- Extend (YC W23): https://www.ycombinator.com/companies/extend

How do you differentiate from these? And how do you see the space evolving as LLMs commoditize PDF extraction?

echelon•1h ago
How do you raise Series A before launch / PMF?

I assume y'all launched before this to select partners? Or perhaps this is a new product on top of the core product?

Congrats! Keep at it!

adit_a•23m ago
Thank you!

To clarify, our API was already fully launched and in prod with customers when we raised our series A. This launch is specifically for the platform we're building around the API :)

adit_a•24m ago
Thanks! To clarify, we launched our document processing APIs a while ago. This launch is specifically for a new platform we're building around our API based on all of the things our customers previously had to build internally to support their use of Reducto (eval tools, monitoring etc).

Generally speaking, my view on the space is that this was crowded well before LLMs. We've met a lot of the folks that worked on things like drivers for printers to print PDFs in the 1990s, IDP players from the last few decades, and more recent cloud offerings.

The context today is clearly very different than it was in the IDP era though (human process with semi-structured content -> LLMs are going to reason over most human data), and so is the solution space (VLMs are an incredible new tool to help address the problem).

Given that, I don't think it's surprising that companies inside and outside of YC have pivoted into offering document processing APIs over the past year. Generally speaking, we don't see differentiation coming from feature set alone, since that will converge over time. Instead we focus primarily on accuracy, reliability, and scalability, all three of which benefit substantially from last-mile improvements. I think the best testament I have to that is that the customers we've onboarded are very technical, and as a result are very thorough when choosing the right solution for them. That includes a company-wide rollout at one of the 4 biggest tech companies, one of the 3 biggest trading firms, and a big set of AI product teams like Harvey, Rogo, ScaleAI etc.

At the end of the day I don't see VLM improvements as antagonistic to what we're doing. We already use them a lot for things like agentic OCR (correcting mistakes from our traditional CV pipeline). On some level our customers aren't just choosing us for PDF->markdown, they're onboarding with us because they want to spend more of their time on the things that are downstream from having accurate data, and I expect that there'll be room for us to make that even more true as models improve.

kbyatnal•23m ago
Founder of Extend (https://www.extend.ai/) here, it's a great question and thanks for the tag. There definitely are a lot of document processing companies, but it's a large market and more competition is always better for users.

In this case, the Reducto team seems to have cloned us down to the small details [1][2], which is a bit disappointing to see. But imitation is the best form of flattery I suppose! We thought deeply about how to build an ergonomic configuration experience for recursive type definitions (which is deceptively complex), and concluded that a recursive spreadsheet-like experience would be the best form factor (which we shipped over a year ago).

> "How do you see the space evolving as LLMs commoditize PDF extraction?"

Having worked with a ton of startups & F500s, we've seen that there's still a large gap for businesses in going from raw OCR outputs -> document pipelines deployed in prod for mission-critical use cases. LLMs and VLMs aren't magic, and anyone who goes in expecting 100% automation is in for a surprise.

The prompt engineering / schema definition is only the start. You still need to build and label datasets, orchestrate pipelines (classify -> split -> extract), detect uncertainty and correct with human-in-the-loop, fine-tune, and a lot more. You can certainly get close to full automation over time, but it takes time and effort — and that's where we come in. Our goal is to give AI teams all of that tooling on day 1, so they hit accuracy quickly and focus on the complex downstream post-processing of that data.
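
As a rough illustration of the stages named above (classify -> split -> extract, then uncertainty detection with human-in-the-loop), here is a toy sketch. It is not Extend's or Reducto's implementation: every function, the keyword-based classifier, the fixed confidence values, and the 0.9 threshold are hypothetical stand-ins for what would be model calls in a real system.

```python
from dataclasses import dataclass


@dataclass
class FieldResult:
    name: str
    value: str
    confidence: float  # assumed to lie in [0, 1]


def classify(page_text):
    # Hypothetical keyword rule standing in for a real classifier.
    return "invoice" if "invoice" in page_text.lower() else "other"


def split(pages):
    """Group consecutive pages that share a document type."""
    groups = []
    for page in pages:
        kind = classify(page)
        if groups and groups[-1][0] == kind:
            groups[-1][1].append(page)
        else:
            groups.append((kind, [page]))
    return groups


def extract(pages):
    # Toy extractor: look for a "Total:" line; the confidences are made up.
    for page in pages:
        for line in page.splitlines():
            if line.lower().startswith("total:"):
                return FieldResult("total", line.split(":", 1)[1].strip(), 0.95)
    return FieldResult("total", "", 0.0)


def route(results, threshold=0.9):
    """Accept confident results; send the rest to a human review queue."""
    accepted = [r for r in results if r.confidence >= threshold]
    needs_review = [r for r in results if r.confidence < threshold]
    return accepted, needs_review


pages = [
    "Invoice 1\nTotal: 10.00",
    "Invoice 1 page 2",
    "Cover letter",
    "Invoice 2\nno amount listed",
]
results = [extract(doc) for kind, doc in split(pages) if kind == "invoice"]
accepted, needs_review = route(results)
print(len(accepted), len(needs_review))  # 1 1
```

The point of the sketch is the shape of the flow: anything below the threshold is held for review rather than silently passed downstream.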

[1] https://dub.sh/ojv9b7p

[2] https://dub.sh/X7GFlDd

wilson090•11m ago
I've used Instabase before, which has had the same UX for years. What about benchmarks between the two on extraction performance?
bze12•1h ago
Nice! I was already considering using the Reducto API. Will give this a try.
adit_a•1h ago
Let us know if you have any feedback!
serjester•53m ago
Congrats on the launch guys, mobile website seems to be broken though.
adit_a•23m ago
Thank you! What's the error you're seeing on mobile?
willwjack•13m ago
This would have saved me so much pain back when I was working on RAG workflows. Great to see.
Fraaaank•5m ago
Why do you only get a data processing agreement when on the enterprise plan? It's a legal requirement for any European company.
b0a04gl•2m ago
if reducto leans in fully as the layer that remembers every correction, every edge case, every shift in layout or wording across document versions it starts becoming more than a pipeline. it becomes institutional memory for unstructured data. none of the other players really do that. they extract, maybe evaluate once, then forget.

but the real pain is always in the second and third batch. when formats change subtly. if reducto becomes the system that adapts without you babysitting it, that's where it may win. continuity's the moat imo among the competitors