Show HN: Free API to extract PDF data

3•leftnode•4h ago
Hi HN,

Like everyone, I'm working on a product that uses LLMs to extract data from photos and documents. Part of the processing pipeline is extracting data from PDFs as raw text or a raster image.

As part of our leadgen strategy, we've opened up our REST API, which lets you process pages of a PDF. The API is completely free to use anonymously, but anonymous use is rate-limited to 1 page per 30 seconds. Creating a free account removes this restriction.
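If you want to stay under the anonymous limit from a script, one option is to pace calls on the client side. This is only a rough sketch: `PagePacer` is a hypothetical helper, not part of the API, and it assumes the limit behaves like a simple "one call per 30 seconds" window, which the post does not spell out.

```python
import time


class PagePacer:
    """Spaces out calls so at most one happens per `interval` seconds.

    Hypothetical client-side helper; the 30-second default mirrors the
    anonymous rate limit described in the post.
    """

    def __init__(self, interval=30.0, clock=time.monotonic, sleep=time.sleep):
        self.interval = interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call; None before the first

    def wait(self):
        """Block until the next call is allowed; return the seconds slept."""
        delay = 0.0
        if self._last is not None:
            elapsed = self._clock() - self._last
            delay = max(0.0, self.interval - elapsed)
        if delay > 0:
            self._sleep(delay)
        self._last = self._clock()
        return delay
```

Call `pacer.wait()` right before each page request. The clock and sleep functions are injectable so the pacing logic can be checked without actually waiting.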

The two endpoints are:

- https://extract.dev/api/pages/extract/raster - Rasterize a page of a PDF

- https://extract.dev/api/pages/extract/text - Extract text from a page of a PDF

Both have the same request format:

    {
        "file": "https://assets.extract-cdn.com/data/hd-receipt.pdf",
        "page": 1
    }

I've outlined more of the documentation here: https://extract.dev/docs
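For anyone who wants to try it from Python, here is a minimal sketch of building that request with the standard library. The endpoint URLs and the JSON body come from the post. The HTTP method (POST) and the `Content-Type` header are assumptions, and `build_request` is just an illustrative name, so check the docs before relying on it.

```python
import json
import urllib.request

# Base path shared by the two endpoints listed in the post.
BASE_URL = "https://extract.dev/api/pages/extract"


def build_request(kind, file_url, page):
    """Build (but do not send) a request for one page of a PDF.

    kind is "raster" or "text", matching the two endpoints.
    """
    if kind not in ("raster", "text"):
        raise ValueError("kind must be 'raster' or 'text'")
    body = json.dumps({"file": file_url, "page": page}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/{kind}",
        data=body,
        headers={"Content-Type": "application/json"},  # assumed
        method="POST",  # assumed; the post does not name the method
    )


req = build_request(
    "text",
    "https://assets.extract-cdn.com/data/hd-receipt.pdf",
    1,
)
```

Sending it would be `urllib.request.urlopen(req)`. The response format is not described in the post, so the sketch stops at building the request.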

Under the hood, the API uses Poppler to extract text and rasterize pages. Note that text extraction returns the actual text encoded in the PDF and does not employ an OCR model. Give it a spin; I'd like to hear whether this is useful or not.
