I built an API to stop manual data entry from invoices and resumes

2•scannyai•1mo ago

Hi HN,

I’m the founder of Scanny AI (https://scanny-ai.com/).

I built this because I noticed that despite all the advancements in AI, businesses are still hiring people to manually copy-paste data from PDFs to Excel. Standard OCR tools often just give you a "blob of text" that still requires manual cleanup.

What it does: Scanny AI takes unstructured documents (Invoices, Resumes, IDs, Receipts) and extracts specific data points into structured formats (JSON, CSV, Excel).

How it works: Unlike regex-based parsers or standard OCR, we use context-aware models to understand the document layout. This means it can identify a "Total Amount" on an invoice even if the layout changes, or extract "Implied Skills" from a CV that aren't explicitly listed as keywords.
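To make the contrast with regex-based parsing concrete, here is a minimal Python sketch of how a pattern tuned to one invoice layout silently fails on another. The pattern, the `regex_total` helper, and both sample invoices are hypothetical and only illustrate the failure mode; this is not Scanny AI's code or API.

```python
import re

# Hypothetical pattern tuned to one layout: "Total Amount: $1,234.50"
TOTAL_RE = re.compile(r"Total Amount:\s*\$([\d,]+\.\d{2})")


def regex_total(text):
    """Return the total as a string, or None if the layout does not match."""
    match = TOTAL_RE.search(text)
    return match.group(1) if match else None


# Two made-up invoices carrying the same total in different layouts.
layout_a = "Invoice #1\nTotal Amount: $1,234.50\n"
layout_b = "Invoice #2\nAmount Due (USD)\n1,234.50\n"

print(regex_total(layout_a))  # 1,234.50
print(regex_total(layout_b))  # None: same data, but the label and position changed
```

Every new vendor layout needs another pattern, which is the maintenance burden a layout-aware model is meant to avoid.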

Current Use Cases:

Invoices: Extracting line items, tax, and vendor details.

Resumes: Parsing experience and skills for HR.

IDs: Extracting PII for KYC checks.

We are currently in Early Access and I’m looking for feedback on the extraction accuracy and the API usability.

I’ve enabled Free Credits for new sign-ups so you can test it on your own documents without paying.

I’d love to hear your thoughts on the edge cases (messy handwriting, weird layouts, etc.) and what features you’d like to see next.

Link: https://scanny-ai.com/

Thanks!

Comments

fuzzy_lumpkins•1mo ago

definitely going to pass this on to a couple friends who were just talking about vendor/sales data issues this past week.

scannyai•4w ago

Thanks a lot for the support! I'd be happy to help them and offer some free credits to try it.

jaredsohn•1mo ago

Why not just use a standard LLM prompt?

scannyai•4w ago

You absolutely can for prototypes, but at production scale you'll hit major issues with cost, latency, and intermittently malformed JSON. We handle the heavy lifting (optimizing the vision pipeline and enforcing strict schemas) so you don't have to build and maintain the glue code around the model yourself.
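For a sense of what that glue code involves, here is a minimal sketch in Python (standard library only) of validating a model's raw output against a fixed schema before anything downstream uses it. The field names, `INVOICE_SCHEMA`, and `validate_invoice` are hypothetical and chosen for illustration; they are not Scanny AI's actual schema or API.

```python
import json

# Hypothetical schema: field name -> required Python type.
INVOICE_SCHEMA = {"vendor": str, "total": float, "tax": float, "line_items": list}


def validate_invoice(raw):
    """Parse raw model output and enforce the schema.

    Raises ValueError on malformed JSON, missing or unexpected fields,
    or a field with the wrong type.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")

    extra = set(data) - set(INVOICE_SCHEMA)
    if extra:
        raise ValueError(f"unexpected fields: {sorted(extra)}")

    for field, expected in INVOICE_SCHEMA.items():
        if field not in data:
            raise ValueError(f"missing field: {field}")
        value = data[field]
        # Accept whole numbers for float fields, but not booleans
        # (bool is a subclass of int in Python).
        if expected is float and isinstance(value, int) and not isinstance(value, bool):
            value = float(value)
            data[field] = value
        if not isinstance(value, expected):
            raise ValueError(f"field {field!r} should be {expected.__name__}")
    return data


good = '{"vendor": "Acme", "total": 120, "tax": 20.0, "line_items": []}'
truncated = '{"vendor": "Acme", "total": "120"'  # cut off mid-object

print(validate_invoice(good)["total"])  # 120.0
try:
    validate_invoice(truncated)
except ValueError as err:
    print("rejected:", err)
```

In practice a check like this is usually paired with a retry or a fallback path, which is part of where the cost and latency overhead comes from.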
