Show HN: Parsley - Open-source AI parser for PDFs and images to JSON/CSV

1•wastu•1mo ago

Comments

wastu•1mo ago

I got tired of building custom OCR pipelines for every document type and fixing parsers whenever a PDF layout changed. So I built Parsley.

Parsley uses LLMs to parse PDFs or images into structured data. You define the fields you want and it extracts them by meaning rather than fixed positions or labels.
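To make "define the fields you want" concrete, here is a minimal sketch of the general technique in Python. It is not Parsley's actual schema format or code: the `Field` class, `INVOICE_SCHEMA`, `build_prompt`, and `coerce` are all hypothetical names invented for illustration. The idea is that each field carries a description of its meaning, which goes into the prompt, and the model's JSON reply is then coerced back to the declared types.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """One field to extract (hypothetical structure, not Parsley's)."""
    name: str
    type: str         # "string", "number", or "boolean"
    description: str  # tells the model what the field means


# Hypothetical schema for an invoice.
INVOICE_SCHEMA = [
    Field("invoice_number", "string", "Unique identifier printed on the invoice"),
    Field("total", "number", "Grand total including tax"),
    Field("paid", "boolean", "Whether the invoice is marked as paid"),
]


def build_prompt(schema):
    """Describe the fields by meaning, so layout changes do not matter."""
    lines = [f"- {f.name} ({f.type}): {f.description}" for f in schema]
    header = (
        "Extract these fields from the document and reply with a single "
        "JSON object. Use null for anything that is missing."
    )
    return header + "\n" + "\n".join(lines)


def _to_bool(value):
    # Models sometimes return booleans as strings; bool("false") would be True.
    if isinstance(value, str):
        return value.strip().lower() in {"true", "yes", "1"}
    return bool(value)


_CASTS = {"string": str, "number": float, "boolean": _to_bool}


def coerce(raw_json, schema):
    """Keep only the schema's fields and cast each to its declared type."""
    data = json.loads(raw_json)
    result = {}
    for f in schema:
        value = data.get(f.name)
        result[f.name] = None if value is None else _CASTS[f.type](value)
    return result
```

With this sketch, a reply such as `{"invoice_number": 1042, "total": "1250.50", "paid": "false"}` comes back with a string invoice number, a float total, and a real boolean, and any keys outside the schema are dropped.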

The web app uses your own API keys with providers like Google or OpenRouter. Documents go directly to them. The app is stateless and doesn't store files or keys.

Features:

- Custom or auto-generated schemas

- PDF (including password-protected) and image support

- Multiple LLM providers

- JSON or CSV output

- Stateless API for n8n or Zapier

- Free, rate-limited demo mode

I use it in n8n to parse invoices and bank statements straight into spreadsheets. It's been much less brittle than traditional OCR.
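As a rough illustration of the last step (extracted records into something a spreadsheet can import), here is a small sketch using only Python's standard library. It is not how Parsley or n8n does it: `records_to_csv` and the sample field names are hypothetical, and it assumes each record is a flat dict.

```python
import csv
import io


def records_to_csv(records, fieldnames):
    """Serialize flat dict records to CSV text, one row per document."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=fieldnames,
        extrasaction="ignore",  # silently drop keys that are not in the schema
        lineterminator="\n",
    )
    writer.writeheader()
    for record in records:
        # Missing or null values become empty cells.
        row = {k: "" if record.get(k) is None else record[k] for k in fieldnames}
        writer.writerow(row)
    return buffer.getvalue()
```

Fixing the column order up front keeps the output stable even when a document is missing a field, which matters when rows are appended to an existing sheet.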

Code: https://github.com/bgwastu/parsley

Demo (rate-limited): https://parsley.wastu.net/

Would genuinely appreciate thoughts from people who've had to wrangle documents like this.