The idea came from repeatedly writing boilerplate code to extract structured data from invoices, receipts, and other documents. Instead of wrestling with different API formats, I wanted a unified interface that:
- Extracts structured data using Zod/Pydantic schemas
- Classifies and splits multi-section documents (e.g., medical records)
- Processes documents in batches with automatic error handling
- Works locally without APIs (for PDFs, DOCX, XLSX, etc.)

Key features:
- Available for both TypeScript and Python
- Batch processing with concurrent requests
- Document classification (splits 100+ page docs by category)
- Local parsers (no API needed for basic extraction)
- Apache 2.0 licensed

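To give a rough idea of what "batch processing with concurrent requests" and automatic error handling mean in practice, here is a minimal Python sketch of the general pattern. It is not the library's actual API: `process_batch`, `BatchResult`, and `fake_extract` are hypothetical names made up for illustration, and the extractor is a stand-in that never calls a real provider.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, List, Optional


@dataclass
class BatchResult:
    """Outcome for one document: either extracted data or an error message."""
    path: str
    data: Optional[dict] = None
    error: Optional[str] = None


async def process_batch(
    paths: List[str],
    extract: Callable[[str], Awaitable[dict]],
    max_concurrency: int = 4,
) -> List[BatchResult]:
    """Run `extract` over all paths with at most `max_concurrency` in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run_one(path: str) -> BatchResult:
        async with semaphore:
            try:
                return BatchResult(path=path, data=await extract(path))
            except Exception as exc:
                # One failing document is recorded instead of aborting the batch.
                return BatchResult(path=path, error=str(exc))

    # gather() returns results in the same order as the input paths.
    return await asyncio.gather(*(run_one(p) for p in paths))


async def fake_extract(path: str) -> dict:
    """Hypothetical extractor standing in for a real provider call."""
    await asyncio.sleep(0)
    if path.endswith(".bad"):
        raise ValueError(f"cannot parse {path}")
    return {"source": path, "total": 42.0}


results = asyncio.run(process_batch(["a.pdf", "b.bad", "c.docx"], fake_extract))
for result in results:
    print(result)
```

The semaphore caps how many requests run at once, and each document gets its own result object, so a single bad file shows up as an error entry rather than failing the whole batch.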
Currently supports OpenAI, Mistral, Gemini, and Hugging Face. Planning to add Together AI, Anthropic, and more.
Would love feedback on the API design and on which features would be most useful.