Main features:
- Text extraction (full document or specific pages)
- Table extraction → JSON with headers/rows
- Invoice parsing → vendor, amounts, line items, tax (auto language detection)
- Resume parsing → contact info, skills, work experience, education
- Metadata, links, embedded images extraction
API endpoint: https://pdfpull-895295000838.europe-west1.run.app/docs
Playground: https://bnacar.dev/pdfpull-landing/playground.html
Tech: FastAPI, PyMuPDF, pdfplumber. Rule-based extraction (no LLM API calls).
The invoice/resume parsers detect language automatically (EN/DE/TR) and extract fields without per-template configuration.
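To give a feel for what rule-based language detection can look like, here is a minimal keyword-scoring sketch. It is not the actual pdfpull implementation: the function name `detect_language`, the keyword lists, and the fallback to English are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical keyword lists (not taken from pdfpull); a real parser
# would need far larger vocabularies per language.
KEYWORDS = {
    "en": {"invoice", "total", "tax", "due", "amount"},
    "de": {"rechnung", "gesamt", "mwst", "betrag", "rechnungsnummer"},
    "tr": {"fatura", "toplam", "kdv", "tutar", "vergi"},
}


def detect_language(text: str, default: str = "en") -> str:
    """Return the language whose keywords occur most often in the text."""
    counts = Counter(re.findall(r"\w+", text.lower()))
    scores = {
        lang: sum(counts[word] for word in words)
        for lang, words in KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    # Fall back to the default when no keyword matched at all.
    return best if scores[best] > 0 else default
```

Once the language is known, a parser along these lines can pick the matching set of field labels instead of needing a template per vendor.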
Demo key for testing: sk_demo_123456789
Example request:
curl -X POST "https://pdfpull-895295000838.europe-west1.run.app/api/v1/parse/invoice" \
-H "X-API-Key: sk_demo_123456789" \
-F "file=@invoice.pdf"
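For anyone calling this from Python rather than curl, the request above might look roughly like this. It is a sketch using only the standard library; the helper names `build_invoice_request` and `parse_invoice` are made up for illustration, and the `application/pdf` part type and the 30-second timeout are assumptions rather than documented requirements.

```python
import json
import os
import urllib.request
import uuid

API_URL = "https://pdfpull-895295000838.europe-west1.run.app/api/v1/parse/invoice"
DEMO_KEY = "sk_demo_123456789"


def build_invoice_request(pdf_bytes: bytes, filename: str = "invoice.pdf",
                          api_key: str = DEMO_KEY) -> urllib.request.Request:
    """Build a multipart POST with one form field named "file"."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        (f'Content-Disposition: form-data; name="file"; '
         f'filename="{filename}"\r\n').encode(),
        b"Content-Type: application/pdf\r\n\r\n",  # assumed part type
        pdf_bytes,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    headers = {
        "X-API-Key": api_key,
        "Content-Type": f"multipart/form-data; boundary={boundary}",
    }
    return urllib.request.Request(API_URL, data=body, headers=headers,
                                  method="POST")


def parse_invoice(path: str, api_key: str = DEMO_KEY) -> dict:
    """Upload a PDF from disk and return the decoded JSON response."""
    with open(path, "rb") as f:
        req = build_invoice_request(f.read(), os.path.basename(path), api_key)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)
```

Calling `parse_invoice("invoice.pdf")` performs the upload; `build_invoice_request` is kept separate so the request can be inspected without touching the network.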
Returns structured JSON:
{
  "vendor_name": "ACME Corp",
  "invoice_number": "INV-2024-001",
  "total_amount": 1250.00,
  "currency": "USD",
  "line_items": [...],
  "confidence": 0.92
}
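On the client side, a response like this could be loaded into a small typed wrapper and routed by confidence. This is a sketch: the field names come from the example response, while the `InvoiceResult` class, the `needs_review` helper, the 0.8 threshold, and the reading of `confidence` as a 0 to 1 score are assumptions. The shape of `line_items` entries is not shown in the example, so they are kept as opaque values.

```python
import json
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass
class InvoiceResult:
    vendor_name: Optional[str]
    invoice_number: Optional[str]
    total_amount: Optional[float]
    currency: Optional[str]
    line_items: List[Any] = field(default_factory=list)
    confidence: float = 0.0

    @classmethod
    def from_json(cls, payload: str) -> "InvoiceResult":
        """Build a result from the JSON text, tolerating missing fields."""
        data = json.loads(payload)
        return cls(
            vendor_name=data.get("vendor_name"),
            invoice_number=data.get("invoice_number"),
            total_amount=data.get("total_amount"),
            currency=data.get("currency"),
            line_items=data.get("line_items") or [],
            confidence=float(data.get("confidence", 0.0)),
        )

    def needs_review(self, threshold: float = 0.8) -> bool:
        """Flag results with low confidence or no total (threshold is hypothetical)."""
        return self.confidence < threshold or self.total_amount is None
```

With the example values above, `needs_review()` comes out false at the default threshold; a stricter pipeline could raise the threshold and send anything below it to a human.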
Free to try (100 requests with demo key). Looking for feedback on the API design and what document types to add next.