I'm the solo developer behind Baguno. I built this because I saw logistics and finance teams wasting hours manually copying data from messy PDF invoices into their ERPs.
The core premise is simple: you get a dedicated email address, forward your invoices to it, and the data is extracted, math-checked, and synced to your ledger (Zoho, QuickBooks, Xero) in under a minute.
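The post doesn't spell out what "math-checked" covers. Here is a minimal sketch that assumes it means two checks: the line items add up to the subtotal, and subtotal plus tax equals the grand total. The names `ExtractedInvoice` and `mathCheck` are hypothetical, as is the 1-cent tolerance. Amounts are stored as integer cents so the sums stay exact.

```typescript
// Hypothetical shape of an extracted invoice. Amounts are integer cents
// so that sums are exact and free of floating-point drift.
interface LineItem {
  description: string;
  amountCents: number;
}

interface ExtractedInvoice {
  lineItems: LineItem[];
  subtotalCents: number;
  taxCents: number;
  totalCents: number;
}

// Returns human-readable problems; an empty list means the math checks out.
// toleranceCents absorbs per-line rounding on the source document.
function mathCheck(invoice: ExtractedInvoice, toleranceCents = 1): string[] {
  const problems: string[] = [];

  const lineSum = invoice.lineItems.reduce((sum, item) => sum + item.amountCents, 0);
  if (Math.abs(lineSum - invoice.subtotalCents) > toleranceCents) {
    problems.push(`line items sum to ${lineSum}, but subtotal is ${invoice.subtotalCents}`);
  }

  const expectedTotal = invoice.subtotalCents + invoice.taxCents;
  if (Math.abs(expectedTotal - invoice.totalCents) > toleranceCents) {
    problems.push(`subtotal + tax is ${expectedTotal}, but total is ${invoice.totalCents}`);
  }

  return problems;
}

// Usage: a freight bill whose printed total doesn't match its own numbers.
const bill: ExtractedInvoice = {
  lineItems: [
    { description: "Linehaul", amountCents: 100000 },
    { description: "Fuel surcharge", amountCents: 12550 },
  ],
  subtotalCents: 112550,
  taxCents: 11255,
  totalCents: 123850,
};
console.log(mathCheck(bill));
```

In this example, the line items match the subtotal, but 1125.50 of tax on 112550 cents gives 123805, not the printed 123850, so the total mismatch is reported. One plausible design is to hold any invoice with a non-empty result for review instead of syncing it to the ledger.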
The Architecture & Challenges:
Building a reliable extraction engine that handles everything from perfectly formatted SaaS receipts to messy, scanned freight bills was the hardest part.
Primary Engine: I'm using Azure Document Intelligence for the heavy lifting (great at complex table structures).
The Fallback: If Azure misses the grand total or fails on a complex layout, the backend automatically falls back to an OpenAI Vision model (gpt-4o-mini) to re-process the image and extract the missing fields.
Security: Multi-tenant isolation in Google Cloud Storage ([org_id]/invoices/...).
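The primary/fallback flow can be sketched as follows. Both engines sit behind a hypothetical `Extractor` function type, so the sketch makes no assumptions about the Azure or OpenAI SDKs. In a real deployment, the primary would wrap Azure Document Intelligence, and the fallback would send the page image to gpt-4o-mini. `InvoiceFields`, `REQUIRED_FIELDS`, and `extractWithFallback` are illustrative names only.

```typescript
// Hypothetical subset of the fields the pipeline needs from every invoice.
interface InvoiceFields {
  vendorName?: string;
  invoiceNumber?: string;
  grandTotal?: number;
}

// Any extraction backend: a wrapper around Azure Document Intelligence,
// a wrapper around a vision model, or a test stub.
type Extractor = (file: Uint8Array) => Promise<InvoiceFields>;

const REQUIRED_FIELDS: (keyof InvoiceFields)[] = ["vendorName", "invoiceNumber", "grandTotal"];

function missingFields(fields: InvoiceFields): (keyof InvoiceFields)[] {
  return REQUIRED_FIELDS.filter((key) => fields[key] === undefined);
}

// Generic over K so TypeScript can verify that source and target types match.
function copyField<K extends keyof InvoiceFields>(
  target: InvoiceFields,
  source: InvoiceFields,
  key: K,
): void {
  target[key] = source[key];
}

async function extractWithFallback(
  file: Uint8Array,
  primary: Extractor,
  fallback: Extractor,
): Promise<{ fields: InvoiceFields; usedFallback: boolean }> {
  let fields: InvoiceFields;
  try {
    fields = await primary(file);
  } catch {
    // A hard failure (e.g. an unsupported layout) is treated as "nothing extracted".
    fields = {};
  }

  const missing = missingFields(fields);
  if (missing.length === 0) {
    return { fields, usedFallback: false };
  }

  // Re-process the same file with the fallback, but only fill the gaps:
  // values the primary engine already found are kept.
  const recovered = await fallback(file);
  const merged: InvoiceFields = { ...fields };
  for (const key of missing) {
    if (recovered[key] !== undefined) {
      copyField(merged, recovered, key);
    }
  }
  return { fields: merged, usedFallback: true };
}

// Usage with stubs: the primary engine reads the header but misses the total.
const azureStub: Extractor = async () => ({ vendorName: "Acme Freight", invoiceNumber: "INV-1042" });
const visionStub: Extractor = async () => ({ vendorName: "ACME", invoiceNumber: "INV-1042", grandTotal: 1238.5 });

extractWithFallback(new Uint8Array(), azureStub, visionStub).then((result) => console.log(result));
```

Merging only the missing fields follows the post's description ("extract the missing fields") and limits the second model to the gaps. It also means a confident value from the table-oriented engine is never overwritten by the vision model's reading of the same page.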
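In a standard (flat-namespace) GCS bucket, "folders" are just shared prefixes on object names, so prefix-based isolation has to be enforced by the code that builds and checks those names. Here is a minimal sketch of how that could work. It assumes the org ID comes from the authenticated session and never from the request. Only the `[org_id]/invoices/` prefix comes from the post. The function names and the character whitelist are hypothetical, and whatever follows `invoices/` is treated as an opaque `rest` string because the post elides it.

```typescript
// Hypothetical whitelist for a single path segment: no "/", no empty strings,
// and "." / ".." are rejected explicitly below.
const SAFE_SEGMENT = /^[A-Za-z0-9._-]+$/;

function isSafeSegment(segment: string): boolean {
  return SAFE_SEGMENT.test(segment) && segment !== "." && segment !== "..";
}

// The per-tenant prefix. orgId should come from the authenticated session,
// never from client input.
function tenantPrefix(orgId: string): string {
  if (!isSafeSegment(orgId)) {
    throw new Error(`invalid org id: ${JSON.stringify(orgId)}`);
  }
  return `${orgId}/invoices/`;
}

// Builds an object name inside the tenant's prefix. `rest` is whatever follows
// "invoices/" (left opaque here) and may contain "/"-separated segments.
function objectKeyFor(orgId: string, rest: string): string {
  if (!rest.split("/").every(isSafeSegment)) {
    throw new Error(`invalid object path: ${JSON.stringify(rest)}`);
  }
  return tenantPrefix(orgId) + rest;
}

// Authorization check to run before reading, deleting, or signing a URL for a key.
// The trailing "/" in the prefix stops "org_1" from matching "org_12/...".
function belongsToOrg(orgId: string, objectKey: string): boolean {
  const prefix = tenantPrefix(orgId);
  if (!objectKey.startsWith(prefix)) {
    return false;
  }
  return objectKey.slice(prefix.length).split("/").every(isSafeSegment);
}

// Usage
const key = objectKeyFor("org_123", "inv-001.pdf");
console.log(key, belongsToOrg("org_123", key), belongsToOrg("org_456", key));
// prints: org_123/invoices/inv-001.pdf true false
```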
Backend: Node.js, Express, Mongoose, Firebase Auth.
We have a free tier (20 lifetime documents) if you want to test the extraction engine yourself. You can just sign up with Google Auth. I’d love to hear your feedback on the speed, accuracy, or any edge-case PDFs that break the parser!