I’ve built a tool called ExactStatement to help users convert PDF bank statements into specific CSV formats.
Currently, I’m using the Gemini API (Pro/Flash) to directly transform PDF content into structured JSON. While it works surprisingly well for 95% of cases, the "last 5%" is a headache:
Hallucinations: Occasionally, the AI misinterprets a digit or skips a line item, which is unacceptable for financial data.
Context Limits: Very long statements (50+ pages) sometimes lead to degraded performance or missing rows.
I'm looking for a more robust engineering approach. Should I:
Stick with LLMs but add a validation layer (e.g., checking if the calculated balance matches the statement's final balance)?
Switch to a hybrid approach (e.g., using LayoutLM or Amazon Textract for OCR/layout analysis first, then an LLM for cleanup)?
Go back to rule-based parsing for major banks (though maintaining templates seems like a nightmare)?
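For the first option, here is a minimal sketch of the kind of balance check I have in mind. Everything in it is hypothetical and for illustration only: the `Row` shape, the `reconcile` name, and the assumption that amounts are signed (credits positive, debits negative) and that the opening and closing balances have already been extracted from the statement. It uses `Decimal` rather than floats so cents compare exactly.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass
class Row:
    # Hypothetical row shape; the real JSON schema may differ.
    description: str
    amount: Decimal                    # signed: credits positive, debits negative
    balance: Optional[Decimal] = None  # running balance printed on the statement, if any


def reconcile(opening: Decimal, closing: Decimal, rows: list[Row]) -> list[str]:
    """Return a list of problems; an empty list means the extraction reconciles."""
    problems = []
    running = opening
    for i, row in enumerate(rows):
        running += row.amount
        # If the statement prints a running balance, check it row by row
        # so a misread digit or skipped row can be located, not just detected.
        if row.balance is not None and row.balance != running:
            problems.append(
                f"row {i}: computed balance {running}, statement shows {row.balance}"
            )
            running = row.balance  # resync so one bad row does not flag every later row
    if running != closing:
        problems.append(
            f"closing balance: computed {running}, statement shows {closing}"
        )
    return problems


if __name__ == "__main__":
    rows = [
        Row("Coffee", Decimal("-4.50"), Decimal("995.50")),
        Row("Salary", Decimal("2000.00"), Decimal("2995.50")),
    ]
    print(reconcile(Decimal("1000.00"), Decimal("2995.50"), rows))
```

A check like this only covers amounts. It would not notice a wrong date or description, and two errors that cancel each other out would still pass if the statement has no per-row running balance.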
How are you guys solving the "precision" problem in document extraction today? Would love to hear your experiences with specific libraries or workflows.