Problem: If you get divorced and need to prove that a specific $250k in a heavily commingled joint bank account is your "separate property" (e.g., from a pre-marital startup exit), the burden of proof is strictly mathematical. Historically, this meant paying a forensic CPA $500/hour to dump years of blurry bank PDFs into Excel and manually trace every dollar. It takes weeks and routinely costs over $50,000.
I looked at the legal standard courts use for this—the Lowest Intermediate Balance Rule (LIBR)—and realized it wasn’t an accounting problem. It is a Distributed Systems state-machine problem.
Why we didn't just "Throw AI at it"?
There are a hundred legal-tech startups right now trying to use LLMs to summarize bank data. In a courtroom, GenAI is a fatal liability. If an LLM hallucinates a single transaction, the entire ledger is inadmissible under the Daubert standard.
To make this court-ready, we had to build a strictly deterministic pipeline:
1. Vision-Native Ingestion (Beating Tesseract) Bank statements are the final boss of OCR (merged cells, overlapping debit/credit columns). Standard linear OCR fails catastrophically. We built a spatial-grid OCR pipeline (using Azure Document Intelligence with a local Surya OCR fallback) that maps the geometric structure of the page. It reconstructs tabular ledgers perfectly, even from multi-generational "PDFs from hell."
2. The Deterministic Engine (LIBR) The LIBR algorithm acts as a one-way ratchet. If an account balance drops below your separate property claim amount, your claim is permanently capped at that new floor. Subsequent marital deposits do not refill it (the "replenishment fallacy"). The engine replays thousands of transactions chronologically, continuously evaluating S_t = min(S_t-1, B_t).
3. Resolving Timestamp Ambiguity Bank PDFs give you dates, not timestamps. If a $10k deposit and $10k withdrawal happen on the same day, order matters. We built a simulation toggle that forces "Worst Case" (withdrawals process first) vs "Best Case" sorting, establishing a mathematically irrefutable "Zone of Truth" for settlement negotiations.
4. Cryptographic Chain of Custody & Sovereign Mode Lawyers are terrified of cloud SaaS breaches. We containerized the entire monolith (Django 5.0/Postgres/Celery) via Docker so enterprise firms can run it air-gapped on their own hardware (Sovereign Mode). Furthermore, every generated PDF dossier is sealed with a SHA-256 hash of the underlying data snapshot, proving to a judge that the output hasn't been tampered with since generation.
If you want to see the math in action, we set up a "Demo Sandbox" populated with a synthetic, highly complex 3-year commingled ledger. You can run the engine yourself here (Desktop recommended): https://exitprotocols.com/simulation/uplink/
Here is the exact "Attorney Work Product" it generates from raw PDF or Forensic Audit Dossier our system generates- https://exitprotocols.com/static/documents/Forensic_Audit_Sa...
I'd love feedback from the HN crowd on the architecture—specifically handling edge-case data ingestion and maintaining cryptographic integrity in B2B enterprise deployments.
Cheers!
cd_mkdir•2h ago
Happy to answer any questions about the math, the OCR pipeline, or the architecture!
Sandbox link again: https://exitprotocols.com/simulation/uplink/