Show HN: Replacing $50k manual forensic audits with a deterministic .py engine

2•cd_mkdir•2h ago

I’m a software architect, and I recently built Exit Protocol (https://exitprotocols.com), an automated forensic accounting engine for high-conflict litigation.

Problem: If you get divorced and need to prove that a specific $250k in a heavily commingled joint bank account is your "separate property" (e.g., from a pre-marital startup exit), the burden of proof is strictly mathematical. Historically, this meant paying a forensic CPA $500/hour to dump years of blurry bank PDFs into Excel and manually trace every dollar. It takes weeks and routinely costs over $50,000.

I looked at the legal standard courts use for this, the Lowest Intermediate Balance Rule (LIBR), and realized it wasn't an accounting problem. It is a distributed-systems state-machine problem.

Why didn't we just "throw AI at it"?

There are a hundred legal-tech startups right now trying to use LLMs to summarize bank data. In a courtroom, GenAI is a fatal liability. If an LLM hallucinates a single transaction, opposing counsel can challenge the admissibility of the entire ledger under the Daubert standard.

To make this court-ready, we had to build a strictly deterministic pipeline:

1. Vision-Native Ingestion (Beating Tesseract)
Bank statements are the final boss of OCR (merged cells, overlapping debit/credit columns). Standard linear OCR fails catastrophically. We built a spatial-grid OCR pipeline (using Azure Document Intelligence with a local Surya OCR fallback) that maps the geometric structure of the page. It reconstructs tabular ledgers perfectly, even from multi-generational "PDFs from hell."
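To illustrate what "mapping the geometric structure of the page" can mean, here is a rough sketch of rebuilding ledger rows from word positions. It is not the Azure Document Intelligence or Surya API and not our production code: the Word type, the rebuild_rows function, the column_edges values, and the row_tolerance parameter are all hypothetical, and it assumes some OCR step has already produced word texts with bounding-box centers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    text: str
    x: float  # horizontal center of the word's bounding box (assumed input)
    y: float  # vertical center of the word's bounding box (assumed input)


def rebuild_rows(words, column_edges, row_tolerance=4.0):
    """Group words into rows by y, then place each word in a column by x."""
    n_cols = len(column_edges) - 1
    rows = []
    for w in sorted(words, key=lambda w: (w.y, w.x)):
        # Start a new row when the word sits clearly below the current row.
        if not rows or abs(w.y - rows[-1]["y"]) > row_tolerance:
            rows.append({"y": w.y, "cells": [[] for _ in range(n_cols)]})
        for i in range(n_cols):
            if column_edges[i] <= w.x < column_edges[i + 1]:
                rows[-1]["cells"][i].append(w)
                break
    # Join the words of each cell left to right.
    return [
        [" ".join(w.text for w in sorted(cell, key=lambda w: w.x)) for cell in r["cells"]]
        for r in rows
    ]


# Hypothetical column layout: date | description | debit | credit
edges = [0, 80, 380, 480, 580]
words = [
    Word("01/03", 40, 100), Word("ATM", 120, 101), Word("WITHDRAWAL", 170, 100),
    Word("500.00", 420, 99),
    Word("01/04", 40, 120), Word("PAYROLL", 130, 120), Word("3,200.00", 520, 121),
]
table = rebuild_rows(words, edges)
for row in table:
    print(row)
```

The point of the sketch is that debit vs. credit is decided by where a number sits on the page, not by the order in which a linear OCR pass happens to emit it.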

2. The Deterministic Engine (LIBR)
The LIBR algorithm acts as a one-way ratchet. If an account balance drops below your separate property claim amount, your claim is permanently capped at that new floor. Subsequent marital deposits do not refill it (the "replenishment fallacy"). The engine replays thousands of transactions chronologically, continuously evaluating S_t = min(S_{t-1}, B_t), where S_t is the surviving separate-property claim after transaction t and B_t is the account balance at that point.
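A minimal sketch of that ratchet, assuming a single account and transactions that are already in chronological order. The Txn type and replay_libr function are illustrative names, not the actual engine; amounts use Decimal so no rounding is introduced by floats.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)
class Txn:
    day: date
    amount: Decimal  # positive = deposit, negative = withdrawal


def replay_libr(opening_balance, separate_claim, txns):
    """Replay txns in the given order; return (day, balance, claim) per step."""
    balance = Decimal(opening_balance)
    claim = min(Decimal(separate_claim), balance)
    rows = []
    for t in txns:
        balance += t.amount
        # One-way ratchet: S_t = min(S_{t-1}, B_t). The claim can drop to
        # the balance but a later deposit never raises it again.
        claim = min(claim, balance)
        rows.append((t.day, balance, claim))
    return rows


rows = replay_libr(
    "250000", "250000",
    [
        Txn(date(2022, 1, 10), Decimal("-100000")),  # balance falls to 150,000
        Txn(date(2022, 2, 1), Decimal("80000")),     # marital deposit, no refill
        Txn(date(2022, 3, 5), Decimal("-20000")),    # balance stays above the floor
    ],
)
for day, balance, claim in rows:
    print(day, balance, claim)
```

In this toy ledger the balance ends at 210,000 but the traceable claim stays at 150,000, the lowest point the balance reached.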

3. Resolving Timestamp Ambiguity
Bank PDFs give you dates, not timestamps. If a $10k deposit and $10k withdrawal happen on the same day, order matters. We built a simulation toggle that forces "Worst Case" (withdrawals process first) vs "Best Case" sorting, establishing a mathematically irrefutable "Zone of Truth" for settlement negotiations.
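Here is a small sketch of that toggle, under the assumption that "Best Case" means deposits are processed first within each day. The function name and the (date, amount) tuple format are made up for the example.

```python
from datetime import date
from decimal import Decimal


def libr_final_claim(opening_balance, claim, txns, withdrawals_first):
    """txns: list of (date, Decimal); negative amounts are withdrawals."""
    # Sort by day, then within a day either most negative first (worst case)
    # or largest deposit first (best case, an assumption of this sketch).
    ordered = sorted(
        txns,
        key=lambda t: (t[0], t[1] if withdrawals_first else -t[1]),
    )
    balance = Decimal(opening_balance)
    claim = min(Decimal(claim), balance)
    for _, amount in ordered:
        balance += amount
        claim = min(claim, balance)  # same ratchet as the LIBR replay
    return claim


same_day = [
    (date(2023, 6, 1), Decimal("10000")),   # $10k deposit
    (date(2023, 6, 1), Decimal("-10000")),  # $10k withdrawal, same date
]
worst = libr_final_claim("12000", "12000", same_day, withdrawals_first=True)
best = libr_final_claim("12000", "12000", same_day, withdrawals_first=False)
print(worst, best)
```

With a 12,000 opening balance and claim, the worst-case ordering leaves a claim of 2,000 and the best-case ordering leaves 12,000. The true figure lies somewhere in that range, whatever the real intraday order was.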

4. Cryptographic Chain of Custody & Sovereign Mode
Lawyers are terrified of cloud SaaS breaches. We containerized the entire monolith (Django 5.0/Postgres/Celery) via Docker so enterprise firms can run it air-gapped on their own hardware (Sovereign Mode). Furthermore, every generated PDF dossier is sealed with a SHA-256 hash of the underlying data snapshot, proving to a judge that the output hasn't been tampered with since generation.
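For the sealing step, a minimal sketch using only the Python standard library. The snapshot layout and the seal_snapshot / verify_snapshot names are assumptions for illustration; the one real requirement is that the snapshot is serialized canonically so the same data always hashes to the same digest.

```python
import hashlib
import hmac
import json


def seal_snapshot(snapshot: dict) -> str:
    """Return the SHA-256 hex digest of a canonical JSON encoding."""
    # sort_keys and fixed separators make the byte encoding independent of
    # dict insertion order and whitespace.
    canonical = json.dumps(snapshot, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def verify_snapshot(snapshot: dict, expected_digest: str) -> bool:
    """Recompute the digest and compare it to the one printed on the dossier."""
    return hmac.compare_digest(seal_snapshot(snapshot), expected_digest)


# Hypothetical snapshot; amounts are strings so no float rounding leaks in.
snapshot = {
    "case_id": "demo-001",
    "transactions": [
        {"date": "2022-01-10", "amount": "-100000.00"},
        {"date": "2022-02-01", "amount": "80000.00"},
    ],
}
digest = seal_snapshot(snapshot)

tampered = json.loads(json.dumps(snapshot))  # deep copy via JSON round trip
tampered["transactions"][0]["amount"] = "-10000.00"

print(verify_snapshot(snapshot, digest))   # True
print(verify_snapshot(tampered, digest))   # False
```

A bare hash only shows that the data matches a digest recorded earlier, so the digest itself has to be stored or handed over somewhere the report's author cannot quietly change it.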

If you want to see the math in action, we set up a "Demo Sandbox" populated with a synthetic, highly complex 3-year commingled ledger. You can run the engine yourself here (Desktop recommended): https://exitprotocols.com/simulation/uplink/

Here is the exact "Attorney Work Product" (the Forensic Audit Dossier) that our system generates from raw PDFs: https://exitprotocols.com/static/documents/Forensic_Audit_Sa...

I'd love feedback from the HN crowd on the architecture—specifically handling edge-case data ingestion and maintaining cryptographic integrity in B2B enterprise deployments.

Cheers!

Comments

cd_mkdir•2h ago

Not a lawyer, so the Go-To-Market side in the legal space has been a steep learning curve. If anyone here has experience selling/deploying air-gapped, on-prem solutions to highly risk-averse, non-technical clients (like law firms), I would love to hear your battle stories.

Happy to answer any questions about the math, the OCR pipeline, or the architecture!

Sandbox link again: https://exitprotocols.com/simulation/uplink/
