Medical bills contain diagnosis codes. Diagnosis codes reveal conditions. We decided no patient should have to send that to a server just to check if they're being overcharged.
So we built a bill analyzer where everything runs in the browser: Tesseract OCR, code extraction, pricing lookups against Medicare fee schedules, and 3.3M CMS bundling rule checks. Zero network calls after initial load.
The hard problem was size. Raw CMS datasets run to tens of megabytes. We shard them so the first load is 198KB (a 479x reduction) and detail shards load on demand. Everything goes through Zod validation with fail-closed defaults: if data fails schema checks, the feature turns off rather than showing bad numbers.
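To make "fail-closed" concrete, here is a minimal sketch of the idea, not our actual code. It uses a hand-rolled check standing in for a Zod schema so the snippet has no dependencies; the `PriceShard` shape, the `Feature` type, and the `parsePriceShard` name are all assumptions made up for illustration.

```typescript
// Hypothetical shard shape: billing code -> Medicare fee in cents.
type PriceShard = { version: string; fees: Record<string, number> };

// A feature is either enabled with validated data, or off with a reason.
type Feature<T> =
  | { enabled: true; data: T }
  | { enabled: false; reason: string };

// Hand-rolled validation standing in for a Zod schema (assumption).
function parsePriceShard(raw: unknown): Feature<PriceShard> {
  if (typeof raw !== "object" || raw === null) {
    return { enabled: false, reason: "shard is not an object" };
  }
  const candidate = raw as { version?: unknown; fees?: unknown };
  if (typeof candidate.version !== "string") {
    return { enabled: false, reason: "missing version" };
  }
  const rawFees = candidate.fees;
  if (typeof rawFees !== "object" || rawFees === null || Array.isArray(rawFees)) {
    return { enabled: false, reason: "missing fees table" };
  }
  const fees: Record<string, number> = {};
  for (const code of Object.keys(rawFees)) {
    const cents = (rawFees as Record<string, unknown>)[code];
    // Fail closed: one bad row disables the feature instead of being skipped.
    if (typeof cents !== "number" || !isFinite(cents) || Math.floor(cents) !== cents || cents < 0) {
      return { enabled: false, reason: "bad fee for " + code };
    }
    fees[code] = cents;
  }
  return { enabled: true, data: { version: candidate.version, fees } };
}

// Illustrative inputs only; the values are invented.
const goodShard = parsePriceShard({ version: "v1", fees: { X1000: 9200 } });
const badShard = parsePriceShard({ version: "v1", fees: { X1000: "92.00" } });
```

The point of returning a tagged union is that the UI cannot read `data` without first checking `enabled`, so a shard that fails validation has no path to the screen.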
12 sprints to get OCR to 95.0% F1 across 19 real bills. The failure modes are specific to medical documents: thermal printer ink where $45 becomes $4,500, layouts where every code shifts one column right, ZIP codes in headers extracted as charge amounts. We built a 7-stage filter pipeline to catch these before they reach the pricing engine.
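For a rough picture of what a filter stage looks like, here is a sketch with two stages. These are hypothetical stand-ins, not two of the real seven: the `Candidate` shape, the stage names, and every threshold below are assumptions chosen for illustration.

```typescript
// Hypothetical OCR candidate: raw token, where it sat on the page, parsed value.
type Candidate = { text: string; region: "header" | "line_item"; cents: number };
type Verdict = { keep: boolean; reason?: string };
// referenceCents is an assumed positive Medicare reference price for the code.
type Stage = (c: Candidate, referenceCents: number) => Verdict;

// Assumed stage: a bare 5-digit token in the header is more likely a ZIP code.
const rejectHeaderZip: Stage = (c) =>
  c.region === "header" && /^\d{5}$/.test(c.text)
    ? { keep: false, reason: "zip-like header token" }
    : { keep: true };

// Assumed stage: an amount far above reference that looks plausible once
// divided by 100 suggests a lost decimal point ($45.00 read as $4,500).
const rejectLostDecimal: Stage = (c, referenceCents) => {
  const ratio = c.cents / referenceCents;
  const shifted = ratio / 100;
  return ratio > 50 && shifted >= 0.5 && shifted <= 5
    ? { keep: false, reason: "suspected lost decimal point" }
    : { keep: true };
};

// Run stages in order; the first rejection wins and carries its reason.
function runPipeline(c: Candidate, referenceCents: number, stages: Stage[]): Verdict {
  for (const stage of stages) {
    const verdict = stage(c, referenceCents);
    if (!verdict.keep) return verdict;
  }
  return { keep: true };
}

const stages: Stage[] = [rejectHeaderZip, rejectLostDecimal];
// Invented examples against an assumed $40.00 reference price.
const zipVerdict = runPipeline({ text: "90210", region: "header", cents: 9021000 }, 4000, stages);
const inflatedVerdict = runPipeline({ text: "$4,500", region: "line_item", cents: 450000 }, 4000, stages);
const normalVerdict = runPipeline({ text: "$45.00", region: "line_item", cents: 4500 }, 4000, stages);
```

Keeping each stage as a small pure function means a new failure mode becomes one more entry in the array, and each rejection reports which stage caught it.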
The bundling checks are exhaustive. If a hospital bills code A and code B separately, but CMS says B is included in A, that's an unbundling violation. Most audit tools run this server-side. We load all 3.3M pairs into the browser via sharded JSON and in-memory indexing.
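Roughly, the pair check can be sketched like this. It is a simplified illustration under assumptions: the shard layout (comprehensive code mapped to its component codes), the two-character shard key, and the `BundlingIndex` class are hypothetical, and the codes are invented rather than taken from CMS data.

```typescript
// Assumed shard layout: comprehensive code -> component codes bundled into it.
type BundlingShard = Record<string, string[]>;

// Assumed shard key: first two characters of the comprehensive code.
function shardKey(code: string): string {
  return code.slice(0, 2);
}

class BundlingIndex {
  private pairs = new Map<string, Set<string>>();
  private loaded = new Set<string>();

  // Merge one parsed shard into the in-memory index.
  addShard(key: string, shard: BundlingShard): void {
    for (const comprehensive of Object.keys(shard)) {
      this.pairs.set(comprehensive, new Set(shard[comprehensive]));
    }
    this.loaded.add(key);
  }

  // Shard keys that still need loading before these codes can be checked.
  missingShards(billedCodes: string[]): string[] {
    const missing = new Set<string>();
    for (const code of billedCodes) {
      const key = shardKey(code);
      if (!this.loaded.has(key)) missing.add(key);
    }
    return Array.from(missing);
  }

  // Every [comprehensive, component] pair that was billed separately.
  findUnbundled(billedCodes: string[]): Array<[string, string]> {
    const hits: Array<[string, string]> = [];
    for (const a of billedCodes) {
      const components = this.pairs.get(a);
      if (!components) continue;
      for (const b of billedCodes) {
        if (b !== a && components.has(b)) hits.push([a, b]);
      }
    }
    return hits;
  }
}

// Invented codes for illustration only.
const index = new BundlingIndex();
index.addShard("X1", { X1000: ["X1001", "X1002"] });
const billed = ["X1000", "X1001", "Y2000"];
const stillNeeded = index.missingShards(billed);
const violations = index.findUnbundled(billed);
```

A `Map` of `Set`s makes each pair lookup constant time, so checking a bill costs on the order of the square of its line count no matter how many pairs are loaded. Results are only complete once `missingShards` comes back empty for the bill.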
harvey9•1h ago
The link is to an article. Would it be better to link to the software?
vivzkestrel•48m ago
how many americans out there resort to medical tourism as a viable alternative to beat hospital costs? any numbers?