Current stack:
– Azure Document Intelligence (prebuilt)
– preprocessing (PyMuPDF → image → filters)
– a multimodal LLM to turn the detected table into clean JSON
Main issue:
– To localize the table, I currently rely on template-specific configs. At scale, this becomes unmanageable because there may be hundreds of unique layouts.
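To make the scaling problem concrete, here is a minimal sketch of what a template-specific localization config can look like. The template IDs, the `TableRegion` type, and the `table_bbox_px` helper are all hypothetical names for illustration, not my actual config format; the point is only that every new layout needs its own hand-tuned entry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableRegion:
    """Table location as fractions of page width/height (0.0 to 1.0)."""
    x0: float
    y0: float
    x1: float
    y1: float


# Hypothetical per-template configs: one hand-tuned entry per layout.
# With hundreds of layouts, this dictionary is what becomes unmanageable.
TEMPLATE_REGIONS = {
    "vendor_a_invoice": TableRegion(0.05, 0.40, 0.95, 0.80),
    "vendor_b_statement": TableRegion(0.10, 0.25, 0.90, 0.70),
}


def table_bbox_px(template_id, page_width, page_height):
    """Return the table crop box (x0, y0, x1, y1) in pixels.

    Raises KeyError for a template that has no config yet, which is
    exactly the failure mode when a new layout shows up.
    """
    region = TEMPLATE_REGIONS[template_id]
    return (
        round(region.x0 * page_width),
        round(region.y0 * page_height),
        round(region.x1 * page_width),
        round(region.y1 * page_height),
    )
```

For example, `table_bbox_px("vendor_a_invoice", 1000, 2000)` gives the crop box for a 1000 × 2000 px page image, while any unseen template ID fails with a `KeyError` until someone adds a config for it.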
Has anyone solved this class of problem? I'm looking for strategies for:
– robust table localization across many templates,
– hybrid rule-based + ML approaches,
– layout-based detection,
– or “templateless” methods that generalize better.