Step 1: Capture tables as images using a headless browser
Step 2: Run OCR to convert them into structured JSON
This works well where traditional HTML parsers fail, for example on tables with complex styling, merged cells, or JS-rendered content.
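
To make the two steps concrete, here is a minimal sketch of how such a pipeline could look. It is not the code from the repository: it assumes Playwright for the headless browser and pytesseract (with Pillow) for OCR, and the function names, the `table` selector, and the `row_tolerance` value are all hypothetical choices made for illustration. It also treats each OCR'd word as its own cell, which is a simplification that breaks on multi-word cells.

```python
import json
from typing import Dict, List


def capture_table_png(url: str, selector: str = "table") -> bytes:
    """Step 1: render the page headlessly and screenshot the first matching table."""
    # Imported lazily so the pure helper below works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        # Wait for network activity to settle so JS-rendered tables are present.
        page.goto(url, wait_until="networkidle")
        png = page.locator(selector).first.screenshot()
        browser.close()
    return png


def ocr_words(png: bytes) -> List[Dict]:
    """Step 2a: run OCR and keep each recognized word with its pixel position."""
    import io

    import pytesseract
    from PIL import Image

    data = pytesseract.image_to_data(
        Image.open(io.BytesIO(png)), output_type=pytesseract.Output.DICT
    )
    words = []
    for text, left, top in zip(data["text"], data["left"], data["top"]):
        if text.strip():  # skip the empty entries Tesseract emits for layout blocks
            words.append({"text": text, "left": left, "top": top})
    return words


def words_to_rows(words: List[Dict], row_tolerance: int = 10) -> List[List[str]]:
    """Step 2b: group words into rows by vertical position, ordered left to right.

    Assumption: words whose `top` is within `row_tolerance` pixels of the first
    word in a row belong to that row, and every word is its own cell.
    """
    rows: List[List[Dict]] = []
    for word in sorted(words, key=lambda w: w["top"]):
        if rows and abs(word["top"] - rows[-1][0]["top"]) <= row_tolerance:
            rows[-1].append(word)
        else:
            rows.append([word])
    return [[w["text"] for w in sorted(row, key=lambda w: w["left"])] for row in rows]


def table_to_json(url: str) -> str:
    """Run both steps and serialize the rows as a JSON array of arrays."""
    return json.dumps(words_to_rows(ocr_words(capture_table_png(url))))
```

Calling `table_to_json` with a page URL would return a JSON string of rows. The grouping step is kept separate from the browser and OCR calls so it can be checked on its own with hand-written word boxes.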
GitHub: https://github.com/enterpriseqa/extract_tables_from_websites
Examples included. Feedback and contributions are welcome!