Hey HN, solo dev here. After years of frustration with how LLMs handle complex documents, especially PDFs with tables, I decided to build a solution myself. My approach adds a Markdown conversion step that preserves table structure, which seems to work surprisingly well for chunking.
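For anyone wondering what "Markdown conversion to preserve tables for chunking" can look like in practice, here's a minimal sketch of the general idea. It is not the parser from this post. It assumes the tables have already been pulled out of the PDF as lists of rows (the extraction step is out of scope), and the function names `table_to_markdown` and `chunk_markdown` and the `max_chars` limit are hypothetical. Each table becomes a Markdown pipe table, and the chunker only splits on blank lines between blocks, so a table is never cut in half:

```python
from typing import List


def table_to_markdown(rows: List[List[str]]) -> str:
    """Render a table (first row = header) as a Markdown pipe table.
    Assumes rows were already extracted from the PDF by some other tool."""
    if not rows:
        return ""

    def esc(cell: str) -> str:
        # Escape pipes and flatten newlines so a cell can't break the row layout.
        return str(cell).replace("|", "\\|").replace("\n", " ").strip()

    # Pad ragged rows so every row has the same number of columns.
    width = max(len(r) for r in rows)
    padded = [list(r) + [""] * (width - len(r)) for r in rows]
    header, body = padded[0], padded[1:]
    lines = [
        "| " + " | ".join(esc(c) for c in header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    lines += ["| " + " | ".join(esc(c) for c in r) + " |" for r in body]
    return "\n".join(lines)


def chunk_markdown(md: str, max_chars: int = 1000) -> List[str]:
    """Pack blank-line-separated blocks into chunks of at most max_chars.
    A block is never split, so a table stays whole even if it alone
    exceeds the limit (it then becomes its own oversized chunk)."""
    blocks = [b.strip() for b in md.split("\n\n") if b.strip()]
    chunks: List[str] = []
    current = ""
    for block in blocks:
        candidate = block if not current else current + "\n\n" + block
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            chunks.append(current)
            current = block
    if current:
        chunks.append(current)
    return chunks


# Usage: a small document with a table between two paragraphs.
table = table_to_markdown([["Region", "Q1", "Q2"], ["EU", "10", "12"], ["US", "8", "9"]])
doc = "Quarterly results.\n\n" + table + "\n\nNotes follow the table."
chunks = chunk_markdown(doc, max_chars=60)
for chunk in chunks:
    print(chunk, end="\n---\n")
```

The design choice here is that the chunk boundary is structural (blank lines between Markdown blocks) rather than a raw character count, so header rows stay attached to their data rows. A real pipeline would likely also handle tables larger than the chunk budget, e.g. by splitting them by rows and repeating the header in each piece.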
This little parser is the first public piece of a much larger, privacy-focused AI platform I'm building. I'm pretty much running on fumes financially, so any feedback, critique, or support is massively appreciated.
Happy to answer any questions about the approach!