This statement from the article matches my personal experience: "According to several studies, approximately 80–90 percent of the world's organizational data is stored as unstructured data in documents, much of it locked away in formats that resist easy extraction."
In my experience, this means Microsoft Word and PowerPoint documents authored by people who focus more on the appearance than on the structure of the content. Generate a PDF from one of these documents and any hint of structure that existed is gone.
There was an article on HN not too long ago discussing the history of Word documents and their lack of structure, but I can't find the link. ETA: https://ia.net/topics/markdown-and-the-slow-fade-of-the-form...
bediger4000•2h ago
The tellers are magnificently ignorant about this, as is the telephone helpline. To them, the PDF actually is the data the credit union uses. No other form of data exists, except possibly in an Excel spreadsheet, and they can't give data in that format. I blame the prevalence of Windows for this. Between the use of the file name "extension" to indicate a file's format, the hiding of that "extension" in file browsers, the single-document-at-a-time orientation, and the almost exclusive use of WYSIWYG systems like Word and Excel, it's pretty hard for people to understand that a difference between "the data" and "the formatting" exists.