I really dislike it when this happens. This also affects copy/pasting. This typically seems to happen with LaTeX-style two-column layouts, where columns are supposed to be read top to bottom, left to right, but tools end up reading paragraphs from left to right, top to bottom. It's infuriating.
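To make the failure concrete, here is a minimal Python sketch of the two reading orders. It assumes a hypothetical `Block` record with a top-left position and a page with exactly two columns split at the midline; real extractors work with far messier geometry, so treat this as an illustration rather than how any particular tool works.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """Hypothetical text block: top-left corner in page units, y grows downward."""
    text: str
    x: float
    y: float


def naive_order(blocks):
    # Row-major sort: top to bottom first, then left to right.
    # On a two-column page this interleaves the columns.
    return [b.text for b in sorted(blocks, key=lambda b: (b.y, b.x))]


def column_order(blocks, page_width):
    # Assumption: exactly two columns, split at the page midline.
    # Sort by column first (left before right), then top to bottom.
    mid = page_width / 2
    return [b.text for b in sorted(blocks, key=lambda b: (b.x >= mid, b.y))]


blocks = [
    Block("L1", 50, 100), Block("R1", 350, 100),
    Block("L2", 50, 300), Block("R2", 350, 300),
]

print(naive_order(blocks))        # ['L1', 'R1', 'L2', 'R2']
print(column_order(blocks, 600))  # ['L1', 'L2', 'R1', 'R2']
```

The first result is the interleaved order that breaks copy/pasting; the second is the order a human reader would follow.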
PDFs suck. And it's awful that they're the least bad option for a lot of things.
gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 \
-dPDFSETTINGS=/ebook -dNOPAUSE -dQUIET -dBATCH \
-sOutputFile=output.pdf input.pdf
While reading the article I could only think that all this semantic stuff is what HTML is about!
So, I think it makes more sense to do what arXiv is doing: providing an HTML version of articles on top of PDFs. I'd even say HTML should be the source and the PDF should be generated from it instead.
It was more of a genuine question: can it be useful for machines while not being "visible"? This thinking is a slippery slope though, because it can be stretched to a point where it defeats the original purpose.
The thing is, people want to do a bunch of things with PDFs that they shouldn't, like automated parsing, editing, or adding forms.
Ideally you would have an API or other structured data to pass around, but of course life is more complicated. Often a PDF is all you get, because building an API would cost more than it is worth, so you end up doing a bad job of parsing the PDF.
https://github.com/ading2210/linuxpdf
/s
This is obviously absurd, and we don't know what you really mean. Probably billions of people use PDFs; I expect hundreds of millions use them regularly. I use them all the time, no problem, they work great.
PDFs are also a rare format that is preserved and functions reliably over time (decades) and across systems (just about anything you can name). If I have a document I want to read 10-20 or more years in the future, PDF is the best bet.
The far superior presentation of professionally prepared PDFs - layout, typography, formatting, etc - makes a large difference for me when reading long texts. Also, the markup works very well and is also preserved - I can read markup from entirely different systems going back decades, and the annotation I make today I can read in 2050.
Wondering why they omitted that information from a blog post promoting Typst for accessibility use cases...
#show strong: set text(fill: blue)
Join us for a David Lynch double
feature with *Mulholland Drive* and
*Inland Empire* next Tuesday
at 8:15 PM.
I didn't realize Typst mixes content and presentation. Presumably, Typst allows including styles from external sources, much like the HTML/CSS split?
ConTeXt has also been making strides in creating accessible PDF files:
* https://meeting.contextgarden.net/2024/talks/hans+mikael/con...
* https://wiki.contextgarden.net/Input_and_compilation/Accessi...
In my time using Typst, I found that Typst makes it possible/easy to make content even more abstract: write the content as a "data structure" and then present parts of it in various places around your document. For instance to list quantity/weight of a parts description in a parts index at the end.
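The idea is not specific to Typst, so here is a rough sketch of it in Python rather than Typst syntax. The `parts` records, their field names, and both render functions are made up for illustration: the content lives in one data structure, and two different views are generated from it.

```python
# Hypothetical parts data: the single source of truth for the document.
parts = [
    {"name": "bracket", "quantity": 4, "weight_g": 120,
     "description": "Holds the shelf to the wall."},
    {"name": "screw", "quantity": 16, "weight_g": 5,
     "description": "Fastens the bracket."},
]


def render_descriptions(parts):
    # View 1: the prose descriptions, in document order.
    return [f"{p['name']}: {p['description']}" for p in parts]


def render_index(parts):
    # View 2: an alphabetical index with quantity and total weight.
    rows = []
    for p in sorted(parts, key=lambda p: p["name"]):
        total = p["quantity"] * p["weight_g"]
        rows.append(f"{p['name']}  qty {p['quantity']}  total {total} g")
    return rows


print("\n".join(render_descriptions(parts)))
print("\n".join(render_index(parts)))
```

Changing a quantity in `parts` updates every view that uses it, which is the benefit described above.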
Exactly: If instructions for how to style the content are in the same file as the content, then that is mixing content _with_ presentation logic. Avoiding this approach to documentation is what I alluded to in writing, "Presumably, Typst allows including styles from external sources."
ozim•2mo ago
I have an even smaller phone, but the layout was good and the spacing was great.
Black text on a white background, not some grayish shade chosen to look different, and perfectly legible.