Some of the "compression" tricks it allows (e.g. font subsetting, or even remapping characters so that text takes fewer bits to encode) preserve only the appearance of the data; the semantic encoding is lost (for example, the code for "A" may actually stand for "#").
It's actually quite similar in nature to TeX's DVI format (boxes and their positions), though obviously it is a vector format rather than a bitmap one, with all the dependencies embedded.
This means that, for instance, using non-default kerning and whitespace will turn all the text into one box per character, scattered around the page.
This is by design.
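To make the character remapping mentioned above concrete, here is a toy sketch in Python. It is not how any real PDF producer encodes fonts; the names `encode`, `decode` and `stored` are made up for illustration. The reverse table plays the role of the optional ToUnicode map: if the producer leaves it out, a reader only ever sees the stored codes.

```python
# Toy model of a subset font (hypothetical, not a real PDF encoding):
# the producer renumbers only the characters actually used, in order of
# first appearance, starting from "A".
text = "#1 #2"

used = list(dict.fromkeys(text))  # unique characters, first-seen order
encode = {ch: chr(ord("A") + i) for i, ch in enumerate(used)}
decode = {code: ch for ch, code in encode.items()}  # stand-in for a ToUnicode map

# What ends up in the file: codes that render correctly with the subset
# font but mean nothing on their own.
stored = "".join(encode[ch] for ch in text)

# Without `decode`, copy/paste or search would see `stored`, not `text`.
recovered = "".join(decode[code] for code in stored)
```

Here `"#"` is stored as `"A"`, so the page looks right while the underlying bytes no longer say what the text is.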
IIRC the original objective was to require a costly proprietary program from Adobe called "Acrobat" to create the file to begin with, and the file was not intended to be edited. Rather, it was supposed to be readable and printable with good consistency between PCs and Macs.
"Acrobat Reader" has always been free, to help popularize the format and make sure that anybody could open and read the file. But no editing for you, the user. And the "publishers" who routinely generated the early PDFs with the full Acrobat suite wanted readers to be able to trust that the documents they distributed had not been edited from the source, or at least not as easily as a Word DOC file could be.
k310•4h ago
https://mailmergic.com/blog/why-pdf-are-hard-to-edit/
The most compelling tidbit I found was this:
> The Technical Architecture of PDF: A Labyrinth of Objects
> Beneath the surface, PDF files are complex compositions made up of objects: text blocks, images, vectors, fonts, metadata, and instructions for rendering. These elements are often stored in fragmented sequences that are optimized for viewing rather than editing. The text is not always stored in logical reading order, and words may be divided into separate character objects placed precisely on the page based on coordinates.
Lots more there. No more spoilers.
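As a rough illustration of what the quoted passage implies for anyone extracting text, here is a minimal Python sketch that rebuilds reading order from individually placed characters. The `(x, y, char)` tuples, the `reading_order` function and the `char_width` and `space_gap` parameters are all assumptions for the example; real extractors have to deal with fonts, rotation, columns and much more.

```python
from collections import defaultdict

def reading_order(glyphs, char_width=6.0, space_gap=1.5):
    """Rebuild lines of text from (x, y, char) tuples given in any order."""
    # Group glyphs that share a baseline (same rounded y coordinate).
    lines = defaultdict(list)
    for x, y, ch in glyphs:
        lines[round(y)].append((x, ch))

    result = []
    # PDF's default y axis grows upward, so the top line has the largest y.
    for y in sorted(lines, reverse=True):
        row = sorted(lines[y])  # left to right by x
        text = row[0][1]
        for (prev_x, _), (x, ch) in zip(row, row[1:]):
            # A gap clearly wider than one character is treated as a space.
            if x - prev_x > char_width * space_gap:
                text += " "
            text += ch
        result.append(text)
    return result

# Characters stored out of order, as they might appear in a content stream.
glyphs = [
    (24, 700, "d"), (0, 688, "e"), (6, 700, "b"),
    (18, 700, "c"), (6, 688, "f"), (0, 700, "a"),
]
lines = reading_order(glyphs)
```

Note that the spaces are guessed from gaps rather than read from the file, which is one reason extracted PDF text so often comes out with words split or glued together.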
PaulHoule•4h ago