Ask HN: Why are PDFs so hard to edit?

5•superconduct123•5h ago

What is it about the underlying format that makes it so difficult to edit a PDF?

Comments

k310•4h ago

There's a pretty decent explanation here:

https://mailmergic.com/blog/why-pdf-are-hard-to-edit/

The most compelling tidbit I found was this:

> The Technical Architecture of PDF: A Labyrinth of Objects

> Beneath the surface, PDF files are complex compositions made up of objects: text blocks, images, vectors, fonts, metadata, and instructions for rendering. These elements are often stored in fragmented sequences that are optimized for viewing rather than editing. The text is not always stored in logical reading order, and words may be divided into separate character objects placed precisely on the page based on coordinates.

Lots more there. No more spoilers.
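To make the quoted point concrete: a content stream is a list of drawing instructions, and nothing forces them into reading order. The sketch below uses a made-up stream where each `BT ... ET` block positions a run of text with `Td` and draws it with `Tj`. An extractor has to sort the runs by coordinates (top to bottom, then left to right) to get lines back. It is a toy that assumes unescaped literal strings and one `Td` per text block; `extract_lines` is a hypothetical helper, and real extractors also have to handle `Tm`, `TJ`, escapes and font encodings.

```python
import re
from itertools import groupby

# A made-up content stream: runs are drawn out of reading order, and one
# visual line ("first line") is split across two separately positioned runs.
STREAM = """
BT /F1 12 Tf 72 680 Td (second line) Tj ET
BT /F1 12 Tf 110 700 Td (line) Tj ET
BT /F1 12 Tf 72 700 Td (first) Tj ET
"""

# Toy pattern for "<x> <y> Td (<text>) Tj"; ignores escapes and nested parens.
SHOW_TEXT = re.compile(r"(-?\d+(?:\.\d+)?) (-?\d+(?:\.\d+)?) Td \(([^)]*)\) Tj")


def extract_lines(stream: str) -> list[str]:
    """Recover visual lines by sorting text runs on their page coordinates."""
    runs = [(float(x), float(y), text) for x, y, text in SHOW_TEXT.findall(stream)]
    # PDF user space has y growing upward, so the top line has the largest y.
    runs.sort(key=lambda run: (-run[1], run[0]))
    lines = []
    # Runs sharing a baseline (same y) become one line, joined left to right.
    # Inserting a space between runs is itself a guess.
    for _, same_line in groupby(runs, key=lambda run: run[1]):
        lines.append(" ".join(text for _, _, text in same_line))
    return lines


print(extract_lines(STREAM))  # ['first line', 'second line']
```

Nothing in the stream says "first" and "line" belong to the same word group or that the 700 line comes before the 680 line; the extractor infers both from geometry, which is exactly what makes round-tripping edits fragile.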

PaulHoule•4h ago

About 10 years ago I was a student of file formats, and I actually liked PDF because it had a clear theory of how to serialize a graph of objects. In that respect it's like the old Microsoft Word format or the current DOCX, and much better than the atrocious PSD format. For a format developed in the 1990s, PDF is good at what it was intended to do.
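The "graph of objects" serialization can be sketched in a few lines: each object gets a number, other objects point at it with `N 0 R` references, and a cross-reference (`xref`) table near the end records the byte offset where each object starts. This is a minimal one-page sketch assuming uncompressed objects and a classic xref table (not the newer xref streams); `build_pdf` is a hypothetical helper, not part of any library.

```python
def build_pdf(objects: list[bytes]) -> bytes:
    """Serialize objects numbered 1..n, followed by an xref table of byte offsets."""
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte position where "N 0 obj" begins
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"
    xref_start = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # entry 0: head of the free list
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset  # each entry is exactly 20 bytes
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_start
    return bytes(out)


CONTENT = b"BT /F1 24 Tf 72 720 Td (Hello) Tj ET"

# Catalog -> Pages -> Page -> (Contents, Font): edges are "N 0 R" references.
OBJECTS = [
    b"<< /Type /Catalog /Pages 2 0 R >>",
    b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
    b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
    b"/Resources << /Font << /F1 5 0 R >> >> /Contents 4 0 R >>",
    b"<< /Length %d >>\nstream\n" % len(CONTENT) + CONTENT + b"\nendstream",
    b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
]

pdf = build_pdf(OBJECTS)
print(pdf.decode("ascii"))
```

The byte offsets also show why naive in-place edits break files: lengthen the string in the content stream and the stream's `/Length` plus every later xref offset must be recomputed. PDF's incremental-update mechanism avoids rewriting the file by appending changed objects and a new xref section at the end instead.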

necovek•4h ago

Because it was designed as a graphical output format, not an editable format.

Some of the "compression" tricks it allows (e.g. font subsetting, or even remapping characters so text is encoded in fewer bits) preserve only the visual appearance; the semantic encoding is gone (for example, "A" may stand for "#").
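A toy illustration of the remapping point, assuming a generator that assigns one-byte glyph codes in order of first use (an assumption for illustration; real subsetters vary). The bytes in the content stream then match no standard encoding: without the optional `/ToUnicode` map the text decodes as nonsense, and a character the subset never contained has no glyph to draw. `subset_encode` and the starting code `0x21` are hypothetical.

```python
FIRST_CODE = 0x21  # hypothetical: start at "!" so the remapped bytes stay printable


def subset_encode(text: str) -> tuple[bytes, dict[int, str]]:
    """Give each distinct character a one-byte code in order of first use."""
    codes: dict[str, int] = {}
    encoded = bytearray()
    for ch in text:
        if ch not in codes:
            if FIRST_CODE + len(codes) > 0xFF:
                raise ValueError("too many distinct glyphs for a one-byte encoding")
            codes[ch] = FIRST_CODE + len(codes)
        encoded.append(codes[ch])
    # The inverse table plays the role of a /ToUnicode CMap.
    to_unicode = {code: ch for ch, code in codes.items()}
    return bytes(encoded), to_unicode


encoded, to_unicode = subset_encode("Hello")
print(encoded.decode("latin-1"))                 # !"##$  (what a naive reader sees)
print("".join(to_unicode[b] for b in encoded))   # Hello
print(all(ch in to_unicode.values() for ch in "Help"))  # False: no "p" in the subset
```

Here "#" stands for "l"; an editor that wants to type "Help" would have to re-embed or re-subset the font, not just change a string.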

It's actually quite similar in nature to TeX's DVI format (boxes and their positions), except that PDF is a vector format with all its dependencies, such as fonts, embedded.

This means that, for instance, non-default kerning and spacing can turn all the text into one box per character, scattered around the page.
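Here is roughly what custom kerning looks like inside a content stream, and why extractors have to guess where spaces are. A `TJ` array mixes strings with numbers; each number shifts the next glyph by thousandths of a text-space unit (roughly thousandths of an em), with positive values moving it left and negative values moving it right, so a word gap may be encoded purely as spacing with no space character at all. The `-200` threshold and `tj_to_text` are hypothetical; real extractors rely on comparable, tunable heuristics.

```python
import re

# Hypothetical cutoff: a shift of -200 or less (about a fifth of an em to the
# right) is read as a word gap.
SPACE_THRESHOLD = -200.0

# Either a literal string "(...)" or a number; escapes are ignored in this toy.
TJ_TOKEN = re.compile(r"\(([^)]*)\)|(-?\d+(?:\.\d+)?)")


def tj_to_text(tj_operand: str, threshold: float = SPACE_THRESHOLD) -> str:
    """Rebuild text from a TJ array such as "[(Hel) 20 (lo) -300 (world)] TJ"."""
    pieces = []
    for string, number in TJ_TOKEN.findall(tj_operand):
        if number:
            # Numbers are position adjustments, not characters; only a large
            # rightward shift is guessed to be a space.
            if float(number) <= threshold:
                pieces.append(" ")
        else:
            pieces.append(string)
    return "".join(pieces)


# Kerning inside "Hello" plus a word gap encoded purely as spacing:
print(tj_to_text("[(Hel) 20 (lo) -300 (wor) 15 (ld)] TJ"))  # Hello world
```

Pick the threshold too low and words run together; too high and kerned words split apart. Editing the text means rewriting these adjustments too.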

superconduct123•3h ago

I see, so it's a lower-level format than, say, a Word doc or Markdown.

fuzzfactor•4h ago

>Why are PDFs so hard to edit?

This is by design.

IIRC the original objective was to require a costly proprietary program from Adobe called "Acrobat" to create the file in the first place, and the file was not intended to be edited. Rather, it was supposed to be readable and printable with good consistency between PCs and Macs.

"Acrobat Reader" has always been free, to help popularize the format and make sure that anybody could open and read the file. But no editing for you, the user. And the "publishers" who routinely generated the early PDFs using the full Acrobat suite wanted to distribute documents that people could trust had not been altered from the source. At least not as easily as a Word DOC file could be edited.
