Ask HN: Why are PDFs so hard to edit?

5•superconduct123•7mo ago
What is it about the underlying format that makes it so difficult to edit a PDF?

Comments

k310•7mo ago
There's a pretty decent explanation here:

https://mailmergic.com/blog/why-pdf-are-hard-to-edit/

The most compelling tidbit I found was this:

> The Technical Architecture of PDF: A Labyrinth of Objects

> Beneath the surface, PDF files are complex compositions made up of objects: text blocks, images, vectors, fonts, metadata, and instructions for rendering. These elements are often stored in fragmented sequences that are optimized for viewing rather than editing. The text is not always stored in logical reading order, and words may be divided into separate character objects placed precisely on the page based on coordinates.

Lots more there. No more spoilers.

PaulHoule•7mo ago
Maybe 10 years ago I was a student of file formats and I actually liked PDF as it had a clear theory of how you serialize a graph of objects. It's more like the old Microsoft Word format or the current DOCX and much better than the atrocious PSD format. PDF is a good format for one developed in the 1990s for what it was intended to do.
necovek•7mo ago
Because it was designed as a graphical output format, not an editable format.

Some of the "compression" tricks it allows (e.g. font subsetting, or even remapping characters so the text encodes in fewer bits) preserve only the appearance of the data, and the semantic encoding is lost (for example, "A" may stand for "#").
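A toy sketch of that remapping, in Python. This is not how any real PDF writer is implemented; `subset_encode`, `naive_extract` and `extract` are made-up names, and the reverse table only plays the role that an optional ToUnicode map plays in a real file. It assumes the subsetter numbers glyphs from 1 in order of first use.

```python
def subset_encode(text):
    """Renumber the characters actually used, starting from 1 (toy subsetter)."""
    code_for = {}  # character -> compact code assigned on first use
    codes = []
    for ch in text:
        if ch not in code_for:
            code_for[ch] = len(code_for) + 1
        codes.append(code_for[ch])
    # Reverse table: stands in for an optional ToUnicode-style map.
    to_unicode = {code: ch for ch, code in code_for.items()}
    return bytes(codes), to_unicode


def naive_extract(data):
    """Guess the text without the reverse map: code 1 is 'A', code 2 is 'B', ..."""
    return "".join(chr(ord("A") + b - 1) for b in data)


def extract(data, to_unicode):
    """Recover the text when the reverse map was kept."""
    return "".join(to_unicode[b] for b in data)


data, to_unicode = subset_encode("#1 #2")
print(naive_extract(data))        # ABCAD
print(extract(data, to_unicode))  # #1 #2
```

The page would render identically either way, since the embedded glyph shapes are untouched. Only a tool that tries to read or edit the text notices that the "A" slot really holds "#".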

It's actually quite similar in nature to TeX's DVI format (boxes and their positions), though obviously not a bitmap format but a vector one with all the deps embedded.

This means that, for instance, using non-default kerning and whitespace will turn all the text into one box per character, each placed individually on the page.
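To give a feel for what an editor then has to do, here is a minimal Python sketch that guesses lines and word breaks from individually placed characters. The `(x, y, width, char)` tuples, the `rebuild_lines` name and the `gap` threshold are all assumptions for illustration; real tools use far more elaborate heuristics.

```python
from collections import defaultdict


def rebuild_lines(glyphs, gap=3.0):
    """Guess text lines from (x, y, width, char) boxes given in arbitrary order."""
    rows = defaultdict(list)
    for x, y, w, ch in glyphs:
        rows[round(y)].append((x, w, ch))  # same baseline -> same line

    lines = []
    for y in sorted(rows, reverse=True):   # default PDF y axis grows upward
        out, prev_end = [], None
        for x, w, ch in sorted(rows[y]):   # left to right
            if prev_end is not None and x - prev_end > gap:
                out.append(" ")            # a wide gap is treated as a space
            out.append(ch)
            prev_end = x + w
        lines.append("".join(out))
    return lines


# Hypothetical page content: no spaces stored, boxes not in reading order.
glyphs = [
    (36.5, 700.0, 6.0, "D"),
    (10.0, 680.0, 6.0, "o"),
    (10.0, 700.0, 6.0, "H"),
    (43.0, 700.0, 6.0, "F"),
    (16.5, 700.0, 6.0, "i"),
    (16.5, 680.0, 6.0, "k"),
    (30.0, 700.0, 6.0, "P"),
]
print(rebuild_lines(glyphs))  # ['Hi PDF', 'ok']
```

Even in this tidy case the space in "Hi PDF" is inferred from a gap, not read from the file, which is why edits tend to break once columns, rotated text or tight kerning are involved.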

superconduct123•7mo ago
I see, so it's like a lower-level format than, say, a Word doc or Markdown
necovek•7mo ago
I wouldn't say that: it's really a "drawing" format for paged documents, and not a text format.

There is a concept of accessible PDFs, where care is taken during generation to make it as semantic as possible for screen readers etc. Editing those is usually much simpler.

Since text-editing tools are a common source of PDFs, those tools are really where you should edit the source and then regenerate the PDF.

It's like asking why compiled binaries or minified JS are so hard to edit.

fuzzfactor•7mo ago
>Why are PDFs so hard to edit?

This is by design.

IIRC the original objective was to require a costly proprietary program from Adobe called "Acrobat" to create the file to begin with, and it was intended not to be edited. Rather it was supposed to be readable and printable with good consistency between PCs and Macs.

"Acrobat Reader" has always been free, to help popularize the format and make sure that anybody could open and read the file. But no editing for you the user. And the "publishers" who routinely generated the early PDFs using the full Acrobat suite wanted to distribute documents for people to trust that they had not been edited from the source. At least not as easily as a Word DOC file could be edited.
