The war against PDFs is heating up

https://www.economist.com/business/2026/02/24/the-war-against-pdfs-is-heating-up

20•pseudolus•2h ago

Comments

dhosek•2h ago

Well, that was a nonsense article. Badly written software has trouble with PDFs, accessibility is an afterthought (which, sadly, is true of most things) and some small group thinks they can invent a better wheel, ignoring the fact that they’d have to do a lot of work to overcome the first mover advantages of HTML and PDF and this comment now has more information than the original article thanks to that clause beginning with “ignoring”.

pavel_lishin•2h ago

> Yet Duff Johnson, head of the PDF Association, protector of the format, argues that the fault lies not in the file type but in ourselves. He contends that there is no reason developers cannot build bots that are able to use PDFs. The AI assistant embedded in Acrobat, Adobe’s PDF reader, is designed to do precisely that, notes Leonard Rosenthol, the software firm’s PDF guru.

Designed to, but does it do it well without the problems noted earlier in the article?

ssl-3•1h ago

Strictly anecdotally, I've had no trouble feeding PDFs to OpenAI's bot.

The searchable PDFs get searched, and the just-pictures-of-words ones get fed through their (quite good, IMHO) OCR.

I use it all the time. It's remarkably good for locating the details I need in the poorly-organized ~1,200 page factory manual for my Honda.

(Well, it's not necessarily organized poorly. It's just designed with the clear intent that it is mostly to serve as a set of repair instructions, and sometimes I don't want repair instructions. Sometimes I want to know how a thing works for my own cognitive benefit instead of how to diagnose and R&R it as a series of steps.)

cyberax•1h ago

I'm using paperless-ngx for personal document management, and Claude Desktop was able to read and OCR all the PDFs there just fine (through an MCP connector).

It also was able to parse my tax forms in 3 languages.

barrister•2h ago

Seems to be a weak pitch for an Israeli startup called Factify. Their new document type is also closed source, which seems like an obvious showstopper for a ubiquitous global document replacement, especially in today's extremely heated and untrustworthy environment.

No strong argument imo for replacing the pdf.

g947o•32m ago

It weirdly reminds me of SynthID/C2PA. At the end of the day, they matter very little. People are going to do what they want to do.

If people want to manage versions, access, etc., they are going to do it right the first time with existing document formats and permission control mechanisms, ranging from "making the document only accessible to certain users" to "have someone read a document in a specific room", which have worked reasonably well.

cratermoon•1h ago

There are PDF files and there are PDF files. Many (most?) PDFs I run into are generated from Microsoft Word or some other MS product with no structure at all. The majority of people who use MS products don't understand or care about structure. The WYSIWYG imperative means lots of markup to describe font size, color, and decoration, to make every section heading look the same without ever designating the text as a section head. The same happens with paragraphs, page breaks, and column flow. The resulting document looks correct enough to the creator. Other people who have a different version of Word, different fonts, and a thousand other little differences, won't see it correctly. That leads our author to generate a PDF, probably with embedded fonts, to ensure uniform appearance across these thousand little exceptions.

The result is a document with the content mixed up so incomprehensibly with appearance controls as to be both unreadable and without any residue of the underlying intended structure of the document's sections, headers, figures, paragraphs, captions, footnotes, or anything.

And then there's PDF files which are nothing more than a series of images of pages of text. If you're lucky and the scans are clean a good OCR might be able to recover most of the content.

What I'm saying is, it doesn't matter the tool, if authors don't encode structure and formatting in semantically meaningful ways.

tpm•1h ago

So what you are actually saying is that there is a market for a tool that will recreate the PDF with a structure based on how the original PDF looks?
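To make that idea concrete, here is a minimal sketch of the kind of heuristic such a tool might start from: guess headings purely from appearance. It does not read real PDFs. The `Run` type, the `infer_structure` function, the 60-character cutoff, and the sample text are all hypothetical, and a real extractor would have to supply the font information.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Run:
    """A hypothetical text run, as a PDF extractor might report it."""
    text: str
    font_size: float
    bold: bool = False


def infer_structure(runs):
    """Label each run 'heading' or 'paragraph' from appearance alone."""
    if not runs:
        return []
    # Assume the font size covering the most characters is body text.
    weight = Counter()
    for run in runs:
        weight[run.font_size] += len(run.text)
    body_size = weight.most_common(1)[0][0]

    labeled = []
    for run in runs:
        # Guess: larger than body text, or bold and short, means heading.
        # The 60-character cutoff is an arbitrary illustrative choice.
        is_heading = run.font_size > body_size or (run.bold and len(run.text) < 60)
        labeled.append(("heading" if is_heading else "paragraph", run.text))
    return labeled


# Made-up sample input, not taken from any real document.
runs = [
    Run("Brake System", 18.0),
    Run("Remove the caliper bolts and support the caliper with wire.", 10.0),
    Run("Torque Values", 10.0, bold=True),
    Run("Tighten the caliper bolts to the value listed in the table.", 10.0),
]
for label, text in infer_structure(runs):
    print(f"{label:9} {text}")
```

A guess like this only recovers what the appearance still encodes. Reading order, captions, footnotes, and column flow would each need their own heuristics.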

cratermoon•1h ago

The market has been needing a tool like that for 30 years. A PDF document of the type I describe is like a broken egg. Information is lost between the authoring and rendering, to the extent that it's not clear recreating the original is even possible.

pessimizer•1h ago

A typesetter could recreate the document through looking at it, doing some font research, and playing with the kerning for a while. Saying it's not possible to recreate a typeset document that is readable is absurd, no matter how twisted and insane the actual postscript is.

ur-whale•1h ago

https://archive.is/aCleq

Gualdrapo•1h ago

This reminds me of something that was posted here on HN a few days ago:

https://scottlocklin.wordpress.com/2023/05/31/djvu-and-its-c...

pessimizer•1h ago

The war against pdfs is based on AI being too stupid to read them? That's a condemnation of AI, not pdfs. I, a natural intelligence, can easily read pdfs.

Cheyana•46m ago

Perfect response.

lsbehe•1h ago

I'll miss getting documentation as a pile of pictures in a PDF.

maxloh•1h ago

For context, here is the startup's website: https://www.factify.com/. The site consists of only two main pages: the landing page and a "careers" section.

Based on the site, the service appears to be little more than a document hosting platform with tracking features, such as monitoring who copied the document and the specific paragraphs they selected. They’ve intentionally omitted a download feature to prevent access to outdated versions, but otherwise, the experience seems no different from an ordinary PDF reader.

There is no mention of a "new standard" on their front page. I suspect they don't actually convert the documents. They likely just convert pages to encrypted images and use client-side rendering for text elements to allow for selection and copying.

sghaz•57m ago

This looks like a sponsored article. Very poor quality.

g947o•37m ago

My biggest gripes:

* you cannot easily view a PDF in dark mode. Solutions do exist, but there are always some limitations

* poor experience reading on mobile devices (mentioned in the article). You can use "Reflow" features provided by Acrobat or similar tools, but they often don't work offline, not to mention Acrobat is bloated and filled with dark patterns that trick you into buying a subscription
