How to create accessible PDFs from the start

https://typst.app/blog/2025/accessible-pdf/

106•leephillips•3mo ago

Comments

ozim•2mo ago

Have to say I really liked reading the article on mobile.

Even on my smaller phone, the layout was good and the spacing was great.

Black text on a white background, not some grayish shade just to look different, and perfectly legible.

elric•2mo ago

> It will be read in the wrong order

I really dislike it when this happens. This also affects copy/pasting. This typically seems to happen with LaTeX-style two-column layouts, where columns are supposed to be read top to bottom, left to right, but tools end up reading paragraphs from left to right, top to bottom. It's infuriating.

PDFs suck. And it's awful that they're the least bad option for a lot of things.
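To make that failure mode concrete, here is a minimal Python sketch. It assumes an extractor hands you text blocks with page coordinates; the `Block` type and both functions are hypothetical, not any real tool's API. Sorting blocks row by row interleaves the two columns, while sorting by column first recovers the intended order. Real extractors need far more robust heuristics than a midpoint split, which is why tagged PDFs that record the logical reading order explicitly are the more reliable fix.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """Hypothetical text block from a page: left edge x, top y, and text."""
    x: float
    y: float  # distance from the top of the page; smaller means higher up
    text: str


def naive_order(blocks):
    # Row-major: top to bottom, then left to right.
    # On a two-column page this interleaves the columns.
    return [b.text for b in sorted(blocks, key=lambda b: (b.y, b.x))]


def column_order(blocks, page_width):
    # Assumes exactly two columns split at the page midpoint:
    # read the whole left column top to bottom, then the right column.
    mid = page_width / 2
    return [b.text for b in sorted(blocks, key=lambda b: (b.x >= mid, b.y))]


page = [
    Block(50, 100, "L1"), Block(320, 100, "R1"),
    Block(50, 200, "L2"), Block(320, 200, "R2"),
]
print(naive_order(page))        # ['L1', 'R1', 'L2', 'R2']
print(column_order(page, 600))  # ['L1', 'L2', 'R1', 'R2']
```

The midpoint split assumes two equal columns; a full-width heading or figure spanning both columns would already break it.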

dvh•2mo ago

I've noticed very laggy performance when reading Texas Instruments datasheets specifically: I scroll 2 pages and, bam, a 5 s lag; after that it usually works or lags only occasionally. I passed the PDF through some GPT-concocted Ghostscript voodoo and then they work just fine:

    gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 \
       -dPDFSETTINGS=/ebook -dNOPAUSE -dQUIET -dBATCH \
       -sOutputFile=output.pdf input.pdf

mmooss•2mo ago

How does that script change the PDF? Does -dPDFSETTINGS=/ebook do something? Is simply rewriting it in ghostscript making a difference?

dvh•2mo ago

I think the 1.4 does the most.
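The thread doesn't settle which flag matters. Two known differences: object streams and cross-reference streams were introduced in PDF 1.5, so a file written as PDF 1.4 cannot use them, and `/ebook` also downsamples images. Whether either explains the lag is unverified. A minimal Python sketch for comparing `input.pdf` and `output.pdf`, assuming raw-byte inspection is good enough; it is a heuristic, not a PDF parser, and it ignores a `/Version` entry in the document catalog, which can override the header version.

```python
import re
import sys


def pdf_header_version(data: bytes):
    """Return the version from the '%PDF-x.y' header, or None if absent."""
    # The header normally starts the file; search the first 1 KiB because
    # some files carry a few junk bytes before it.
    m = re.search(rb"%PDF-(\d\.\d)", data[:1024])
    return m.group(1).decode("ascii") if m else None


def count_object_streams(data: bytes) -> int:
    """Rough count of object streams, a feature PDF 1.4 files cannot use.

    Heuristic: counts '/Type /ObjStm' (any spacing) in the raw bytes. An
    object stream's own dictionary is never compressed, so it stays visible
    even though the objects inside the stream usually are.
    """
    return len(re.findall(rb"/Type\s*/ObjStm\b", data))


if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            data = f.read()
        print(path, pdf_header_version(data), count_object_streams(data))
```

Run it as `python pdfcheck.py input.pdf output.pdf` (the script name is arbitrary).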

jiehong•2mo ago

TIL: PDF/UA is a thing!

While reading the article I could only think that all this semantic stuff is what HTML is about!

So I think it makes more sense to do what arXiv is doing: providing an HTML version of articles alongside the PDFs. I’d even say HTML should be the source and the PDF should be generated from it instead.

robin_reala•2mo ago

You won’t be able to generate semantic HTML from inaccessible PDF, that needs to be there from day one.

chanux•2mo ago

I wonder if some of these accessibility features help LLMs (assuming the programs that process PDFs to feed LLMs take them into account).

lblume•2mo ago

Sure. But it also helps humans, and I'd guess currently more so.

chanux•2mo ago

Did my comment come across in a negative tone?

It was more of a genuine question: whether it can be useful for machines while not being "visible". This line of thinking is a slippery slope, though, because it can be stretched to the point where it defeats the original purpose.

ethin•2mo ago

IMO PDFs should just be gone. Nobody should use them. They are a solution in search of a problem. The most common argument I hear is "well we need document fidelity!" But IMO this completely ignores the fact that this just isn't needed when we have digital signatures and a PKI and certificates and all that to prove that a document hasn't been tampered with. Making sure a document appears the same on any kind of device/OS or whatever would be a great idea in theory if the way it was done was actually thought through, but it wasn't and now the PDF format is even worse than HTML is (and that's really saying something). Every single time I have had to interact with a PDF it has always been a total disaster. Don't even get me started on the clusterfuck that is PDF forms.

ozim•2mo ago

PDF is fine as output format and for archiving.

Thing is, people want to do a bunch of things they shouldn’t with PDF, like automated parsing, editing, or adding forms to it.

Ideally you should have an API or other structured data to pass around, but of course life is more complicated. Often PDF is all you get, because an API would cost more than it's worth, so it makes more sense to do a bad job parsing the PDF.

wongarsu•2mo ago

The problem was "have documents that look the same on any device, including printed paper and computer screens", and the approach was "PostScript does that for printers, let's simplify it and make it more universal". Both the problem it's solving and the approach were fine, maybe even great. Since then over three decades have passed, pdf has gained a plethora of features, some less well thought out than others, and real-world requirements are completely different than they were in the early 90s. If we were to invent pdf today it would likely look completely different. But it's still good enough that it's hard for a new format to offer an advantage compelling enough to replace pdf.

ethin•2mo ago

Right, but that's what I'm getting at: PDF is just a terrible format all round. People do things with it that have nothing to do with document preservation. We have PDF forms, we have PDFs able to execute arbitrary JS (which can modify the rendering of the document, completely defeating the entire reason for the format existing)... Like IMO the format just has no reason to exist/be used anymore given how bloated and over-complicated it is.

ericpauley•2mo ago

That's why we have PDF/A: https://en.wikipedia.org/wiki/PDF/A
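PDF/A (archiving) and PDF/UA (the accessibility standard from the article) are both declared in a file's XMP metadata, via `pdfaid:part`/`pdfaid:conformance` and `pdfuaid:part`, and tagged PDFs set `/MarkInfo << /Marked true >>` in the catalog. A minimal Python sketch that reads what a file *claims*, assuming the metadata and catalog appear uncompressed in the raw bytes (metadata is often left uncompressed, but a catalog inside a compressed object stream would be missed). It does not validate conformance; a validator such as veraPDF does that. The helper names are hypothetical.

```python
import re


def _xmp_value(data: bytes, prop: bytes):
    """Find an XMP property in either attribute or element form."""
    p = re.escape(prop)
    # Attribute form, e.g. pdfaid:part="2"
    m = re.search(rb"%s\s*=\s*[\"'](\w+)[\"']" % p, data)
    # Element form, e.g. <pdfaid:part>2</pdfaid:part>
    if m is None:
        m = re.search(rb"<%s>\s*(\w+)\s*</%s>" % (p, p), data)
    return m.group(1).decode("ascii") if m else None


def declared_conformance(data: bytes) -> dict:
    """Report declared PDF/A and PDF/UA parts and whether the file is marked as tagged."""
    return {
        "pdfa_part": _xmp_value(data, b"pdfaid:part"),
        "pdfa_conformance": _xmp_value(data, b"pdfaid:conformance"),
        "pdfua_part": _xmp_value(data, b"pdfuaid:part"),
        # Tagged PDFs mark the catalog with /MarkInfo << /Marked true >>.
        "marked": re.search(rb"/Marked\s*true", data) is not None,
    }


if __name__ == "__main__":
    import sys

    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            print(path, declared_conformance(f.read()))
```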

DHRicoF•2mo ago

Heck. I need any replacement to be at least as capable as PDFs. The minimum I expect is being able to run Linux in them.

https://github.com/ading2210/linuxpdf

mmooss•2mo ago

> Every single time I have had to interact with a PDF it has always been a total disaster.

This is obviously absurd, and we don't know what you really mean. Probably billions of people use PDFs; I expect hundreds of millions use them regularly. I use them all the time, no problem, they work great.

PDFs are also a rare format which is preserved and functions reliably over time (decades) over systems (just about anything you can name). If I have a document I want to read 10-20 or more years in the future, PDF is the best bet.

The far superior presentation of professionally prepared PDFs - layout, typography, formatting, etc - makes a large difference for me when reading long texts. Also, the markup works very well and is also preserved - I can read markup from entirely different systems going back decades, and the annotation I make today I can read in 2050.

miki123211•2mo ago

Typst doesn't (yet) do math accessibility, I think, and math is a lot of what it is about.

Wondering why they omitted that information from a blog post promoting Typst for accessibility use cases...

thangalin•2mo ago

    #show strong: set text(fill: blue)

    Join us for a David Lynch double
    feature with *Mulholland Drive* and
    *Inland Empire* next Tuesday
    at 8:15 PM.

I didn't realize Typst mixes content and presentation. Presumably, Typst allows including styles from external sources, much like the HTML/CSS split?

ConTeXt has also been making strides in creating accessible PDF files:

* https://meeting.contextgarden.net/2024/talks/hans+mikael/con...

* https://wiki.contextgarden.net/Input_and_compilation/Accessi...

klauserc•2mo ago

Isn't that example the exact opposite of mixing content and presentation? The * notation applies the strong [emphasis] tag, the show rule (re-)defines the presentation. Ideally you would of course separate the two into separate files (template + content).

In my time using Typst, I found that Typst makes it possible, even easy, to make content more abstract: write the content as a "data structure" and then present parts of it in various places around your document. For instance, to list the quantity/weight from a part's description in a parts index at the end.
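In Typst itself this would use the language's own scripting features; the following is only a language-neutral Python sketch of the single-source pattern described above, with a hypothetical `Part` record and made-up values. Each part is defined once, and two separate presentation functions render it inline in the body and again in an index at the end, so changing a quantity updates both places.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Part:
    # Hypothetical record: each part is described exactly once.
    name: str
    quantity: int
    weight_g: float


PARTS = [Part("bracket", 4, 12.5), Part("bolt M4", 16, 2.1)]


def describe(part):
    # Presentation 1: inline prose in the body of the document.
    return f"Use {part.quantity}x {part.name} ({part.weight_g} g each)."


def parts_index(parts):
    # Presentation 2: an alphabetical index at the end, built from the same data.
    return [
        f"{p.name}: qty {p.quantity}, {p.quantity * p.weight_g:.1f} g total"
        for p in sorted(parts, key=lambda p: p.name)
    ]


for part in PARTS:
    print(describe(part))
print(*parts_index(PARTS), sep="\n")
```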

thangalin•2mo ago

> Ideally you would of course separate the two into separate files (template + content).

Exactly: If instructions for how to style the content are in the same file as the content, then that is mixing content _with_ presentation logic. Avoiding this approach to documentation is what I alluded to in writing, "Presumably, Typst allows including styles from external sources."
