Show HN: Papermill Press – An AI-friendly markup language for PDF generation

10•davidpapermill•1h ago
If you’ve generated PDFs from HTML, you’ll know the pain: headless Chrome in Docker, CSS hacks, content that overflows page or table boundaries, and other quality issues.

The fundamental problem is that HTML was designed for screens, not print.

We built Press, a markup-based document language where pages, content flows, and assets are first-class concepts. Content can flow across frames, columns, and pages without manual pagination. Pages are created dynamically based on the available content.

Press templates separate layout from content. You can send markdown, Press markup, or a mixture of both to the API. Data can be sent as JSON, CSV, or XML.

Because Press is XML-based, it can easily be generated by agents - some of our users are generating complete documents in a single shot, although the language is designed for repeatable automation.

You can also use our MCP server, which enables models to design templates.

A simple API call sends a markdown payload, which is injected into a <flows><body>…</body></flows> element in Press:

  curl -X POST "https://api.papermill.io/v2/pdf?template_id=papermill-modern-report" \
    -H "Authorization: Bearer $PAPERMILL_API_KEY" \
    -H "Content-Type: text/markdown" \
    --data-binary @- \
    -o report.pdf <<'EOF'
  # Q3 Revenue Summary
  
  Quarterly performance across our core product lines.
  
  | Product  | Revenue  | Growth |
  |----------|----------|--------|
  | Platform | £482,000 | +18%   |
  | Add-ons  | £124,000 | +42%   |
  | Services | £67,000  | -3%    |
  
  Strong quarter overall, driven by add-on adoption.
  EOF
PDF output: https://mill.pm/hn
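
For readers calling the API from code rather than a shell, here is a rough Python equivalent of the curl call above, using only the standard library. The endpoint, headers, and template ID come from the example; the function names `build_request` and `render_pdf` are illustrative, and treating the response body as the PDF bytes is an assumption based on curl's `-o report.pdf`.

```python
import os
import urllib.parse
import urllib.request

API_URL = "https://api.papermill.io/v2/pdf"  # endpoint as given in the curl example


def build_request(markdown: str, api_key: str, template_id: str) -> urllib.request.Request:
    """Build (but do not send) the POST request from the curl example."""
    query = urllib.parse.urlencode({"template_id": template_id})
    return urllib.request.Request(
        f"{API_URL}?{query}",
        data=markdown.encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "text/markdown",
        },
        method="POST",
    )


def render_pdf(markdown: str, out_path: str,
               template_id: str = "papermill-modern-report") -> None:
    """Send the request and write the response body to out_path.

    Assumption: the response body is the rendered PDF, as curl's -o implies.
    """
    request = build_request(markdown, os.environ["PAPERMILL_API_KEY"], template_id)
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as f:
        f.write(response.read())
```

Calling `render_pdf("# Q3 Revenue Summary\n\n...", "report.pdf")` with `PAPERMILL_API_KEY` set would then mirror the shell example. Keeping `build_request` separate makes it possible to inspect the request without touching the network.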

Pages and frames in Press can declare dependencies on a content flow like the <body> above. By default, if there’s no content in the flow then the frame or page won’t be generated. You can run flows between frames and pages, and combine multiple flows on a page - for example, a sidebar can run across pages until no content is left, then make room for body content. This makes it possible to implement complex layouts.

You can mix markdown and Press:

  # Visualisation

  Sometimes it's *useful* to mix both markdown and Press:

  <visualization>...</visualization>
The typesetter adapts to dynamic content (e.g. LLM output). For example, tables and columns can be automatically sized and Papermill will even auto-rotate a table and its page to fit if needed.

Templates support components, repeating over data, document logic, and conditional styling. We mostly use an inline-styling approach, and provide the concept of a style “alias”, which is a bag of styling properties you can reuse and compose.
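
To make the "bag of styling properties" idea concrete, here is a toy Python model of aliases being composed. It is not how Press resolves styles: the alias names, the `resolve_style` function, and the rule that later aliases and inline properties win on conflict are all assumptions made for illustration. Only the property names are borrowed from the template in this post.

```python
from typing import Dict, List, Optional

Style = Dict[str, str]

# Hypothetical aliases: each one is a reusable bag of styling properties.
ALIASES: Dict[str, Style] = {
    "panel": {"background-color": "#f5f5f5", "padding": "0.5cm"},
    "small-print": {"font-size": "9pt"},
}


def resolve_style(alias_names: List[str], inline: Optional[Style] = None) -> Style:
    """Merge aliases left to right, then apply inline properties on top.

    Assumed precedence: later aliases override earlier ones, and inline
    properties override every alias.
    """
    merged: Style = {}
    for name in alias_names:
        merged.update(ALIASES[name])
    merged.update(inline or {})
    return merged


print(resolve_style(["panel", "small-print"], {"padding": "1cm"}))
# {'background-color': '#f5f5f5', 'padding': '1cm', 'font-size': '9pt'}
```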

Here’s an example template written in Press, our document language. It uses the first page layout until the sidebar flow is exhausted, then switches to the second:

  <press>
    <document format="A4" page-margin="2cm">
      <repeat flow="sidebar">
        <page>
          <frame direction="row">
            <frame padding-left="1cm" padding-right="1cm"><flow name="body" /></frame>
            <frame width="20%" background-color="#f5f5f5" padding="0.5cm" font-size="9pt"><flow name="sidebar" /></frame>
          </frame>
        </page>
      </repeat>
      <repeat flow="body">
        <page><flow name="body" width="fill" /></page>
      </repeat>
    </document>
    <flows>
      <body type="markdown"><lipsum paragraphs="10" /></body>
      <sidebar type="markdown"><lipsum paragraphs="3" /></sidebar>
    </flows>
  </press>
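
One way to read the template above is as a loop over page layouts, each repeated while its driving flow still has content. The Python sketch below is a toy model of that reading, not the Press typesetter: it counts content in abstract "units", and the `paginate` function, the layout names, and the per-page capacities are all made up for illustration.

```python
from typing import Dict, List, Tuple

# (layout name, flow that drives the repeat, units of each flow that fit on one page)
Layout = Tuple[str, str, Dict[str, int]]


def paginate(flows: Dict[str, int], layouts: List[Layout]) -> List[str]:
    """Return the sequence of page layouts used to place all flow content."""
    remaining = dict(flows)  # units of content left in each flow
    pages: List[str] = []
    for name, driver, capacity in layouts:
        if capacity.get(driver, 0) <= 0:
            raise ValueError(f"layout {name!r} never consumes its driving flow {driver!r}")
        # Like <repeat flow="...">: emit pages while the driving flow has content.
        while remaining.get(driver, 0) > 0:
            for flow, fits in capacity.items():
                remaining[flow] = max(0, remaining.get(flow, 0) - fits)
            pages.append(name)
    return pages


layouts = [
    ("two-column", "sidebar", {"body": 3, "sidebar": 2}),  # body frame plus sidebar frame
    ("full-width", "body", {"body": 5}),                   # body only
]
print(paginate({"body": 10, "sidebar": 3}, layouts))
# ['two-column', 'two-column', 'full-width']
```

With an empty sidebar flow, the same call yields only full-width pages, which matches the default described earlier: a page that depends on an empty flow is not generated.
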
Papermill is a paid API with a free tier. Press is the document language.

Try it for free: https://app.papermill.io/signup (no credit card needed)

Docs including MCP setup: https://docs.papermill.io

Data-only sandbox: https://app.papermill.io/demo.html (no email needed)

We're a small team based in Manchester, UK. Tom (CTO) and I are happy to answer questions about the language design, the rendering engine, or anything else!

Comments

tompapermill•1h ago
Hi, Tom here. I'm happy to answer any questions about generating PDFs, making automated document workflows more robust, or anything related to Papermill and Press!
tim_at_ping•1h ago
Very cool. Glad to see another Manchester-based startup!
davidpapermill•1h ago
Thanks Tim! Great to see Byteful doing so well too!
Ste_CreaTech•1h ago
Byteful seems cool! Web proxy stuff is important for me in my near future so I may just check you out. Thanks for commenting!
davidpapermill•1h ago
Hi Folks - I'm David, CEO of Papermill. We're super excited to get feedback on what we're building. This is the first time we've posted to a forum. Designing a document language for AI is a major undertaking and we'd love to hear from folks with experience and interest in document languages and generation.
Ste_CreaTech•1h ago
I'm excited to see more about how Papermill and Press can interact with agents in agentic workflows! This seems like a great problem to solve for AI-first startups.
tompapermill•1h ago
Thanks Ste, we're happy to answer any questions you've got!
davidpapermill•49m ago
Press is designed from the ground up for agentic workflows. We spoke to hundreds of companies generating PDFs before we designed the language, and had been building AI design systems ourselves for a few years.

You can connect an agent to our MCP server, see: https://docs.papermill.io/mcp/#using-the-papermill-mcp-serve...

It's quite amazing to see what Claude can do when it has a tool like Papermill at its disposal - worth connecting and having a play.

In practice, you can either directly connect an LLM to the MCP server, or send via the API after (say) cleaning up LLM output, combining it with RAG and other data etc.

tomfitzsimmons•47m ago
I'd love to see something like this in ChatGPT or Claude. The PDFs you can generate there are always so boring.
davidpapermill•44m ago
Thanks Tom, good to see you on HN! You can connect Papermill to your favourite agent via MCP - we prefer Claude Code.

https://docs.papermill.io/mcp/#using-the-papermill-mcp-serve...

You can ask Claude to generate templates or full documents, save templates, answer questions about templates and the Press language, and more. There's a lot more to it - design guidelines for print vs web, recipes in Press etc.

I've tried it with Fable already and it's a noticeable improvement - we support visual feedback through MCP and I think that's helping Fable a lot.

tompapermill•40m ago
Hi Tom,

One of the things we've learned is that models often need more guidance when generating documents. We give real-time feedback to models as they're constructing the document, such as "you have content here that you've done nothing with and doesn't appear in the report as it likely overflowed off the page, you should think of a way to handle this" and "this image is sized incorrectly and just will not fit in the area you've specified".

Giving models the tools to create media for print allows both the user and the model to experiment with more wild designs, knowing that things will not randomly break!

frantzalot•45m ago
As a solicitor myself managing a high street law firm that is already comfortable drafting, reviewing, and producing client-ready PDFs from Word, where is the return on investment in adopting Press? What can Press do that cannot be achieved with Word and existing document automation tools?
davidpapermill•42m ago
Papermill is more the "infrastructure" layer that sits below legal AI tools. We don't work directly with law firms; it's more likely to be used by a startup building a legal tool. Companies like Wordsmith, Legatics, Legora, and Harvey can use Papermill to output high-quality PDFs.

That said, an export from Word to Papermill templates might be a nice addition...

lmartinneuwave•32m ago
looks great dave! loved the product. Hope it all goes well.
off_by_two•23m ago
Looks interesting. Are there any limits on the file size or number of pages that can be produced?
tompapermill•18m ago
Hi off_by_two

We don't apply hard limits on PDFs. We've handled PDFs of over 1GB and close to 1000 pages.

We also have a large document service that handles extremely large PDFs in the background.

davidpapermill•9m ago
Worth adding that we also support the integration of many PDFs into a large document, including complex processing, but this is not part of the publicly accessible Press language (yet).
