frontpage.

PDF Oxide – Fast PDF library in Rust with Python bindings – 0.8ms, 100% pass rate

https://oxide.fyi/
1•yfedoseev•1h ago

Comments

yfedoseev•1h ago
PDF Oxide is a PDF text extraction and manipulation library written in Rust with Python bindings (via PyO3). It is MIT licensed.

I started building it when I needed fast, reliable PDF extraction for a data pipeline and couldn't find a permissively licensed option that was both fast and handled edge cases well. PyMuPDF is fast but AGPL. pypdf is permissively licensed (BSD) but 15x slower. pdfplumber is great for tables but too slow for batch processing.

Technical Architecture:

Zero-Dependency Parser: Built from scratch in Rust using nom combinators (no MuPDF, no Poppler).

Layout Analysis: Uses XY-Cut projection partitioning for multi-column layout detection.

Robust Font Decoding: Implements a multi-level encoding fallback chain (ToUnicode CMap -> Encoding differences -> Base encoding -> CIDFont CMap -> Adobe Glyph List -> Identity). This is where most libraries produce garbage on CJK documents.
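
To make the XY-Cut idea from the layout analysis note above concrete, here is a minimal Python sketch of recursive projection partitioning over text bounding boxes. It is an illustration only, not pdf_oxide's implementation: the `Box` type, the `_split` and `xy_cut` helpers, the `min_gap` threshold, and the top-left coordinate origin (y grows downward) are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box of a text fragment (hypothetical type)."""
    x0: float
    y0: float
    x1: float
    y1: float


def _split(boxes, lo, hi, min_gap):
    """Split boxes at the widest empty band of their projection on one axis.

    Returns (before, after), or None if no gap of at least min_gap exists.
    """
    ordered = sorted(boxes, key=lo)
    best_gap, best_idx = 0.0, None
    reach = hi(ordered[0])  # furthest extent covered so far
    for i in range(1, len(ordered)):
        gap = lo(ordered[i]) - reach
        if gap > best_gap:
            best_gap, best_idx = gap, i
        reach = max(reach, hi(ordered[i]))
    if best_idx is None or best_gap < min_gap:
        return None
    return ordered[:best_idx], ordered[best_idx:]


def xy_cut(boxes, min_gap=5.0):
    """Recursively partition boxes into blocks, returned in reading order."""
    if len(boxes) <= 1:
        return [list(boxes)] if boxes else []
    # Try a vertical cut first (a gap along x separates columns) ...
    cut = _split(boxes, lambda b: b.x0, lambda b: b.x1, min_gap)
    if cut is None:
        # ... then a horizontal cut (a gap along y separates rows).
        cut = _split(boxes, lambda b: b.y0, lambda b: b.y1, min_gap)
    if cut is None:
        # No usable gap on either axis: treat the group as one block.
        return [sorted(boxes, key=lambda b: (b.y0, b.x0))]
    first, second = cut
    return xy_cut(first, min_gap) + xy_cut(second, min_gap)
```

On a page with a full-width title above two columns, the title blocks any vertical cut, so the first cut is horizontal (title versus body) and the second is vertical (left column versus right column). Lines inside a column stay together as long as their spacing is below `min_gap`.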
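
The encoding fallback chain described above can be pictured as an ordered list of lookups where the first one that yields text wins. The Python sketch below is a deliberately simplified model, not the library's code: it treats every level as a plain character-code-to-text table (real Differences arrays and the Adobe Glyph List go through glyph names), and `LEVELS`, `make_chain`, and `decode_char` are hypothetical names.

```python
from typing import Callable, Dict, List, Optional, Tuple

Decoder = Callable[[int], Optional[str]]

# Level order as listed in the post; Identity is appended last as the catch-all.
LEVELS = (
    "ToUnicode CMap",
    "Encoding differences",
    "Base encoding",
    "CIDFont CMap",
    "Adobe Glyph List",
)


def make_chain(tables: Dict[str, Dict[int, str]]) -> List[Tuple[str, Decoder]]:
    """Build the ordered fallback chain from per-level lookup tables.

    Levels missing from `tables` behave as empty tables (assumption).
    """
    chain: List[Tuple[str, Decoder]] = [
        (name, tables.get(name, {}).get) for name in LEVELS
    ]
    chain.append(("Identity", chr))  # map the code straight to a code point
    return chain


def decode_char(code: int, chain: List[Tuple[str, Decoder]]) -> Tuple[str, str]:
    """Return (text, level name) from the first level that knows the code."""
    for name, decoder in chain:
        text = decoder(code)
        if text is not None:
            return text, name
    return "\ufffd", "none"  # replacement character if every level fails
```

Returning the level name alongside the text is a design choice for the sketch: it makes it easy to log which fallback actually fired when a document decodes badly.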

Benchmarks (Mean / p99 / Pass Rate) on 3,830 PDFs:

  pdf_oxide: 0.8ms / 9ms / 100% (MIT)
  PyMuPDF: 4.6ms / 28ms / 99.3% (AGPL-3.0)
  pypdfium2: 4.1ms / 42ms / 99.2% (Apache/BSD)
  pypdf: 12.1ms / 97ms / 98.4% (BSD)

Performance was profiling-driven. A bulk page tree cache turned a 10,000-page PDF from 55s to 332ms (O(n^2) to O(1)).
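
A rough Python sketch of why a bulk page cache helps, under the assumption that the slow path re-walked the page tree from the root on every lookup (the post does not say exactly what the original code did). `Node`, `walk`, `page_uncached`, and `PageCache` are hypothetical names for illustration, not pdf_oxide APIs.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional


@dataclass
class Node:
    """Page tree node: `kids` is None for a leaf page (simplified model)."""
    label: str = ""
    kids: Optional[List["Node"]] = None


def walk(node: Node) -> Iterator[Node]:
    """Yield leaf pages in document order (depth-first traversal)."""
    if node.kids is None:
        yield node
    else:
        for kid in node.kids:
            yield from walk(kid)


def page_uncached(root: Node, index: int) -> Node:
    """Walk from the root on every call: O(n) per lookup in the worst case,
    so visiting all n pages this way costs O(n^2) node visits."""
    for i, page in enumerate(walk(root)):
        if i == index:
            return page
    raise IndexError(index)


class PageCache:
    """Flatten the tree once (O(n)); each later lookup is an O(1) list index."""

    def __init__(self, root: Node) -> None:
        self._pages = list(walk(root))

    def __len__(self) -> int:
        return len(self._pages)

    def page(self, index: int) -> Node:
        return self._pages[index]
```

Both paths return the same page object for a given index; the difference is only in how much of the tree is traversed per lookup.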

Quick Start (Python):

    from pdf_oxide import PdfDocument

    doc = PdfDocument("document.pdf")
    for i in range(doc.page_count()):
        print(doc.extract_text(i))

Quick Start (Rust):

    use pdf_oxide::PdfDocument;

    let mut doc = PdfDocument::open("document.pdf")?;
    for i in 0..doc.page_count()? {
        println!("{}", doc.extract_text(i)?);
    }

Capabilities: Text/Markdown/Image extraction, PDF creation from Markdown/HTML, form filling, and OCR (PaddleOCR via ONNX Runtime).

GitHub: https://github.com/yfedoseev/pdf_oxide
Docs: https://oxide.fyi

I would love to hear what you think, especially if you throw it at PDFs that other libraries struggle with. The best way to improve is finding edge cases in the wild.

80386 Protection

https://nand2mario.github.io/posts/2026/80386_protection/
1•nand2mario•23s ago•0 comments

Release v2.0.0 · Charmbracelet/Bubbletea

https://github.com/charmbracelet/bubbletea/releases/tag/v2.0.0
1•pekim•37s ago•1 comments

Tomorrow's Social Networks

https://tanikella.ghost.io/untitled-2/
1•thenonjay•1m ago•0 comments

AI Data Centers Turn to High-Temperature Superconductors

https://spectrum.ieee.org/ai-data-centers-hts-superconductors
1•Brajeshwar•2m ago•0 comments

The US Had a Big Battery Boom Last Year

https://www.wired.com/story/the-us-had-a-big-battery-boom-last-year/
1•Brajeshwar•2m ago•0 comments

A risky maneuver could send a spacecraft to interstellar comet 3I/ATLAS

https://www.space.com/astronomy/comets/a-risky-maneuver-could-send-a-spacecraft-to-interstellar-c...
1•Brajeshwar•3m ago•0 comments

Show HN: Forecasts as GIFs (Free Lifetime Access)

https://apps.apple.com/us/app/brzzy-weather-radar-alerts/id6670187343
2•Brzzy•3m ago•0 comments

Show HN: Noodles – Turn any codebase into a diagram with Claude and Tree-sitter

https://github.com/unslop-xyz/noodles
3•unslop•3m ago•0 comments

Show HN: I just released v7 Javalin, a JVM web framework

https://javalin.io/news/javalin-7.0.0-stable.html
1•tipsee•4m ago•1 comments

Real-time settlement reshapes everyday financial support across borders

1•agentcoder•5m ago•0 comments

America's spymasters terrified Tim Cook with Taiwan invasion timeline

https://appleinsider.com/articles/26/02/24/americas-spymasters-terrified-tim-cook-with-taiwan-inv...
1•everybodyknows•6m ago•0 comments

Show HN: Neuron – Independent Rust crates for building AI agents

https://secbear.github.io/neuron/
1•secbear•6m ago•0 comments

Stop Parallelizing Your Agents

https://thedailydeveloper.substack.com/p/stop-parallelizing-your-ai-agents
1•endlessvoid94•7m ago•0 comments

Waymo Opens 4 New Cities to Public Riders (Now at 10 Total)

https://techcrunch.com/2026/02/24/waymo-robotaxis-are-now-operating-in-10-us-cities/
4•NullHypothesist•8m ago•1 comments

Teens Use and View AI

https://www.pewresearch.org/internet/2026/02/24/how-teens-use-and-view-ai/
1•swolpers•9m ago•0 comments

Lamborghini cancels electric Lanzador as supercar buyers reject EVs

https://arstechnica.com/cars/2026/02/lamborghini-drops-ev-plan-in-favor-of-future-plug-in-hybrids/
1•voxadam•10m ago•0 comments

Show HN: Tacit – The missing Layer 3 of the AI agent stack (open source)

https://github.com/tacitprotocol/tacit
1•ms170888•10m ago•0 comments

Dental group offers to fix Olympic Jack Hughes' smile for free

https://fox56.com/news/local/nepa-dental-group-offers-to-fix-jack-hughes-smile-after-toothless-gr...
1•DivingForGold•11m ago•1 comments

AI's Math Tricks Don't Work for Scientific Computing

https://spectrum.ieee.org/number-formats-ai-scientific-computing
1•rjmunro•12m ago•0 comments

Show HN: TTSLab – Text-to-speech that runs in the browser via WebGPU

https://ttslab.dev
1•MbBrainz•12m ago•0 comments

AIProx: An open registry and manifest standard for autonomous agent discovery

1•LightProx•12m ago•0 comments

Anthropic Links AI Agent with Tools for Investment Banking, HR

https://www.bloomberg.com/news/articles/2026-02-24/anthropic-links-ai-agent-with-tools-for-invest...
1•swolpers•15m ago•0 comments

OpenAI safety reps called to Ottawa after Tumbler Ridge, B.C., mass shooting

https://www.cbc.ca/news/politics/open-ai-summoned-ottawa-tumbler-ridge-9.7103281
3•ChrisArchitect•17m ago•1 comments

Show HN: A minimal coding agent in Elixir (Erlang/OTP)

https://github.com/matteing/opal
1•sergiomattei•17m ago•0 comments

Near-Instantly Aborting the Worst Pain Imaginable with Psychedelics

https://psychotechnology.substack.com/p/near-instantly-aborting-the-worst
1•surprisetalk•18m ago•0 comments

Change your default date format to the least ambiguous

https://practicalbetterments.com/change-your-default-date-format-to-the-least-ambiguous/
1•surprisetalk•18m ago•1 comments

Georgist land taxes balance community benefit and the efficiency of markets (2024)

https://devon.postach.io/post/georgist-land-taxes-balance-community-benefit-the-efficiency-of-mar...
2•surprisetalk•18m ago•0 comments

Pecking Order and Flight Leadership (2019)

https://srconstantin.wordpress.com/2019/04/29/pecking-order-and-flight-leadership/
1•surprisetalk•18m ago•0 comments

Apple's Multibillion-Dollar Push to Make Chips in the U.S. [video]

https://www.youtube.com/watch?v=ktFlaBhpMu8
2•tambourine_man•19m ago•0 comments

A catecholamine-independent pathway controlling adaptive adipocyte lipolysis

https://www.nature.com/articles/s42255-025-01424-5
1•PaulHoule•20m ago•0 comments