Zpdf: PDF text extraction in Zig

217•lulzx•1mo ago

Comments

lulzx•1mo ago

I built a PDF text extraction library in Zig that's significantly faster than MuPDF for text extraction workloads.

~41K pages/sec peak throughput.

Key choices: memory-mapped I/O, SIMD string search, parallel page extraction, streaming output. Handles CID fonts, incremental updates, all common compression filters.

~5,000 lines, no dependencies, compiles in <2s.

Why it's fast:

  - Memory-mapped file I/O (no read syscalls)
  - Zero-copy parsing where possible
  - SIMD-accelerated string search for finding PDF structures
  - Parallel extraction across pages using Zig's thread pool
  - Streaming output (no intermediate allocations for extracted text)
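To make the first and third bullets concrete, here is a minimal sketch in Python rather than Zig. It is not zpdf's code: the function name `find_startxref` and the stand-in file are assumptions for illustration. It maps a file read-only and searches backwards for the `startxref` keyword that points at the cross-reference data, without issuing explicit read calls.

```python
import mmap
import os
import tempfile


def find_startxref(path):
    """Return the byte offset written after the last 'startxref' keyword."""
    with open(path, "rb") as f:
        # Map the whole file read-only; pages are faulted in on first access.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
            keyword = b"startxref"
            pos = view.rfind(keyword)
            if pos == -1:
                raise ValueError("no startxref keyword found")
            # The offset is the decimal number that follows the keyword.
            tail = view[pos + len(keyword):pos + len(keyword) + 32]
            return int(tail.split()[0].decode("ascii"))


# Usage with a tiny stand-in file (not a valid PDF, just the trailer keywords).
with tempfile.NamedTemporaryFile(suffix=".pdf", delete=False) as tmp:
    tmp.write(b"%PDF-1.7\n% body omitted\nstartxref\n1234\n%%EOF\n")
offset = find_startxref(tmp.name)
os.remove(tmp.name)
print(offset)
```

The search here is a plain `rfind`; whether a SIMD search beats it in practice depends on file size and on how often the search runs.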

What it handles:

  - XRef tables and streams (PDF 1.5+)
  - Incremental PDF updates (/Prev chain)
  - FlateDecode, ASCII85, LZW, RunLength decompression
  - Font encodings: WinAnsi, MacRoman, ToUnicode CMap
  - CID fonts (Type0, Identity-H/V, UTF-16BE with surrogate pairs)
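As a rough illustration of two of the filters listed above, the sketch below decodes FlateDecode data with Python's `zlib` and implements RunLengthDecode by hand. The function names are assumptions and this is not taken from zpdf; the run-length rules follow the PDF specification (a length byte of 0 to 127 copies the next n + 1 bytes, 129 to 255 repeats the next byte 257 - n times, 128 ends the data).

```python
import zlib


def flate_decode(data: bytes) -> bytes:
    """FlateDecode streams are zlib-wrapped deflate data."""
    return zlib.decompress(data)


def run_length_decode(data: bytes) -> bytes:
    out = bytearray()
    i = 0
    while i < len(data):
        n = data[i]
        i += 1
        if n == 128:  # end-of-data marker
            break
        if n < 128:  # copy the next n + 1 bytes literally
            out += data[i:i + n + 1]
            i += n + 1
        else:  # repeat the next byte 257 - n times
            out += data[i:i + 1] * (257 - n)
            i += 1
    return bytes(out)


content = b"BT (Hello) Tj ET"
flate_roundtrip = flate_decode(zlib.compress(content))
rle_decoded = run_length_decode(bytes([2]) + b"abc" + bytes([254]) + b"x" + bytes([128]))
```

A real decoder would also need to handle predictors and truncated streams, which this sketch ignores.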

tveita•1mo ago

What kind of performance are you seeing with/without SIMD enabled?

From https://github.com/Lulzx/zpdf/blob/main/src/main.zig it looks like the help text cites an unimplemented "-j" option to enable multiple threads.

There is a "--parallel" option, but that is only implemented for the "bench" command.

lulzx•1mo ago

I have now made parallel by default and added an option to enable multiple threads.

I haven't tested without SIMD.

cheshire_cat•1mo ago

You've released quite a few projects lately, very impressive.

Are you using LLMs for parts of the coding?

What's your work flow when approaching a new project like this?

littlestymaar•1mo ago

> Are you using LLMs for parts of the coding?

I can't talk about the code, but the readme and commit messages are most likely LLM-generated.

And when you take into account that the first commit happened just three hours ago, it feels like the entire project has been vibe coded.

Neywiny•1mo ago

Hard disagree. Initial commit was 6k LOC. Author could've spent years before committing. Ill advised but not impossible.

littlestymaar•1mo ago

Why would you make Claude write your commit message for a commit you've spent years working on though?

Neywiny•1mo ago

1. Be not good at or a fan of git when coding

2. Be not good at or a fan of git when committing

Not sure what the disconnect is.

Now if it were vibecoded, I wouldn't be surprised. But benefit of the doubt

Jach•1mo ago

We're well beyond benefit of the doubt these days. If it looks like a duck... For me there wasn't any doubt, the author's first top comment here was evidence enough, then seeing the readme + random code + random commit message, it's all obvious LLM-speak to me.

I don't particularly care, though, and I'm more positive about LLMs than negative even if I don't (yet?) use them very much. I think it's hilarious that a few people asked for Python bindings and then bam, done, and one person is like "..wha?" Yes, LLMs can do that sort of grunt work now! How cool, if kind of pointless. Couldn't the cycles have just been spent on trying to make muPDF better? Though I see they're in C and AGPL, I suppose either is motivation enough to do a rewrite instead. (This is MIT Licensed though it's still unclear to me how 100% or even large-% vibe-coded code deserves any copyright protection, I think all such should generally be under the Unlicense/public domain.)

If the intent of "benefit of the doubt" is to reduce people having a freak out over anyone who dares use these tools, I get that.

lulzx•1mo ago

I have updated the licence to WTFPL.

I'll try my best to make it a really good one!

littlestymaar•1mo ago

> I have updated the licence to WTFPL.

You still have no basis in claiming copyright protection hence you cannot set a license on that code.

Instead of the WTFPL you should just write a disclaimer that due to being machine generated and devoid of creative work, the work is not protected by copyright and free to be used without any license.

lulzx•1mo ago

Hasn't the world moved on from these things already?

grayhatter•1mo ago

Has the world moved on from copyright? Or expecting other people to behave ethically and fairly?

No, and god I hope not.

But it's a real dick move to set up your CI the way you have. Zig explicitly requests using one of the many mirrors for CI instead of hammering the main ziglang.org site itself. Perhaps you've moved on from trying to be ethical?

lulzx•1mo ago

That's good to know, I wasn't aware of it, I have updated to using a github action they recommend (https://github.com/marketplace/actions/setup-zig-compiler)

For the copyright thing, I understand that there's legit ongoing debate around all this AI-assisted coding and copyrightability.

In this case of zpdf, while Claude Code did a lot of the heavy lifting on implementation, there was a real effort in architecture decisions, iterative prompting/refinement, debugging, testing, benchmarking.

My intent is zero restrictions: use it, fork it, sell it, whatever. WTFPL captures that spirit perfectly for me. It's as permissive as legally possible while being upfront about not caring.

The goal is just to make a useful tool freely available.

Edit: I have changed it to CC0.

lulzx•1mo ago

Claude Code.

jeffbee•1mo ago

What's fast about mmap?

rishabhaiover•1mo ago

it allows the program to reference memory without having to manage it in the heap space. it would make the program faster in a memory managed language, otherwise it would reduce the memory footprint consumed by the program.

jeffbee•1mo ago

You mean it converts an expression like `buf[i]` into a baroque sequence of CPU exception paths, potentially involving a trap back into the kernel.

rishabhaiover•1mo ago

I don't fully understand the under-the-hood mechanics of mmap, but I can sense that you're trying to convey that mmap shouldn't be used as a blanket optimization technique, as there are tradeoffs in terms of page fault overheads (being at the mercy of OS page cache mechanics).

jibal•1mo ago

I think he's conveying that he doesn't know what he's talking about. buf[i] generates the same code regardless of whether mmap is being used. The first access to a page will cause a trap that loads the page into memory, but this is also true if the memory is read into.

StilesCrisis•1mo ago

Tradeoffs such as "if an I/O error occurs, the program immediately segfaults." Also, I doubt you're I/O bound to the point where mmap is noticeably better than read, but I guess it's fine for an experiment.

jibal•1mo ago

An I/O error on a mmapped file causes a SIGBUS, which the program can catch and report.

And I/O bound programs are I/O bound whereas programs that aren't, aren't, so it really isn't meaningful to talk about whether "you" are I/O bound to the point that it's significant--maybe you are, maybe you aren't. I agree about experimentation.

kennethallen•1mo ago

Two big advantages:

You avoid an unnecessary copy. Normal read system call gets the data from disk hardware into the kernel page cache and then copies it into the buffer you provide in your process memory. With mmap, the page cache is mapped directly into your process memory, no copy.

All running processes share the mapped copy of the file.

There are a lot of downsides to mmap: you lose explicit error handling and fine-grained control of when exactly I/O happens. Consult the classic article on why sophisticated systems like DBMSs do not use mmap: https://db.cs.cmu.edu/mmap-cidr2022/

saidinesh5•1mo ago

This is a very interesting link. I didn't expect mmap to be less performant than read() calls.

I now wonder which use cases would mmap suit better - if any...

> All running processes share the mapped copy of the file.

So something like building linkers that deal with read only shared libraries "plugins" etc ..?

squirrellous•1mo ago

One reason to use shared memory mmap is to ensure that even if your process crashes, the memory stays intact. Another is to communicate between different processes.

kennethallen•1mo ago

mmap is better when:

  * You want your program to crash on any I/O error because you wouldn't handle them anyway
  * You value the programming convenience of being able to treat a file on disk as if the entire thing exists in memory
  * The performance is good enough for your use. As the article showed, sequential scan performance is as good as direct I/O until the page cache fills up *from a single SSD*, and random access performance is as good as direct I/O until the page cache fills up *if you use MADV_RANDOM*. If your data doesn't fit in memory, or is across multiple storage devices, or you don't correctly advise the OS about your access patterns, mmap will probably be much slower

To be clear, normal I/O still benefits from the OS's shared page cache, where files that other processes have loaded will probably still be in memory, avoiding waiting on the storage device. But each normal I/O process incurs the space and time cost of a copy into its private memory, unlike mmap.
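The "advise the OS about your access patterns" point can be sketched as follows. This is an illustration under assumptions: the helper name `open_for_random_access` is made up, and `madvise` with `MADV_RANDOM` is only available in Python 3.8+ on platforms that expose it, so the call is guarded.

```python
import mmap
import os
import tempfile


def open_for_random_access(path):
    """Map a file read-only and hint that accesses will be random."""
    with open(path, "rb") as f:
        view = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    # The hint only exists on some platforms, so check before calling.
    if hasattr(view, "madvise") and hasattr(mmap, "MADV_RANDOM"):
        view.madvise(mmap.MADV_RANDOM)
    return view


# Usage: write 8192 bytes, then read four of them from the second page.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(bytes(range(256)) * 32)
view = open_for_random_access(tmp.name)
sample = view[4096:4100]
view.close()
os.remove(tmp.name)
```

The hint changes kernel read-ahead behaviour only; it does not change what the program reads, so the sketch says nothing about the performance claims in the linked article.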

commandersaki•1mo ago

> you lose explicit error handling

I've never had to use mmap but this has always been the issue in my head. If you're treating I/O as memory pages, what happens when you read a page and it needs to "fault" by reading the backing storage but the storage fails to deliver? What can be said at that point, or does the program crash?

kennethallen•1mo ago

If you fail to load an mmapped page because of an I/O error, Unix-like OSes interrupt your program with SIGBUS/SIGSEGV. It might be technically possible to write a program that would handle those signals and recover, but it seems like a lot more work and complexity than just checking errno after a read system call.
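For contrast, the "just check errno after a read" path might look like the sketch below. The helper `read_file` and the demo path are assumptions for illustration. Python surfaces the failing system call as an `OSError` carrying the errno value, so every failure is tied to a specific call.

```python
import errno
import os


def read_file(path):
    """Return (data, None) on success or (None, errno_code) on failure."""
    try:
        fd = os.open(path, os.O_RDONLY)
    except OSError as e:
        return None, e.errno
    try:
        chunks = []
        while True:
            chunk = os.read(fd, 1 << 16)
            if not chunk:  # an empty read means end of file
                break
            chunks.append(chunk)
        return b"".join(chunks), None
    except OSError as e:
        # A failing device would typically show up here, e.g. as errno.EIO.
        return None, e.errno
    finally:
        os.close(fd)


data, err = read_file(os.path.join(os.sep, "nonexistent", "zpdf-demo.pdf"))
```

With a mapped file there is no equivalent return value to inspect, which is the asymmetry the comment describes.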

nextaccountic•1mo ago

> Consult the classic article on why sophisticated systems like DBMSs do not use mmap: https://db.cs.cmu.edu/mmap-cidr2022/

Sqlite does (or can optionally use mmap). How come?

Is sqlite with mmap less reliable or anything?

jeffbee•1mo ago

I know that the spirit of HN will strike me down for this, but sqlite is not a "sophisticated system". It assumes the hardware is lawful neutral. Real hardware is chaotic. Sqlite has a good reputation because it is very easy to use. In fact this is the same reason programmers like mmap: it is a hell of a shortcut.

nextaccountic•1mo ago

I think the main thing is whether mmap will make sqlite lose data or otherwise corrupt already committed data

... it will if two programs open the same sqlite, one with mmap, and another without https://www.sqlite.org/mmap.html - at least "in some operating systems" (no mention of which ones)

https://www.sqlite.org/mmap.html

> The operating system must have a unified buffer cache in order for the memory-mapped I/O extension to work correctly, especially in situations where two processes are accessing the same database file and one process is using memory-mapped I/O while the other is not. Not all operating systems have a unified buffer cache. In some operating systems that claim to have a unified buffer cache, the implementation is buggy and can lead to corrupt databases.

Sqlite is otherwise rock solid and won't lose data as easily

SQLite•1mo ago

If an I/O error happens with read()/write(), you get back an error code, which SQLite can deal with and pass back up to the application, perhaps accompanied by a reasonable error message. But if you get an I/O error with mmap, you get a signal. SQLite itself ought not be setting signal handlers, as that is the domain of the application and SQLite is just a lowly library. And even if SQLite could set signal handlers, it would be difficult to associate a signal with a particular I/O operation. So there isn't a good way to deal with I/O errors when using mmap(). With mmap(), you just have to assume that the filesystem/mass-storage works flawlessly and never runs out of space.

SQLite can use mmap(). That is a tested and supported capability. But we don't advocate it because of the inability to precisely identify I/O errors and report them back up into the application.

nextaccountic•1mo ago

Thanks for the response. I am more worried about losing already committed data due to an error

https://www.sqlite.org/mmap.html

What are those OSes with buggy unified buffer caches? More importantly, is there a list of platforms where the use of mmap in sqlite can lead to data loss?

jonstewart•1mo ago

What’s the fidelity like compared to tika?

lulzx•1mo ago

The accuracy difference is marginal (1-2%) but the speed difference is massive.

DannyBee•1mo ago

FWIW - mupdf is simply not fast. I've done lots of pdf indexing apps, and mupdf is by far the slowest and least able to open valid pdfs when it came to text extraction. It also takes tons of memory.

a better speed comparison would either be multi-process pdfium (since pdfium was forked from foxit before multi-thread support, you can't thread it), multi-threaded foxit, or something like syncfusion (which is quite fast and supports multiple threads). Or even single thread pdfium vs single thread your-code.

These were always the fastest/best options. I can (and do) achieve 41k pages/sec or better on these options.

The other thing you don't appear to mention is whether you handle putting the words in reading order (i.e. how they appear on the page), or only stream order (which varies in its relation to appearance order).

If it's only stream order, sure, that's really fast to do. But also not anywhere near as helpful as reading order, which is what other text-extraction engines do.

Looking at the code, it looks like the code to do reading order exists, but is not what is being benchmarked or used by default?

If so, this is really comparing apples and oranges.
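To show why reading order costs more than stream order, here is a deliberately simple sketch. The `Span` type, the tolerance value and the sample coordinates are assumptions, and it only handles single-column pages. Stream order would emit spans as they appear in the content stream; reading order has to collect their positions, group them into lines and sort.

```python
from dataclasses import dataclass


@dataclass
class Span:
    x: float  # left edge in page units
    y: float  # baseline; PDF y grows upward, so larger y is higher on the page
    text: str


def reading_order(spans, line_tolerance=2.0):
    """Group spans into lines by baseline, top to bottom, then sort left to right."""
    lines = []
    for span in sorted(spans, key=lambda s: -s.y):
        if lines and abs(lines[-1][0].y - span.y) <= line_tolerance:
            lines[-1].append(span)
        else:
            lines.append([span])
    return [" ".join(s.text for s in sorted(line, key=lambda s: s.x)) for line in lines]


# Spans listed in an arbitrary stream order.
stream = [Span(200, 700, "world"), Span(72, 680, "second"), Span(72, 700, "Hello")]
ordered = reading_order(stream)
```

Multi-column layouts, rotated text and tables need considerably more than this, which is part of why the two modes are not comparable in a benchmark.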

littlestymaar•1mo ago

> I built

You didn't. Claude did. Like it did write this comment.

And you didn't even bother testing it before submitting, which is insulting to everyone.

lulzx•1mo ago

tools are tools.

agentifysh•1mo ago

excellent stuff what makes zig so fast

observationist•1mo ago

Not being slow - they compile straight to bytecode, they aren't interpreted, and have aggressive, opinionated optimizations baked in by default, so it's even faster than compiled c (under default conditions.)

Contrasted with python, which is interpreted, has a clunky runtime, minimal optimizations, and all sorts of choices that result in slow, redundant, and also slow, performance.

The price for performance is safety checks, redundancy, how badly wrong things can go, and so on.

A good compromise is luajit - you get some of the same aggressive optimizations, but in an interpreted language, with better-than-c performance but interpreted language convenience, access to low level things that can explode just as spectacularly as with zig or c, but also a beautiful language.

agentifysh•1mo ago

will add this to the list, now learning new languages is less of a barrier with LLMs

Zambyte•1mo ago

Zig is safer than C under default conditions, not faster. By default it does a lot of illegal behavior safety checking, such as array and slice bounds checking, numeric overflow checking, and invalid union access checking. These features are disabled by certain (non default) build modes, or explicitly disabled at a per scope level.

It may be easier to write code that runs faster in Zig than in C under similar build optimization levels, because writing high performance C code looks a lot like writing idiomatic Zig code. The Zig standard library offers a lot of structures like hash maps, SIMD primitives, and allocators with different performance characteristics to better fit a given use-case. C application code often skips on these things simply because it is a lot more friction to do in C than in Zig.

jibal•1mo ago

> they compile straight to bytecode

machine code, not https://en.wikipedia.org/wiki/Bytecode

> The price for performance is safety checks

In Zig, non-ReleaseFast build modes have significant safety checks.

> luajit ... with better-than-c performance

No.

AndyKelley•1mo ago

It makes your development workflow smooth enough that you have the time and energy to do stuff like all the bullet points listed in https://news.ycombinator.com/item?id=46437289

forgotpwd16•1mo ago

>you have the time and energy to do stuff like all the bullet points listed

Don't disagree but in specific case, per the author, project was made via Claude Code. Although could as well be that Zig is better as LLM target. Noticed many new vibe projects decide to use Zig as target.

mpeg•1mo ago

very nice, it'd be good to see a feature comparison as when I use mupdf it's not really just about speed, but about the level of support of all kinds of obscure pdf features, and good level of accuracy of the built-in algorithms for things like handling two-column pages, identifying paragraphs, etc.

the licensing is a huge blocker for using mupdf in non-OSS tools, so it's very nice to see this is MIT

python bindings would be good too

lulzx•1mo ago

added a comparison, will improve further. https://github.com/Lulzx/zpdf?tab=readme-ov-file#comparison-...

also, added python bindings.

mpeg•1mo ago

thanks, claude, I guess haha

as others have commented, I think while this is a nice portfolio piece, I would worry about its longevity as a vibe coded project

chanbam•1mo ago

If he made something legitimately useful, who cares how?

littlestymaar•1mo ago

It seems that he didn't even test it before submitting though…

The author has created 30 new projects on GitHub, in half a dozen different programming languages, over the past month alone, and he also happens to have an LLM-generated blog. I think it's fair to say it's not “legitimately useful” except as a way for the author to fill his resume as he's looking for a job.

This kind of behavior is toxic.

mpeg•1mo ago

Exactly this, I like to give the benefit of the doubt to people but pushing huge chunks of code this quickly shows the whole thing is vibe coded

I actually don’t mind LLM generated code when it’s been manually reviewed, but this and a quick look through other submissions makes me realise the author is simply trying to pad their resume with OSS projects. Respect the hustle, but it shows a lack of respect for others’ time to then submit it to Show HN

lulzx•1mo ago

Fair point. I won't submit here again until I've put in the work to make something that respects people's time to evaluate it. Lesson learned. :)

odie5533•1mo ago

Now we just need Python bindings so I can use it in my trash language of choice.

lulzx•1mo ago

added python bindings!

hiq•1mo ago

Were you working on it already, or did it take you less than 17 minutes to commit https://github.com/Lulzx/zpdf/commit/9f5a7b70eb4b53672c0e4d8... ?

qeternity•1mo ago

Claude Code.

littlestymaar•1mo ago

+ not testing the output.

littlestymaar•1mo ago

- First commit 3 hours ago.

- commit message: LLM-generated.

- README: LLM-generated.

I'm not convinced that projects vibe coded over the evening deserve the HN front page…

Edit: and of course the author's blog is also full of AI slop…

2026 hasn't even started and I already hate it.

kingkongjaffa•1mo ago

Wait, but why?

If it's really better than what we had before, what does it matter how it was made? It's literally hacked together with the tools of the day (LLMs) isn't that the very hacker ethos? Patching stuff together that works in a new and useful way.

5x speed improvements on pdf text extraction might be great for some applications I'm not aware of, I wouldn't just dismiss it out of hand because the author used $robot to write the code.

Presumably the thought to make the thing in the first place and decide what features to add and not add was more important than how the code is generated?

utopiah•1mo ago

> If it's really better than what we had before

That's a very big if. The whole point is that what we had before was made slowly. This was made quickly. In itself it's not better but what it typically means is hours and hours of testing. Going through painful problems that highlight idiosyncrasies of the problem space. Things that are really weird and specific to whatever the tool is trying to address.

In such cases we can expect that, with very little time, very few things were tested, let alone tested properly (a comment even mentioned that the tests were also generated). "We", the audience of potentially interested users, then have to do that work (as plenty did in commenting on that post).

IMHO what you bring forward is precisely that :

- can the new "solution" actually pass ALL the tests the previous one did? More?

This should be brought to the top so the actual compromises can be understood; "we" can then decide if it's "better" for our context. In some cases faster with lossy output is actually better, in others absolutely not. The difference between the new and the old solutions isn't binary, and having no visibility into it is what makes such a process nothing more than yet another showcase that LLMs can indeed produce "something" that is absolutely boring while consuming a TON of resources, including our own attention.

TL;DR: there should be a test "harness" made by 3rd parties (or taken from the well-known software it is closest to) that an LLM-generated piece of code should pass before actually being compared.

dmytrish•1mo ago

...and it does not work. I tried it on ~10 random pdfs, including very simple ones (e.g. a hello world from typst), it segfaults on every single one.

forgotpwd16•1mo ago

Tried few and works. Maybe you've older or newer Zig version than whatever project targets. (Mine is 0.15.2.)

dmytrish•1mo ago

   ~/c/t/s/zpdf (main)> zig version
   0.15.2

Sky is blue, water is wet, slop does not work.

ncgl•1mo ago

Using AI isn't lazier than your regurgitated dismissal, to be fair.

littlestymaar•1mo ago

Using AI is not necessarily lazy.

Using AI lazily is a problem though. Writing code has never been the most important part of software development, making sure that the code does what the user needs is what takes most of the time. But from the GitHub issues and the comments here from the few who have tested the tool, it looks like the author didn't even test the AI output on real PDFs.

If you use AI to build in 3 months something that would have taken a year without it, then cool. But here we're talking about someone who's spending 2-3 hours every other day building a new fake software project to pad his resume. This isn't something anyone should endorse.

forgotpwd16•1mo ago

  74910,74912c187768,187779
  < [Example 1: If you want to use the code conversion facetcodecvt_utf8to output tocouta UTF-8 multibyte sequence
  < corresponding to a wide string, but you don't want to alter the locale forcout, you can write something like:\237 D.27.21954
                                                                                                                                \251ISO/IECN4950wstring_convert<std::codecvt_utf8<wchar_t>> myconv;
  < std::string mbstring = myconv.to_bytes\050L"Hello\134n"\051;
  ---
  >
  > [Example 1: If you want to use the code conversion facet codecvt_utf8 to output to cout a UTF-8 multibyte sequence
  > corresponding to a wide string, but you don’t want to alter the locale for cout, you can write something like:
  >
  > § D.27.2
  > 1954
  >
  > © ISO/IEC
  > N4950
  >
  > wstring_convert<std::codecvt_utf8<wchar_t>> myconv;
  > std::string mbstring = myconv.to_bytes(L"Hello\n");

Is indeed faster but output is messier. And doesn't handle Unicode in contrast to mutool that does. (Probably also explains the big speed boost.)

lulzx•1mo ago

fixed.

TZubiri•1mo ago

Lol, but there's 100 competitors in the PDF text extraction space, some are multi million dollar industries: AWS textract, ABBY PDFreader, PDFBox, I think you may be underestimating the challenge here.

forgotpwd16•1mo ago

Yeah, sorry for the confusion. When I said Unicode, I meant foreign text rather than (just) the unescaped symbols, e.g. Greek. For one random Greek textbook[0], zpdf output is (extract | head -15):

  01F9020101FC020401F9020301FB02070205020800030209020701FF01F90203020901F9012D020A0201020101FF01FB01FE0208 
  0200012E0219021802160218013202120222 0209021D0212021D012E013202200222000301FA021A0220021C022002160213012E0222000F000301F90206012C

  020301FF02000205020101FC020901F90003020001F9020701F9020E020802000205020A 
  01FC028C0213021B022002230221021800030200012E021902180216021201320221021A012E00030209021D0212021D012E013202200222000301FA021A0220021C022002160213012E0222000F000301F90206012C 
 
  0200020D02030208020901F90203020901FF0203020502080003012B020001F9012B020001F901FA0205020A01FD01FE0208 
  020201300132012E012F021A012F0210021B013202200221012E0222 0209021D0212021D012E013202200222000301FA021A0220021C022002160213012E0222000F000301F90206012C

This for entire book. Mutool extracts the text just fine.

[0]: https://repository.kallipos.gr/handle/11419/15087

lulzx•1mo ago

sorry, I haven't yet figured out non-latin with tounicode references.
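For context on what "ToUnicode references" involve: hex output like the dump above is what you get when two-byte character codes are printed without being looked up in the font's ToUnicode CMap. The sketch below parses `bfchar` entries and applies them. The CMap fragment, its codes and the function names are invented for illustration and are not the mapping used by that textbook or the code zpdf ended up with.

```python
import re


def parse_bfchar(cmap_text):
    """Build {character code: Unicode string} from beginbfchar/endbfchar blocks."""
    mapping = {}
    for block in re.findall(r"beginbfchar(.*?)endbfchar", cmap_text, re.S):
        for src, dst in re.findall(r"<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>", block):
            # Destination values are UTF-16BE, which may include surrogate pairs.
            mapping[int(src, 16)] = bytes.fromhex(dst).decode("utf-16-be")
    return mapping


def decode_codes(hex_string, mapping, code_bytes=2):
    """Split a hex string into fixed-width codes and map each to text."""
    raw = bytes.fromhex(hex_string)
    out = []
    for i in range(0, len(raw), code_bytes):
        code = int.from_bytes(raw[i:i + code_bytes], "big")
        out.append(mapping.get(code, "\ufffd"))  # U+FFFD marks unmapped codes
    return "".join(out)


# Hypothetical CMap fragment; the third entry uses a surrogate pair (U+1D400).
CMAP = """
3 beginbfchar
<0001> <0391>
<0002> <03B1>
<0003> <D835DC00>
endbfchar
"""
table = parse_bfchar(CMAP)
decoded = decode_codes("000100020003", table)
```

Real CMaps also use `bfrange` blocks and variable-width codes, which this sketch leaves out.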

lulzx•1mo ago

works now!

ΑΛΕΞΑΝΔΡΟΣ ΤΡΙΑΝΤΑΦΥΛΛΙΔΗΣ Καθηγητής Τμήματος Βιολογίας, ΑΠΘ

     ΝΙΚΟΛΕΤΑ ΚΑΡΑΪΣΚΟΥ
     Επίκουρη Καθηγήτρια Τμήματος Βιολογίας, ΑΠΘ

     ΚΩΝΣΤΑΝΤΙΝΟΣ ΓΚΑΓΚΑΒΟΥΖΗΣ
     Μεταδιδάκτορας Τμήματος Βιολογίας, ΑΠΘ





     Γονιδιώματα
     Δομή, Λειτουργία και Εφαρμογές

forgotpwd16•1mo ago

Nice! Speed wasn't even compromised. Still 5x when benching. Also saw now there's page with tool compiled to wasm. Cool.

lulzx•1mo ago

thanks! :)

TZubiri•1mo ago

In my experience with parsing PDFs, speed has never been an issue, it has always been a matter of quality.

DetroitThrow•1mo ago

I tried a small PDF and got a memory error. It's definitely much faster than MuPDF on that file.

littlestymaar•1mo ago

“The fastest PDF extractor is the one that crashes at the beginning of the file” or something.

amkharg26•1mo ago

Impressive performance gains! 5x faster than MuPDF is significant, especially for applications processing large volumes of PDFs. Zig's memory safety without garbage collection overhead makes it ideal for this kind of performance-critical work.

I'm curious about the trade-offs mentioned in the comments regarding Unicode handling. For document analysis pipelines (like extracting text from technical documentation or research papers), robust Unicode support is often critical.

Would be interesting to see benchmarks on different PDF types - academic papers with equations, scanned documents with OCR layers, and complex layouts with tables. Performance can vary wildly depending on the document structure.

polyaniline•1mo ago

What memory safety?

Retr0id•1mo ago

(the comment was written by an llm bot)

nullorempty•1mo ago

Tomorrow's headlines

fpdf

jpdf

cpdf

cpppdf

bfpdf

ppdf

...

opdf

pm2222•1mo ago

What’s the format that’s perhaps free, easy to parse and render? Build one please.

fainpul•1mo ago

These vibe coded tests are terrible:

https://github.com/Lulzx/zpdf/blob/main/python/tests/test_zp...

lulzx•1mo ago

this is more like a quick test for python bindings, the zig files have tests within them for broad range of things.

xvilka•1mo ago

Test it on major PDF corpora[1]

[1] https://github.com/pdf-association/pdf-corpora

manmal•1mo ago

Is there the possibility to hook in OCR for text blocks flattened into an image, maybe with some callback? That’s my biggest gripe with dealing with PDFs.

ceving•1mo ago

The spacing issue isn't working quite right yet.

    zpdf extract texbook.pdf | grep -m1 Stanford
    DONALD E. KNUTHStanford UniversityIllustrations by
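Output like this usually means text runs were concatenated without looking at their positions. One common heuristic, offered here as a sketch rather than as what zpdf does, is to insert a newline when the baseline changes and a space when the horizontal gap exceeds a fraction of the font size. The tuple layout, the thresholds and the coordinates below are all assumptions.

```python
def join_runs(runs, gap_factor=0.25):
    """Join (x_start, x_end, y, font_size, text) runs given in reading order."""
    out = []
    prev_end = None
    prev_y = None
    for x_start, x_end, y, size, text in runs:
        if prev_y is not None:
            if abs(y - prev_y) > 0.5 * size:
                out.append("\n")  # baseline moved: treat as a new line
            elif x_start - prev_end > gap_factor * size:
                out.append(" ")  # wide horizontal gap: treat as a word break
        out.append(text)
        prev_end, prev_y = x_end, y
    return "".join(out)


# Made-up coordinates for three runs on two lines.
runs = [
    (72, 150, 700, 10, "DONALD E. KNUTH"),
    (72, 160, 680, 10, "Stanford University"),
    (165, 220, 680, 10, "Illustrations by"),
]
joined = join_runs(runs)
```

The thresholds are guesses; tightly kerned or justified text would need them tuned per font.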
