
Preventing ZIP parser confusion attacks on Python package installers

https://blog.pypi.org/posts/2025-08-07-wheel-archive-confusion-attacks/
36•miketheman•4h ago

Comments

jspiner•4h ago
Thank you for the interesting article.
captn3m0•2h ago
Now I am curious whether these ZIP confusion attacks are mitigated at other registries that use ZIPs. Are there any?
zahlman•1h ago
> This has been done in response to the discovery that the popular installer uv has a different extraction behavior to many Python-based installers that use the ZIP parser implementation provided by the zipfile standard library module.

> For maintainers of installer projects: Ensure that your ZIP implementation follows the ZIP standard and checks the Central Directory before proceeding with decompression. See the CPython zipfile module for a ZIP implementation that implements this logic. Begin checking the RECORD file against ZIP contents and erroring or warning the user that the wheel is incorrectly formatted.

Good to know that I won't need to work around any issues with `zipfile` — and it would be rather absurd for any Python-based installer to use anything else to do the decompression. (Checking RECORD for consistency is straightforward, although of course it takes time.)
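A minimal sketch of that RECORD consistency check, using only the standard library. It assumes a single top-level `*.dist-info/RECORD` and `sha256` hashes written as URL-safe base64 without padding (as the wheel format specifies); `check_record` is a hypothetical helper, not pip's or uv's actual code.

```python
import base64
import csv
import hashlib
import io
import zipfile


def check_record(wheel):
    """Compare a wheel's .dist-info/RECORD against the archive's actual entries.

    `wheel` may be a path or a binary file object. Returns a list of problem
    strings; an empty list means RECORD and the ZIP contents agree.
    Sketch only: assumes one top-level *.dist-info/RECORD and sha256 hashes.
    """
    problems = []
    with zipfile.ZipFile(wheel) as zf:
        # namelist() comes from the Central Directory; skip directory entries.
        names = [n for n in zf.namelist() if not n.endswith("/")]
        name_set = set(names)
        if len(name_set) != len(names):
            # Duplicate names are ambiguous: different parsers may pick different entries.
            problems.append("archive contains duplicate entry names")

        records = [n for n in names if n.count("/") == 1 and n.endswith(".dist-info/RECORD")]
        if len(records) != 1:
            return problems + [f"expected one .dist-info/RECORD, found {len(records)}"]
        record_name = records[0]

        # Each RECORD row is: path, "algo=urlsafe_b64_digest", size.
        listed = {}
        for row in csv.reader(io.StringIO(zf.read(record_name).decode("utf-8"))):
            if row:
                path, digest, size = (row + ["", ""])[:3]
                listed[path] = (digest, size)

        # Every archive member must be listed (RECORD and its signature files excepted).
        exempt = {record_name, record_name + ".jws", record_name + ".p7s"}
        for name in names:
            if name not in listed and name not in exempt:
                problems.append(f"not listed in RECORD: {name}")

        for path, (digest, size) in listed.items():
            if path not in name_set:
                problems.append(f"listed in RECORD but missing: {path}")
                continue
            if not digest:
                continue  # RECORD lists itself with an empty hash and size
            algo, _, expected = digest.partition("=")
            if algo != "sha256":
                problems.append(f"unsupported hash algorithm {algo!r}: {path}")
                continue
            data = zf.read(path)
            actual = base64.urlsafe_b64encode(hashlib.sha256(data).digest()).rstrip(b"=").decode("ascii")
            if actual != expected:
                problems.append(f"hash mismatch: {path}")
            if size and int(size) != len(data):
                problems.append(f"size mismatch: {path}")
    return problems
```

Hashing every member is where the time goes; an installer could plausibly fold the hashing into extraction rather than making a separate pass.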

... but surely uv got its zip-decompression logic from a crate rather than hand-rolling it? How many other Rust projects out there might have questionable handling of zip files?

> PyPI already implements ZIP and tarball compression-bomb detection as a part of upload processing.

... The implication is that `zipfile` doesn't handle this. But perhaps it can't really? Are there valid uses for zips that work that way? (Or maybe there isn't a clear rule for what counts as a "bomb", and PyPI has to choose a threshold value?)

lexicality•1h ago
> but surely uv got its zip-decompression logic from a crate rather than hand-rolling it?

well... https://github.com/astral-sh/rs-async-zip

zahlman•1h ago
Interesting. (I have neither the familiarity with Rust, nor the willingness to spend time on it, to decide how much of this is the fault of the original vs the fork.)
woodruffw•1h ago
> and it would be rather absurd for any Python-based installer to use anything else to do the decompression.

You'd reasonably think, but it's difficult to assert this: a lot of people use third-party tooling (uv, but also a lot of hand-rolled stuff), and Python packages aren't always processed in a straight-line-from-the-index manner.

(I think a good reference example of this is security scanners: a scanner might fetch a wheel ZIP and analyze it, and use whatever ZIP implementation it pleases.)

It's also worth noting that one of the differentials here concerns the Central Directory, but the other one is more pernicious: the ZIP APPNOTE[1] isn't really clear about how implementations should key from the EOCDR (End of Central Directory Record) back to the local file entries, and implementations have (reasonably, IMO) interpreted the language differently. Python's zipfile chooses to do it in one way that I think is justifiable, but it's a "true" differential in the sense that there's no golden answer.
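To make the Central Directory differential concrete: a ZIP records each member's name twice, once in its local file header and once in the Central Directory, which a reader locates via the EOCDR at the end of the file. The sketch below, assuming a seekable binary file object, follows each Central Directory entry's `header_offset` back to its local header and compares the two filenames; `find_header_mismatches` is a hypothetical helper. CPython's `zipfile` makes a similar filename comparison (raising `BadZipFile`) when a member is opened, so this only performs it eagerly for every entry.

```python
import struct
import zipfile

# Fixed 30-byte portion of a ZIP local file header (see APPNOTE.TXT):
# signature, version, flags, method, mod time, mod date, CRC-32,
# compressed size, uncompressed size, filename length, extra field length.
LOCAL_HEADER = struct.Struct("<4s5H3L2H")
LOCAL_SIGNATURE = b"PK\x03\x04"
UTF8_FLAG = 0x800  # general purpose bit 11: filename is UTF-8


def find_header_mismatches(fileobj):
    """Return (central_directory_name, local_header_name) pairs that disagree.

    zipfile parses the Central Directory (found via the EOCDR); we then follow
    each entry's header_offset to its local file header and compare filenames.
    local_header_name is None when no valid local header is found there.
    """
    mismatches = []
    with zipfile.ZipFile(fileobj) as zf:
        for info in zf.infolist():
            fileobj.seek(info.header_offset)
            raw = fileobj.read(LOCAL_HEADER.size)
            if len(raw) != LOCAL_HEADER.size or raw[:4] != LOCAL_SIGNATURE:
                mismatches.append((info.orig_filename, None))
                continue
            fields = LOCAL_HEADER.unpack(raw)
            flags, name_len = fields[2], fields[9]
            encoding = "utf-8" if flags & UTF8_FLAG else "cp437"
            local_name = fileobj.read(name_len).decode(encoding, errors="replace")
            if local_name != info.orig_filename:
                mismatches.append((info.orig_filename, local_name))
    return mismatches
```

A streaming extractor that walks local headers front to back never consults the Central Directory, which is why an archive failing this check can extract differently under different parsers.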

> (Or maybe there isn't a clear rule for what counts as a "bomb", and PyPI has to choose a threshold value?)

Yes, it's this. There are legitimate uses for high-ratio archives (e.g. compressed OS images), but Python package distributions are (generally) not one of them. PyPI has its own compression-ratio threshold that's intended to be a sweet spot between "that was compressed really well" and "someone is trying to ZIP-bomb the index."
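A sketch of a ratio check of the kind described above. It assumes a whole-archive ratio computed from the sizes declared in the Central Directory; `exceeds_compression_ratio` and the threshold of 50 are illustrative assumptions, not PyPI's actual implementation or value.

```python
import zipfile

# Illustrative threshold only; the thread does not say what PyPI's value is.
MAX_RATIO = 50


def exceeds_compression_ratio(fileobj, max_ratio=MAX_RATIO):
    """Return True if total uncompressed size / total compressed size > max_ratio.

    Sizes are the values declared in the Central Directory, so this is a cheap
    pre-check that runs without decompressing anything.
    """
    with zipfile.ZipFile(fileobj) as zf:
        infos = zf.infolist()
        uncompressed = sum(info.file_size for info in infos)
        compressed = sum(info.compress_size for info in infos)
    if compressed == 0:
        # Nonzero output from zero compressed bytes cannot be a valid archive.
        return uncompressed > 0
    return uncompressed / compressed > max_ratio
```

Because declared sizes are just metadata, a cautious consumer would still cap the number of bytes it actually decompresses rather than relying on this pre-check alone.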

[1]: https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT

zahlman•27m ago
> You'd reasonably think, but it's difficult to assert this: a lot of people use third-party tooling (uv, but also a lot of hand-rolled stuff),

I mean, for people (like myself) explicitly attempting to implement alternatives to pip. And to my understanding, pip itself does use `zipfile` as well.

Are you proposing that there are people out there making package installers for personal use?

> and Python packages aren't always processed in a straight-line-from-the-index manner.

I don't know what you have in mind here.

woodruffw•5m ago
> Are you proposing that there are people out there making package installers for personal use?

I gave an example in the original comment: there's a lot of random ass tooling out there that treats Python wheels as a mostly opaque archive, and unpacks/repacks them in various ways. The original PEP behind wheels also (implicitly) expects this, since it refers to extraction with a "ZIP client" and not Python's zipfile specifically.

I think security scanners are a simple example, but Linux distros, Homebrew, etc. all also process Python package distributions in ways that mostly just assume a ZIP container, without additionally trying to exactly match how Python's `zipfile` behaves.

> I don't know what you have in mind here.

The security scanner example from the original comment.
