My ZIP isn't your ZIP: Identifying and exploiting semantic gaps between parsers

https://www.usenix.org/conference/usenixsecurity25/presentation/you

47•layer8•3d ago

https://www.usenix.org/system/files/usenixsecurity25-you.pdf

Comments

hinkley•6h ago

Maybe an argument to use zlib consistently.

aaviator42•6h ago

An argument for a better defined file format specification perhaps, but I don't think it's necessarily a good thing for everyone to use or have to use the same implementation.

Muromec•4h ago

If everyone has the same parser, whole classes of bugs just stop being exploitable. The classic one is a parser at the edge validating something while a parser further down the line sees a different result, one it expects to have been rejected during validation.

Both parsers could be buggy, but when they have different kinds of bugs, you get a zero-click, undetectable exploit.
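
To make the validate-here, consume-there gap concrete, here is a minimal sketch using only Python's standard zipfile module. The two reader functions are hypothetical stand-ins for "the edge parser" and "the downstream parser", not anything taken from the paper; the archive simply contains two entries with the same name, and the two readers disagree about which one counts.

```python
import io
import warnings
import zipfile


def build_ambiguous_zip() -> bytes:
    """Build an in-memory ZIP with two entries that share one name."""
    buf = io.BytesIO()
    with warnings.catch_warnings():
        # zipfile warns about duplicate names but still writes both entries.
        warnings.simplefilter("ignore", UserWarning)
        with zipfile.ZipFile(buf, "w") as zf:
            zf.writestr("payload.txt", "benign")
            zf.writestr("payload.txt", "malicious")
    return buf.getvalue()


def read_first_match(data: bytes, name: str) -> bytes:
    """Hypothetical validator: uses the first entry with this name."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        first = next(i for i in zf.infolist() if i.filename == name)
        return zf.read(first)


def read_by_name(data: bytes, name: str) -> bytes:
    """Hypothetical consumer: looks the entry up by name.

    In CPython's zipfile the name lookup resolves to the last entry
    that carries that name.
    """
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return zf.read(name)


data = build_ambiguous_zip()
validated = read_first_match(data, "payload.txt")
consumed = read_by_name(data, "payload.txt")
print(validated, consumed)
```

Both readers here sit on top of the same library, so this also shows that a shared implementation alone does not rule out a differential: it only takes two different ways of asking for "the" entry.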

woodruffw•4h ago

I don’t think it’s this simple: you can still produce observable differentials with a single parser by using different options within that parser in different places. The ZIP format itself affords ample opportunities for that.

socalgal2•3h ago

As someone who works on specs that are shared across different organizations' implementations, you can write all the specs you want but no conformance tests = no conformance.

woodruffw•4h ago

Unless, of course, the differential occurs between versions of zlib. I think the bigger problem here is that ZIP is just not a very well defined format.

blibble•4h ago

zlib (deflate) is just the compression type usually (not always) used in zips

zip is the container around it
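
A small sketch of that split, assuming Python's standard zipfile module (the file names and sample text are made up): the same bytes go into one archive twice, once stored uncompressed and once deflated, and the container records the method per entry.

```python
import io
import zipfile

text = b"zip is the container, deflate is one possible codec. " * 20

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    # Same payload, two different compression methods in one container.
    zf.writestr("stored.txt", text, compress_type=zipfile.ZIP_STORED)
    zf.writestr("deflated.txt", text, compress_type=zipfile.ZIP_DEFLATED)

with zipfile.ZipFile(io.BytesIO(buf.getvalue())) as zf:
    infos = zf.infolist()
    methods = {i.filename: i.compress_type for i in infos}
    sizes = {i.filename: i.compress_size for i in infos}
    contents = {i.filename: zf.read(i) for i in infos}

print(methods, sizes)
```

The ambiguities discussed in this thread are mostly about the container metadata (names, offsets, counts), which is why standardizing on one deflate implementation would not remove them.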

actionfromafar•5h ago

Tampering with signed binaries sounds pretty serious

tptacek•4h ago

It depends on how they're signed. A signature format that works on individual objects inside of an archive, rather than on a whole signed archive, seems crazy. In this case, it's a JAR file loader; doesn't seem like that big a deal?

o11c•5h ago

Key line from the abstract, since zip parser differences in general are old news:

> We summarize our findings as 14 distinct parsing ambiguity types in three categories with detailed analysis, systematizing current knowledge and uncovering 10 types of new parsing ambiguities.

tptacek•4h ago

This is a really good paper that reaches a bunch of fun conclusions, but to my eyes the practical findings are kind of marginal --- you can defeat an AV scanner, but you could already defeat AV scanners; you can defeat plagiarism-detectors, but you could already defeat plagiarism-detectors; you can package a malicious Java class in a benign-looking JAR, but that attack presumes you're convincing a target to load a JAR file you control.

The one legit-practical attack I see is the one where they trick the VS Code Extension marketplace into serving extensions with trusted publishers, but even there I'm struck by the fact that the security model for verifying extensions would depend on ZIP metadata.

I do not at all mean to talk this work down; this is my favorite species of vulnerability research, and I can see why it did well at Usenix Security.

FreakLegion•1h ago

It's a decent systematic look at something people have been doing ad hoc for a long time. In 2010 or so I realized:

1. Authenticode signatures have unauthenticated sections.

2. ZIP files don't require headers.

So you can shove a ZIP file (i.e. JAR, DOCM, APK, etc.) into a signed Windows executable without breaking its signature, and then depending on the extension it will do any number of things when clicked.
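
A rough illustration of the "no header required" part only, assuming Python's standard zipfile module. The 512-byte host blob is a made-up stand-in for some other file format, and nothing here models Authenticode or its unauthenticated sections; the sketch just shows that a reader which locates the archive from the end of the file does not care what precedes it.

```python
import io
import zipfile

# Build an ordinary ZIP in memory.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("hello.txt", "hi from the archive")
zip_bytes = buf.getvalue()

# Hypothetical host file: arbitrary bytes standing in for another format.
host = b"MZ" + b"\x00" * 510
polyglot = host + zip_bytes

# zipfile finds the end-of-central-directory record by scanning from the
# end of the data, so the leading bytes do not prevent parsing.
is_zip = zipfile.is_zipfile(io.BytesIO(polyglot))
with zipfile.ZipFile(io.BytesIO(polyglot)) as zf:
    names = zf.namelist()
    content = zf.read("hello.txt")

print(is_zip, names, content)
```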

(The extent to which this works has changed a lot in the intervening years, but prior to a patch in 2013 it was especially bad, and the patches never made their way into the spec, so custom Authenticode validators like Wine's or, say, the one in Palo Alto Networks gear, were still vulnerable the last time I checked.)

Anyway, at the same time:

1. Cybersecurity products lean on Authenticode to keep false positives down for specific publishers.

2. Those same products cache everything by hash without regard for file type.

Put all of this together and you could, as of 2020 at least, not only execute whatever you wanted, you could also have it misreported by CrowdStrike or whoever as a signed Windows component.

Fun stuff, but I agree that it's kind of marginal.

pixl97•4h ago

Zip is a fun minefield across different OSes, libraries, and ages of system. Zip64 is a fun one: I've seen companies forget to test it and end up with data loss on archives with more than 65535 files when interacting with more modern systems. There are so many things you need to test that going with some other archive format without the pitfalls is your best choice if possible.
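
For anyone wondering where 65535 comes from: the classic end-of-central-directory record stores the entry count in a 16-bit field. A minimal sketch that reads that field with Python's struct module; the helper name is made up, and it assumes the archive has no trailing comment so the record is exactly the last 22 bytes.

```python
import io
import struct
import zipfile

# Classic EOCD layout: signature, disk number, disk with central directory,
# entries on this disk, total entries, central directory size,
# central directory offset, comment length.
EOCD_FORMAT = "<4s4H2LH"
EOCD_SIZE = struct.calcsize(EOCD_FORMAT)
MAX_CLASSIC_ENTRIES = 0xFFFF  # largest value a 16-bit field can hold


def eocd_entry_count(data: bytes) -> int:
    """Return the 16-bit 'total entries' field (assumes no archive comment)."""
    fields = struct.unpack(EOCD_FORMAT, data[-EOCD_SIZE:])
    if fields[0] != b"PK\x05\x06":
        raise ValueError("EOCD record not found at end of data")
    return fields[4]


buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    for i in range(3):
        zf.writestr(f"file{i}.txt", "x")

small_count = eocd_entry_count(buf.getvalue())
print(small_count, MAX_CLASSIC_ENTRIES)
```

Larger archives need the Zip64 records to carry the real count, so a reader that only looks at this field can come up short, which would be consistent with the data loss described above.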

captn3m0•3h ago

Also related to ZIP parsing differentials, recently reported and fixed at PyPI: https://blog.pypi.org/posts/2025-08-07-wheel-archive-confusi...

tptacek•1h ago

It's good to see stuff like this getting found and fixed, but let me ask: given how the Python packaging ecosystem works, what is the practical scenario in which this would be exploitable?

saurik•3h ago

I'm cited on the first page of this paper (reference 20) for my work on the Android Master Key vulnerability (which I didn't find, to be clear, but I did most of the exploitation people saw), and, while this paper looks AWESOME (and I'm very excited to read it in detail), if you are interested in this concept but feel you need something a bit more concrete--maybe with diagrams and some hand-holding--to understand what is going on, I will recommend my series of articles on Master Key as an introduction.

https://www.saurik.com/masterkey1.html

https://www.saurik.com/masterkey2.html

https://www.saurik.com/masterkey3.html

schoen•1h ago

This is great. It feels like a central example of the phenomenon of parser differentials (and nice use of tools to find them more efficiently).

Also, as the lead author's name is spelled the same as an English pronoun, we can anticipate natural language parsing ambiguities from writing about this research in English prose! For example, "You discovered that there are many opportunities for parser differentials due to the underspecified nature of the ZIP format" or "You described a practical method of bypassing plagiarism detectors and several other kinds of file content scanners".

Actually, I'm tempted to propose that for the April Fool's Did You Know? on Wikipedia next year. "Did you know ... that You won a Usenix Security award for finding ways to construct ambiguous texts?"

pabs3•1h ago

A linter for zip files that can probably detect some of these:

https://github.com/ronomon/pure

est•11m ago

IIRC similar attacks exist on DEFLATE.

There used to be a .png picture that displayed totally different content in Safari/Firefox/IE.
