I do wonder, since a lot of tools outside of the MS ecosystem can read Office files (e.g. LibreOffice and Google Docs as well as plenty of other online tools), if indeed the hack as described by the article is possible. One would just need to figure out the ZIP stacks used by said tools.
Then you have people on Linux or macOS who might also use LibreOffice, Apple's office suite, or something else entirely.
And given MS Office is the de facto standard, you’ll often see people open OOXML documents within non-MS office suites.
After all, OOXML is an open standard (sarcasm).
ODF (the document formats favoured by most other office suites) is also ZIP-based XML. So they too could be vulnerable.
There's a whole extra level of archive file format tooling gotchas that one misses out on when one assumes "UNIX" for everything, and does not account for "FAT", "NTFS", "HPFS", and even "OpenVMS".
Or ZIP64. (-:
* https://github.com/dotnet/runtime/blob/main/src/libraries/Sy...
* https://github.com/mihula/ProDotNetZip/blob/main/src/Zip/Zip...
Nope, because a typical accounting asset wouldn't make it and you know it so you forward the PDF to them.
Are you saying that parser discrepancy is a well known problem OR that this exact technical issue described (CD offset vs CD size used to derive CD location) is a well known problem?
If the former, you are of course right: parser discrepancy and the existence of these kinds of "multiple personality" files isn't a new thing. I'm pretty sure this can be traced back to at least the '80s :)
However, if you mean the latter, I would appreciate a link if you can recall where you've seen it before / read about it — I can then add it to the article.
A commenter on another post of that article noted that technically there's no reason for the central directory to be right before the EOCD, so seeking backwards from the EOCD by the size of the CD is just incorrect. In fact, ZIP was designed such that the central directory could be split across multiple disks (and later files), so it was not possible to guarantee a simple backwards jump from the EOCD to the start of the CD.
Yes, and in fact APPNOTE.TXT expects implementations to deal with discontiguous CD/EOCD records; it actually mandates that some records appear between them. (It's where ZIP64-specific records go, for example.)
The whole thing seems to stem from a belief by the author of this post that Info-ZIP is the canonical implementation. (This is of course wrong. That would be PKZIP.)
If Info-ZIP is exhibiting the behavior described, then the Info-ZIP implementation is flat out wrong.
(This has happened before. See <https://news.ycombinator.com/item?id=27925393> where the author of that post provides pushback to Mark Adler, receives pushback from Adler in response, and then just stops pushing and defers to Adler's reading. Adler is an expert in his domain, and he's made valuable contributions to free software, but his expertise is in compression, and he's not the authority on the spec or the format; Adler didn't create ZIP or write APPNOTE.TXT—Phil Katz did.)
> a belief by the author of this post that Info-ZIP is the canonical implementation
↑ Where did you get this from?
I don't think there's such a thing as a canonical implementation nowadays: the ZIP implementation world is too fragmented and implementations are too widespread.
One more note: the article isn't about who's right or wrong in terms of interpreting APPNOTE.TXT; that is beside the point. The point is that a parser discrepancy can cause security issues, and the article documents just another discrepancy found in real-world implementations (and there are A LOT of these with regard to the ZIP format).
This is a pretty clear statement on normativity.
You also left out an important part of my comment when you quoted it. (The operative word at the front of the sentence is "seems".)
I disagree – it's a technical remark on redundancy of fields (this is related to my hypothesis in the article that redundancy in formats is the primary cause of parser discrepancy). It doesn't acknowledge InfoZIP being canonical in any way.
> You also left out an important part of my comment when you quoted it. (The operative word at the front of the sentence is "seems".)
Fair, though my point still stands.
It's talking about how the EOCD contains both the size of the central directory and the offset of the start of the central directory, which is redundant. So we end up with some tools honoring the offset, while some subtract the size from the EOCD.
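To make the two readings concrete, here is a minimal Python sketch (not taken from the article or from any particular tool). It reads the fixed 22-byte EOCD record and computes the central directory location both ways: from the offset field, and by subtracting the size field from the EOCD position. The helper names `cd_candidates` and `build_sample` are made up for illustration, and the naive `rfind` search for the EOCD signature is a simplification that ignores ZIP64 and signatures hidden inside comments.

```python
import io
import struct
import zipfile

EOCD_SIG = b"PK\x05\x06"
EOCD_FMT = "<4sHHHHIIH"  # fixed 22-byte part of the EOCD record


def cd_candidates(data: bytes):
    """Return (start per offset field, start per EOCD position minus size field)."""
    eocd_pos = data.rfind(EOCD_SIG)  # simplification: take the last signature match
    if eocd_pos < 0:
        raise ValueError("no EOCD record found")
    (_sig, _disk, _cd_disk, _n_disk, _n_total,
     cd_size, cd_offset, _comment_len) = struct.unpack_from(EOCD_FMT, data, eocd_pos)
    return cd_offset, eocd_pos - cd_size


def build_sample() -> bytes:
    """Build a small, well-formed archive in memory (hypothetical test input)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("hello.txt", "hello")
    return buf.getvalue()


sample = build_sample()
by_offset, by_size = cd_candidates(sample)
print("well-formed:", by_offset, by_size)

# Insert 8 padding bytes between the central directory and the EOCD.
# The offset field is unchanged, but "EOCD position minus size" moves.
eocd_pos = sample.rfind(EOCD_SIG)
tampered = sample[:eocd_pos] + b"\x00" * 8 + sample[eocd_pos:]
print("with padding:", cd_candidates(tampered))
```

In the well-formed archive both values agree. Once anything sits between the central directory and the EOCD, they diverge, and a tool that trusts one field will look somewhere different from a tool that trusts the other.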
I don't think I've actually ever received a zipped up .pdf, so it's clearly not legitimately necessary for anyone to do so and should you ever see it you should treat it highly suspiciously.
I get admin@ emails for our company domain and there is a somewhat steady stream of run of the mill fraud attempts.
A trick I've seen happen quite a lot recently is emails with .svg attachments that contain some lightly obfuscated JavaScript, which ultimately redirects you to some dodgy-looking URL (which I never visited).
I simply made a rule to outright reject all .svg files from external sources, and I get a report any time it's attempted. In the roughly 12 months this has been running, it has blocked about 20 incoming emails, and only one of those was a false positive. Even the false positive was a weird case: we had sent a .svg file to a creative company, who for some reason included our .svg attachment in their reply back to us.
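For anyone wanting to try something similar outside a mail gateway's own rule engine, a rough Python sketch of the check follows. It is an assumption about how such a rule could look, not the actual rule described above: `svg_attachments` is a hypothetical helper, and it only inspects the attachment's filename and declared content type.

```python
from email.message import EmailMessage


def svg_attachments(msg: EmailMessage) -> list:
    """Return names of attachments that look like SVG by filename or MIME type."""
    hits = []
    for part in msg.iter_attachments():
        name = part.get_filename() or ""
        if name.lower().endswith(".svg") or part.get_content_type() == "image/svg+xml":
            hits.append(name or part.get_content_type())
    return hits


# Build a sample message with one SVG and one PDF attachment.
msg = EmailMessage()
msg["Subject"] = "Invoice"
msg.set_content("See attached.")
msg.add_attachment(b"<svg xmlns='http://www.w3.org/2000/svg'/>",
                   maintype="image", subtype="svg+xml", filename="invoice.svg")
msg.add_attachment(b"%PDF-1.4",
                   maintype="application", subtype="pdf", filename="report.pdf")

flagged = svg_attachments(msg)
print(flagged)
```

A real deployment would also have to decide what to do with SVGs nested inside archives or forwarded messages, which this sketch does not look at.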
But in emails? Just attach however many PDFs you need and send, they don't really compress anyway; and I think most web-mail fronts actually allow you to download all the attachments as a single .zip — but obviously those .zip-files are not maliciously crafted (I hope, at least).
Also, now that I think of it, forwarding the PDF you've extracted and visually reviewed instead of the original .zip-file would defeat this attack (unless, of course, it's the PDF file that's schizophrenic).
(I see it all the time. Along with the password in the very same email)
e.g. someone downloaded the password protected zip on a public computer, logged out of their email, but forgot to delete the file.
What kind of crazy logic is that?
I'm sure there are other common reporting use cases as well.
soupfordummies•7mo ago
netsharc•7mo ago
B1FF_PSUVM•7mo ago
wat10000•7mo ago
o11c•7mo ago
JdeBP•7mo ago
There was a time when passing ZIP files around was a very popular method of software distribution, and things like this were gotchas that had to be watched for. It was widely known, at least amongst sysops, that the varied toolsets that handled ZIP archives were functionally different. And there were scanners and sanity checkers, and bugfixes to PKUNZIP, that dealt in this stuff for uploaded files and FREQ responses.
Did people exploit the differences? Yes. Although it was mainly on the level of creating prank ZIP files on non-Microsoft operating systems with 8.3 filenames such as "PRN" or "CLOCK$".
* https://groups.google.com/g/alt.comp.virus/c/zLV-Y2a71gs/m/U...
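The prank-filename case is easy to check for mechanically. Below is a minimal Python sketch of such a sanity check, assuming the usual list of reserved DOS device names (CON, PRN, AUX, NUL, COM1 to COM9, LPT1 to LPT9, plus CLOCK$). The helper `reserved_members` is hypothetical and is not what any scanner of that era actually did.

```python
import io
import zipfile
from pathlib import PurePosixPath

# Reserved DOS device names; a file with one of these as its base name
# refers to the device rather than to a regular file.
RESERVED = (
    {"CON", "PRN", "AUX", "NUL", "CLOCK$"}
    | {f"COM{i}" for i in range(1, 10)}
    | {f"LPT{i}" for i in range(1, 10)}
)


def reserved_members(zf: zipfile.ZipFile) -> list:
    """Return member names with a path component that is a reserved device name."""
    hits = []
    for name in zf.namelist():
        for part in PurePosixPath(name).parts:
            # The extension is ignored by DOS when matching device names.
            stem = part.split(".", 1)[0].rstrip(" ").upper()
            if stem in RESERVED:
                hits.append(name)
                break
    return hits


# Build a small archive in memory with two prank names and one ordinary file.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("docs/readme.txt", "ok")
    zf.writestr("PRN", "prank")
    zf.writestr("sub/clock$.txt", "prank")

with zipfile.ZipFile(buf) as zf:
    flagged_names = reserved_members(zf)
print(flagged_names)
```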
However, the truly terrible idea of self-extracting archives was popular, which meant that archives with "interesting" arrangements of the archive within the overall file were widespread. ZIP comments were also liberally applied and altered by pretty much every BBS that passed an archive along. And the Unix people wanted to be able to use pipes, something that the MS-DOS original never had to cater for.
Also, there were people who exploited the fact that different tools took different things as gospel. Even within the past decade one can find people still being caught out by the fact that there's a header field that instructs what the pathname separator character(s) used are; and that ZIP tools that expect non-seekable streams operate differently to ZIP tools that expect seekable regular files.
wqweto•7mo ago
JdeBP•7mo ago
These people having fun with "Unix" versus "FAT" over the past decade are seeing the tip of the iceberg, given that there was a PKZIP for OS/400, there is a PKZIP for z/OS (and a competitor that claims to be cheaper), there are tools of varying degrees of Unixiness for systems like the Atari ST and OS/2, and a whole bunch of things have accrued over the years such as an outright extra header giving alternative filenames specifically for MacOS.
* https://michaelrommel.com/create/2022-12-28-malformed-zip-fi...
* https://unix.stackexchange.com/q/166159/5132
* https://github.com/filebrowser/filebrowser/issues/1768
rendx•7mo ago
charleslmunger•7mo ago