JPEG and other lossy image formats should allow some of that, but it depends on how well the image format's compression composes with gzip.
There is that example where you have "zero image" of big dimensions, but can you actually conflate gzip and image compression?
Considering I’ve seen real world JPEGs above 300:1 (https://eoimages.gsfc.nasa.gov/images/imagerecords/73000/739...) I would not be surprised if you could craft a JPEG whose compression ratio gets very close to or exceeds four digits.
PNG is actually a description of the RGB value for the individual pixels. That's why I believe you could make a PNG bomb: you could have a 2 billion by 2 billion black pixel image which would ultimately eat up a bunch of space in your GPU and memory to decode.
Perhaps something similar is possible with a JPEG, but it's really nothing to do with the compression info. JPEGs have a max size of 65,535×65,535, which would keep you from exploding them.
If you send it compressed over the wire, you could get another factor of 1032, or perhaps more depending on which algorithms the client supports. Also, you could generate it on demand as a data stream. But these run the risk of the client stopping the transfer before ever trying to process the image.
However, it may work with the article's process - a 100x100 png with lots of 2GB-of-nothing iTXt chunks could be gzipped and served with `Content-Encoding: gzip` - so it would pass the "is a valid png" and "not pixel-huge image" checks but still require decompression in order to view it.
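A rough sketch of both sides of that, assuming Python's standard `zlib` module: how much a run of zeros shrinks under gzip, and how a client could cap what it is willing to inflate. The helper names `gzip_bytes` and `bounded_gunzip` and the 10 MB payload are made up for illustration, not anything from the article.

```python
import zlib

def gzip_bytes(raw: bytes) -> bytes:
    """Compress into the gzip container (wbits=31), like a Content-Encoding: gzip body."""
    compressor = zlib.compressobj(9, zlib.DEFLATED, 31)
    return compressor.compress(raw) + compressor.flush()

def bounded_gunzip(data: bytes, limit: int) -> bytes:
    """Inflate a gzip body, refusing to produce more than `limit` bytes."""
    decompressor = zlib.decompressobj(wbits=31)
    # Ask for one byte more than the limit so an overrun is detectable
    # without ever materialising the full expansion.
    out = decompressor.decompress(data, limit + 1)
    if len(out) > limit:
        raise ValueError("decompressed body exceeds the configured limit")
    return out

raw = bytes(10_000_000)        # 10 MB of zeros (hypothetical payload)
packed = gzip_bytes(raw)
print(f"{len(raw)} -> {len(packed)} bytes, about {len(raw) / len(packed):.0f}:1")
```

The cap only helps if it is applied before the image decoder sees the bytes; a client that inflates first and checks afterwards has already paid the cost.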
The same trick works with PNG, actually. Possibly even better: it uses a pair of 32-bit integers for the resolution.
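For what it's worth, the declared resolution can be read from the first 24 bytes without decoding anything. A minimal sketch, assuming the standard PNG layout (8-byte signature, then the IHDR chunk with big-endian width and height); the function names and the 50-megapixel threshold are arbitrary choices for illustration.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def png_declared_size(header: bytes) -> tuple[int, int]:
    """Return (width, height) as declared in the IHDR chunk."""
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError("not a PNG header")
    # Bytes 16..23 hold width and height as two big-endian 32-bit integers.
    width, height = struct.unpack(">II", header[16:24])
    return width, height

def looks_like_pixel_bomb(header: bytes, max_pixels: int = 50_000_000) -> bool:
    """Flag images whose declared pixel count exceeds an arbitrary budget."""
    width, height = png_declared_size(header)
    return width * height > max_pixels
```

This only inspects what the file claims; a decoder still needs its own limits in case the header and the pixel data disagree.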
I consulted for a bank once where the server stripped metadata and re-encoded images from scratch again and the devs thought that would remove any maliciousness. It's just pixels right? I might have thought so as well, but I had this idea and wanted to double check, and it didn't take long to find someone smarter than me had already done the work: https://web.archive.org/web/20250713054441/http://www.idontp... (By now I see there are a dozen commercial parties that rank higher for this topic. Marginalia search helped me re-find the OG post just now)
Edit, thought I should add: the solution is to specify the correct content type. Don't let your PHP interpreter interpret files in the user uploads directory. Don't serve images with content-type text/html because the browser will interpret it as HTML (as instructed) and run any code inside on your domain ('origin'). Mark data as separate from code whenever possible, or escape it when that's impossible.
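To make the content-type point concrete, here is a small sketch of choosing response headers for a user upload from its leading bytes rather than from anything the uploader claimed. The `MAGIC` table and `upload_response_headers` are hypothetical names, and the list of formats is deliberately short; `X-Content-Type-Options: nosniff` is the real header that tells browsers not to second-guess the declared type.

```python
# Leading "magic" bytes for a few common image formats.
MAGIC = {
    b"\x89PNG\r\n\x1a\n": "image/png",
    b"\xff\xd8\xff": "image/jpeg",
    b"GIF87a": "image/gif",
    b"GIF89a": "image/gif",
}

def upload_response_headers(body: bytes) -> dict[str, str]:
    """Pick headers for serving an uploaded file back to browsers."""
    # Default to an opaque type so unknown data is never rendered as HTML.
    content_type = "application/octet-stream"
    for magic, mime in MAGIC.items():
        if body.startswith(magic):
            content_type = mime
            break
    return {
        "Content-Type": content_type,
        "X-Content-Type-Options": "nosniff",
    }
```

Matching magic bytes does not prove the file is harmless (a valid image can still carry a payload in its pixels or metadata); the point is only that the server never labels upload data as code.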
Well, for that use the differences in HTML&CSS support and filtering ...
I guess the reason they added this was that they noticed many mails contain the same tracking images and decided to cut off tracking data that way.
Either way, the correct full URL is fetched with the full query string. It's just how it's cached that is affected.
It appears that you can't do these sorts of things with CID embedded images...
Yes. Both docx and xlsx are literally just a zip of XML files with a different extension. PDF can contain zlib streams, which use deflate compression just as gzip does, so all the mentioned methods apply to all three formats.
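Since docx and xlsx are zip containers, one cheap first check is to read the sizes the archive declares before extracting anything. A sketch using Python's standard `zipfile` module; the function names and both thresholds are invented for illustration, and the declared sizes come from the archive itself, so extraction still needs its own byte cap.

```python
import io
import zipfile

def declared_expansion(archive_bytes: bytes) -> tuple[int, int]:
    """Return (compressed, uncompressed) byte totals from the zip directory."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        infos = zf.infolist()
        compressed = sum(info.compress_size for info in infos)
        uncompressed = sum(info.file_size for info in infos)
    return compressed, uncompressed

def is_suspicious(archive_bytes: bytes,
                  max_total: int = 200 * 1024 * 1024,
                  max_ratio: float = 100.0) -> bool:
    """Flag archives that declare a huge total size or an extreme ratio."""
    compressed, uncompressed = declared_expansion(archive_bytes)
    if uncompressed > max_total:
        return True
    return compressed > 0 and uncompressed / compressed > max_ratio
```

A ratio threshold like this will also flag honest but very repetitive data, so treat a hit as "inspect with limits" rather than "malicious".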
Due to the inherent fuzziness/diversity in all models right now I don't think there is a universal approach to this idea but it is something people deploying these systems may want to try and detect.
Eventually tracked it down to an email which contained a zip of stock trading data – just the three letter stock code and the shift. It wasn't malicious, it just had an extraordinarily high compression ratio!
jerf•10h ago
If you are processing emails for security reasons, and want to find viruses even if they are in archive files, it's easy to write the code to "just keep unarchiving until we're out of things to unarchive", but not only can that lead to quite astonishing expansions, it can actually be a process that never terminates at all.
I remember when I first read about these, and "a small file that decompresses to a gigabyte" was also "a small file that decompresses to several multiples of your entire hard disk space" and even servers couldn't handle it. Now I read articles like this one talking about "oh yeah Evolution filled up 100GB of space" like that's no big deal.
If you have a recursive decompressor you can still make small files that uncompress to large amounts even by 2025 standards, because the symbols the compressor will use to represent "as many zeros as I can have" will themselves be redundant. The rule that you can't compress already-compressed content doesn't necessarily apply to these sorts of files.
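A small sketch of that last point, assuming plain nested zlib streams rather than any real archive format: the compressed form of a long run of zeros is itself very repetitive, so a second pass shrinks it again, and a decompressor that keeps unwrapping needs both a layer cap and a byte cap. `nest` and `peel` are hypothetical helpers, and the limits are arbitrary.

```python
import zlib

def nest(payload: bytes, layers: int) -> bytes:
    """Wrap payload in `layers` rounds of zlib compression."""
    for _ in range(layers):
        payload = zlib.compress(payload, 9)
    return payload

def peel(blob: bytes, max_layers: int = 8, max_bytes: int = 64 * 1024 * 1024):
    """Unwrap nested zlib streams, stopping at a layer cap or a size cap."""
    layers = 0
    while layers < max_layers:
        decompressor = zlib.decompressobj()
        try:
            # Request one byte past the budget so an overrun is detectable.
            out = decompressor.decompress(blob, max_bytes + 1)
        except zlib.error:
            break  # not a zlib stream, so treat it as the innermost payload
        if len(out) > max_bytes:
            raise ValueError("expansion budget exceeded")
        blob = out
        layers += 1
    return blob, layers

zeros = bytes(1_000_000)
once = nest(zeros, 1)
twice = nest(zeros, 2)
print(len(zeros), len(once), len(twice))
```

The layer cap is what keeps the "just keep unarchiving" loop from running forever on an archive that contains itself; the byte cap handles the expansion.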
cyanydeez•10h ago
Before AGI, there will be an untenable gullible general intelligence.
colechristensen•9h ago
My bet is that if AGI is possible it will take a form that looks something like
Where x is a billions-long vector and the parameters in A (sizeof(x)^2 ?) are trained and also tuned to have period three or nearly period three for a meta-stable, near-chaotic progression of x. "Period three implies chaos" https://www.its.caltech.edu/~matilde/LiYorke.pdf
That is if AGI is possible at all without wetware.
Y_Y•8h ago
colechristensen•8h ago
https://en.m.wikipedia.org/wiki/Critical_brain_hypothesis
mindesc•7h ago
cyanydeez•7h ago
Certainly intelligence is a reduction of entropy, but it's also certainly not stable. Just like cellular automata (https://record.umich.edu/articles/simple-rules-can-produce-c...), loops that are stable can't evolve, but loops that are unstable have too much entropy.
So, we're likely searching for a system that's meta-stable within a small range of input entropy (physical) and output entropy (information).
JoshTriplett•8h ago
masklinn•10h ago
panarky•9h ago
bspammer•8h ago
philodeon•5h ago
Twirrim•3h ago
Twirrim•8h ago
jamesfinlayson•5h ago
zikduruqe•5h ago
edit - this? https://idiallo.com/blog/zipbomb-protection
jamesfinlayson•4h ago
ac29•2h ago
Is this actually a practical issue though? Windows, Mac and Linux all support transparent compression at the filesystem level, so 100GB of /dev/zero isn't actually going to fill much space at all.
kiwijamo•1h ago