Edit: ok I see the photos in the repo. Putting them in the README would be helpful
Much of what JPEG/MPEG are doing is rearranging the problem such that it is possible to create long runs of zeroes. The way in which a DCT block is scanned relative to the location of its AC/DC components is potentially one of the most innovative aspects of many video & image compression techniques.
What the DCT, together with the color-space transformation, does is turn fine details into high frequencies and coarse structure into low frequencies. From there, trading image quality for compression ratio is as simple as dropping the high-frequency components.
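A toy sketch of that zero-run effect (not the real JPEG pipeline; the gradient block and quantization step size are made up): take the 2-D DCT of a smooth 8x8 block, quantize coarsely, and read the coefficients out in zigzag order.

```python
import numpy as np

N = 8
# Orthonormal DCT-II basis matrix.
C = np.array([[np.cos(np.pi * (2 * n + 1) * k / (2 * N)) for n in range(N)]
              for k in range(N)]) * np.sqrt(2 / N)
C[0, :] /= np.sqrt(2)

# A smooth 8x8 block (a gentle gradient), standing in for coarse image content.
block = np.array([[100 + 2 * i + 3 * j for j in range(N)] for i in range(N)], float)

coeffs = C @ block @ C.T           # 2-D DCT: energy collects near the top-left (low frequencies)
quantized = np.round(coeffs / 16)  # coarse quantization flushes the small high-frequency values to 0

# JPEG-style zigzag scan: walk the diagonals outward from the DC corner.
zigzag = sorted(((i, j) for i in range(N) for j in range(N)),
                key=lambda p: (p[0] + p[1], p[0] if (p[0] + p[1]) % 2 else -p[0]))
scanned = [int(quantized[i, j]) for i, j in zigzag]
print(scanned[:10], "...")                  # the few significant values sit up front
print("zeros:", scanned.count(0), "of 64")  # the rest is long runs of zeros
```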
And besides that, jpegs use a Huffman table to further reduce the size of the image.
AFAIK, it doesn't do anything special to reduce runs. So lining up zeros really doesn't help much.
But DCT isn't very useful for lossless since any lossless frequency domain representation requires more range than the source, and you can't quantize to counter it.
OP's approach is actually terrible for video compression, because it actively throws away the locality of pixel changes present in typical video.
A nicer way of putting it would be that nothing about OP's technique is specific to video frames -- the same idea could be used to compress the diff between any two equal-length bit sequences. Might this nevertheless turn out to be better than existing forms of compression for this problem, like gzipping the concatenated pair of blocks? No, because the only way you get compression at all is if the distribution of inputs (here, sets of different bit positions) is highly predictable, i.e., nonrandom -- and pushing data through a hash function actively destroys that (especially if it's a cryptographically strong hash, the whole point of which is to produce output that is indistinguishable from random).
[1] https://github.com/ross39/new_bloom_filter_repo/blob/main/co...
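A quick way to see the "hashing destroys predictability" point concretely (a rough sketch; exact numbers depend on the compressor): write the same number of changed positions into a bitmap once as a tight cluster and once scattered through a hash, then compress both with zlib (the same DEFLATE algorithm behind gzip).

```python
import hashlib
import zlib

W = H = 256

# Clustered changes: a 20x20 block of pixels, as in typical video.
clustered = {(100 + dy, 100 + dx) for dy in range(20) for dx in range(20)}

# "Hashed" changes: push each position through SHA-256 and keep the low bits,
# which scatters them (indistinguishable from random) across the frame.
def hashed_pos(p):
    h = hashlib.sha256(repr(p).encode()).digest()
    return divmod(int.from_bytes(h[:4], "big") % (W * H), W)

scattered = {hashed_pos(p) for p in clustered}  # same count, minus any rare collisions

def bitmap_bytes(positions):
    bits = bytearray(W * H // 8)
    for y, x in positions:
        idx = y * W + x
        bits[idx // 8] |= 1 << (idx % 8)
    return bytes(bits)

for name, pos in [("clustered", clustered), ("hashed", scattered)]:
    print(name, len(zlib.compress(bitmap_bytes(pos), 9)), "bytes compressed")
# The clustered bitmap compresses far better: zlib can exploit the locality,
# while the hashed positions look random and resist compression.
```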
Also, if you want fast, good compression, use zstandard not gzip.
There are probably other reasons why this is a bad idea but I'm curious to try it.
Ultimately the data they are storing is tuples of (x,y,r,g,b), which means that it is not sufficient to ask "does (x1,y1) pass the bloom filter?" which would be embarrassingly parallel; you have to be able to match it to an index in your list of (r,g,b) data.
The way this is done is to iterate over all (x,y) in a consistent order for encoding and decoding, and for each (x1,y1), if it passes the bloom filter, then the next (r1,g1,b1) tuple is the pixel data for that pixel.
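A minimal sketch of that decode loop (the `bloom.might_contain` name and the flat `changed_pixels` list are illustrative assumptions, not the repo's actual API):

```python
def decode_frame(prev_frame, bloom, changed_pixels, width, height):
    """Rebuild the next frame from the previous one plus the Bloom filter and
    the stored (r, g, b) values, in the same scan order the encoder used."""
    frame = [row[:] for row in prev_frame]   # start from a copy of the previous frame
    next_value = iter(changed_pixels)        # stored (r, g, b) tuples, in scan order
    for y in range(height):
        for x in range(width):
            # Every index the encoder inserted passes this test; so do a few
            # false positives, which simply cost one extra stored pixel each.
            if bloom.might_contain((x, y)):
                frame[y][x] = next(next_value)
    return frame
```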
Isn't the idea of lossless pretty meaningless for all consumer grade purposes, especially if the input example was already mangled through YouTube's transcoders?
Besides possibly going from MPEG-1 or older to an H.264-like codec via lossless transcoding, I see no benefit in lossless methods here.
From my personal experience: I tried archiving some of my old DVDs, for which no better sources will be available for purchase for the foreseeable future, by transcoding even to H.265 at absurdly high bitrates, but they all looked worse than simply re-containerized MPEG-2 for media-server streaming.
All you do is transcode the output of an existing compressor, wasting bits on preserving the artifacts from a prior reductive step.
4:4:4 LOG or something would be a target to benchmark here.
Even "lossy" Apple ProRes start out at 275 Mbit/s for ProRes 4444 at HD25p already.
This is not about lossless being practical now with this, or about re-encoding YouTube videos with this providing any practical utility, or anything. It's just about using Bloom filters for compression. The motivation was just technical interest in Bloom filters. They say as much in the readme:
> This project explores an unconventional approach: repurposing Bloom filters—typically used for membership testing—as a lossless video compression mechanism.
The video source is practically irrelevant as long as it's an actual video.
In practice you’d likely get better compression (and faster progress) by piggybacking on AV1's strengths – then using the Bloom-filter trick just on the leftover sparse differences – rather than building a new codec from scratch.
The residuals can be expected to be quite noisy and will likely not profit from any intra frame predictability anymore.
JPEG XL’s lossless engine already improves on PNG by ~35%; that and other general-purpose compression methods would then be the per-frame benchmark to beat on the residuals.
In short: use the proven gear (motion-compensated blocks + modern transforms) to do the heavy lifting, then let the Bloom filter chase the hopefully comparably small residual.
As a showcase of what Bloom filters are, this would still be worthwhile, but I don't see any practical benefit here yet.
Not to forget, there is a reason visually lossless is the de facto norm now, even in production-grade environments: storage space is still not free, while the average uncompressed display stream easily reaches well north of 5 Gbps these days, so there is only so much lossless can reasonably do here.
Are the results impressive? Probably not, but it's still an interesting idea.
Re-containerizing MPEG-TS as-is into something like MKV, versus transcoding, is exactly what I'm talking about here.
There are currently no meaningful ways known to me to make MPEG-2 files significantly smaller with far more modern and advanced codecs without losing perceived quality, even at the same bitrate.
Not even talking about interlacing issues here.
So for anything MPEG-2 and newer, lossless re-encoding seems like quite a futile exercise to me, from my personal experience.
If there is a promising way I'm all here for it, but this sadly doesn't look like it.
1. It creates a bitmap where each bit is a pixel in the image: if a given pixel changed from frame 0 to frame 1, the corresponding bit is 1; otherwise it is 0.
2. All the 1s are added to the bloom filter, hashing their offsets. Now the bloom filter will be positive for all such indexes plus a percentage of false positive indexes.
3. We query the bloom filter to see all the indexes that are positive, and for all such pixels we store the raw pixel data of what changed. So we can reconstruct the next frame easily.
You can think of this as storing the delta between two frames as the (x, y, r, g, b) of all the pixels that changed, but compressing the x,y part a lot at the cost of storing a bit more r,g,b data than strictly needed.
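A compact sketch of those three steps (using a hypothetical `BloomFilter`-style API created by `bloom_factory`; the repo's exact parameters, thresholding, and scan order may differ):

```python
def encode_delta(prev_frame, next_frame, bloom_factory):
    """Steps 1-3: find the changed pixels, insert their offsets into a Bloom
    filter, then store raw pixel data for every offset that passes the filter
    (changed pixels plus the false positives)."""
    height, width = len(prev_frame), len(prev_frame[0])

    # 1. Which pixels changed between the two frames?
    changed = [(x, y) for y in range(height) for x in range(width)
               if prev_frame[y][x] != next_frame[y][x]]

    # 2. Insert their offsets into the Bloom filter.
    bloom = bloom_factory(expected_items=len(changed))
    for pos in changed:
        bloom.add(pos)

    # 3. Store raw (r, g, b) data for every position that queries positive,
    #    scanning in the same fixed order the decoder will use.
    stored = [next_frame[y][x] for y in range(height) for x in range(width)
              if bloom.might_contain((x, y))]
    return bloom, stored
```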
I have the feeling that since the pixels that change from frame 0 to frame 1 are often similar (in their location) to the ones that will change from frame 1 to frame 2, there is the possibility of compressing that further as well, by reusing the flags from the previous frame and storing verbatim only the offsets that changed in addition to those, or something alike.
Oh hey you're the guy who made kilo. Good job.
[edit] lol he edited it... they always edit it
Yes there's Redis Cluster, but that's a separate thing, and most Redis installations don't use it.
Part of it seems to be that software languages have "flavors" like natural spoken language does, so that one seems natural to one person and foreign to another, much like how DHH took TypeScript out of Rails, and I can't read Rails code or live without static types.
Also college kids are always learning what the last generation already knew (like Lisp) and reinvent everything in it with the vigor of their youth and it gets popular, which I imagine is how Rails and Redis both started and caught on.
But sometimes there are genuine innovations, and we can't build on them until someone comes up with them, much like how we couldn't invent machines until someone figured out that we could just use the lever to make gears to transport energy, which Archimedes didn't think of. And the more we learn collectively, the more these things spread.
I wonder if this means it'll all stabilize into a JavaScript-like toolchain one day. Maybe Gary Bernhardt was right all along.
All the chip-world talk of single-function machines, and of tech now being an energy-constrained industry, points toward pruning the state we store; most software is software to deliver software.
Single function pipeline hardware platforms will act as little more than sync engines from models.
I mean, it is 2025 and we still write software like it's the 1970s. So high tech.
Edit: https://arxiv.org/abs/2309.10668
From LLMs to energy models that transform electromagnetic geometry. Saving what’s needed to recreate cat pics and emails to mom. Pruning the tail as interest dips with generational churn.
Redis will eventually become obsolete. It may take 10 or 50 years, but it will happen.
But kilo taught many developers about C, editors, terminals, and how simple it is to do syntax highlighting. This epiphany inspired me and many others, and the things we make because of that will last much longer than Redis.
Most people use it as a cache, but you can build a lot of things with it that are far outside this narrow view (say, a bidding system or something like that).
And as for the other comment: scaling is possible via the commercial offering, as far as I know.
I didn't get to persuade anyone to let me use it for that, though I really wanted to.
But it has fantastic ease of use.
I can see bloom filters as part of a complicated hybrid strategy compressor. The more tools on the belt of such a compressor, the better, but I don't think it'll improve things much on average.
There's some flexibility in how one would do that second increment, but naively it might look like constructing a sparse matrix and summing, so it feels pretty similar. Still, the flexibility might be nice in some cases, and Bloom filters are an extremely space-efficient representation with an arbitrarily low false-positive rate.
https://github.com/ross39/new_bloom_filter_repo/blob/4798d90...
As a rule of thumb, Bloom filters require about 10 bits per element for a 1% false-positive rate. Say 100 pixels changed between the frames: that's ~1,000 bits, or ~125 bytes, to store the which-pixel-changed map.
Run-length encoding of the (in)active bits can use anything from ~10 to 200 bytes for 100 changed pixels (say 1 byte per run, with up to 200 runs of active vs. inactive bits in the worst case).
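For reference, the ~10 bits per element figure above comes from the standard Bloom filter sizing formula m = -n * ln(p) / (ln 2)^2; a quick check of the 100-pixel case:

```python
import math

def bloom_bits(n_items, false_positive_rate):
    # Optimal number of bits for a Bloom filter holding n_items at the given rate.
    return math.ceil(-n_items * math.log(false_positive_rate) / (math.log(2) ** 2))

n, p = 100, 0.01
bits = bloom_bits(n, p)
print(bits, "bits =", bits / 8, "bytes for", n, "changed pixels")  # ~959 bits, ~120 bytes
print(bits / n, "bits per element")                                # ~9.6
```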
You could store all of that in a single map from (x,y) to (r,g,b), which would cost you, say, 4 bytes per pixel for the (x,y) data and then, say, 30 bits or (rounding up) 4 bytes per pixel for the (r,g,b) with 10-bit color. A hash map would optimize lookup, but for efficient storage you really just want an array of 8-byte chunks where the first two bytes are x, next 2 are y, next 10 bits are r, then g, then b. To decode, you just start with the previous frame and then iterate down the list and apply each diff in the list.
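A sketch of that fixed-width record under those assumptions (16 bits each for x and y, three 10-bit channels, two spare bits, packed into one 8-byte chunk; the exact layout is just one plausible choice):

```python
import struct

def pack_diff(x, y, r, g, b):
    # 16 bits x | 16 bits y | 10 bits r | 10 bits g | 10 bits b | 2 spare bits
    value = (x << 48) | (y << 32) | (r << 22) | (g << 12) | (b << 2)
    return struct.pack(">Q", value)

def unpack_diff(chunk):
    value, = struct.unpack(">Q", chunk)
    return (value >> 48, (value >> 32) & 0xFFFF,
            (value >> 22) & 0x3FF, (value >> 12) & 0x3FF, (value >> 2) & 0x3FF)

record = pack_diff(1920, 1080, 1023, 0, 512)
assert unpack_diff(record) == (1920, 1080, 1023, 0, 512)
assert len(record) == 8   # one 8-byte chunk per changed pixel
```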
If you iterate over (x,y) in a consistent order, you avoid the need to store it because you can simply say that the next (r,g,b) tuple in your list is the pixel data for the next (x,y) value that passes the bloom filter. This effectively cuts the amount of data you have to store per pixel in half, but increases the number of pixels you have to store by a probabilistically small amount because bloom filters have false-positives.
It reminds me of experiments I did with wavelets for image compression some.. 22 years ago.
The inverse transform starts with a small pixel image and then uses just as many coefficients to transform it to an image twice as large (in width or height). This is done over and over again.
The point is that most of your data consists of these coefficients, and most of them are close to zero: so you can flush them to zero. Then the problem is mostly a matter of how to encode where you don't have zeroes: i.e. you have like a bitmap and an array of the non-zeroes. There were different algorithms that encoded non-zeroes, more or less conservatively, but they did often exploit the property that these non-zeroes typically were quite clustered — which is the opposite of a typical hash function used for a bloom filter.
Image compression this way had obviously very poor locality, first just for the transform and then for the coefficient compression, so it was quite slow. Therefore it did feel like a dead end.
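For anyone curious, here is a rough single-level Haar sketch of that "most coefficients are near zero, so flush them" behaviour (not the wavelet codec from back then, just the simplest possible illustration):

```python
import numpy as np

# A smooth 1-D signal standing in for one image row: mostly low-frequency content.
n = 256
signal = np.sin(np.linspace(0, 2 * np.pi, n)) * 100

# One Haar step: pairwise averages (the coarse half) and differences (the detail half).
coarse = (signal[0::2] + signal[1::2]) / 2
detail = (signal[0::2] - signal[1::2]) / 2

print("coarse range:", coarse.min(), coarse.max())    # roughly -100 .. 100
print("max |detail|:", np.abs(detail).max())          # ~1.2, tiny compared to the signal

# Quantize the detail coefficients: with a modest step they all flush to zero.
step = 4.0
print("nonzero quantized details:", np.count_nonzero(np.round(detail / step)))

# Sanity check: the un-quantized transform reconstructs the signal exactly.
rec = np.empty_like(signal)
rec[0::2] = coarse + detail
rec[1::2] = coarse - detail
assert np.allclose(rec, signal)
```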
With raw video input, I think the assumption "most pixels change little (or not at all) between consecutive frames, creating a sparse difference matrix ideal for this approach." would break down. For a very clean signal (low-noise sensor, brightly lit scene), maybe it'd work, but most real-world signals will have noise > 1 LSB, so I'd expect the lower bits to be changing at least half the time.
Sending the video through a compression and decompression cycle first will tend to remove that noise, creating an artificially static video where that assumption holds up.
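A rough simulation of that effect (the ~1 LSB noise level is an assumption; real sensors vary): add independent noise to two captures of the same static scene and count how many pixels differ.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two captures of the *same* static scene, each with its own sensor noise.
scene = np.tile(np.linspace(10, 245, 1920), (1080, 1))
noise_sigma = 1.0   # ~1 LSB of read noise (assumed)
frame_a = np.clip(np.round(scene + rng.normal(0, noise_sigma, scene.shape)), 0, 255)
frame_b = np.clip(np.round(scene + rng.normal(0, noise_sigma, scene.shape)), 0, 255)

changed = np.count_nonzero(frame_a != frame_b)
print(f"{changed / scene.size:.1%} of pixels differ between two 'identical' frames")
# Even with no motion at all, a large fraction of pixels differ by at least 1 LSB,
# so the "sparse difference matrix" assumption only holds after denoising,
# e.g. a prior lossy encode/decode cycle.
```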
Or just buy a camera and run around capturing some raw 8K footage of everyday scenes.
Unless people know to turn it on and are willing to deal with the file sizes and processing, they may not even realize the concept of raw/unprocessed footage exists anymore.
I’d never thought about that before.
Lossless video would make much more sense for digital content like screen recordings, where the assumption that few pixels change between consecutive frames makes much more sense.
In most cases the compression ratio will be low enough that the video will be perceptually lossless.
This looks like it is not storing the diff for any pixel that had the mean of its r,g,b values change by less than 10, which means that if a pixel goes from pure blue (#0000ff) to pure red (#ff0000) in consecutive frames, it would decode as pure blue in both frames.
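A quick check of that failure mode, assuming the mask really is computed as `|mean(new) - mean(old)| >= 10` per pixel as described above:

```python
import numpy as np

old_pixel = np.array([0, 0, 255])    # pure blue
new_pixel = np.array([255, 0, 0])    # pure red

diff_of_means = abs(new_pixel.mean() - old_pixel.mean())
print(diff_of_means)         # 0.0 -- well under a threshold of 10
print(diff_of_means >= 10)   # False: the change is never marked, so it is lost
```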
HSL-distance preserves color similarity with higher fidelity than converting both pixels to greyscale as done in the shared link
I didn't read the code, but the readme focuses on whether less than 32.4% of bits are 1. So even if a lot of the pixels are changing at the lower bits between each frame, so long as enough of the upper bits aren't changing, it should still in principle work?
Is this project about the video compression or about an innovation on Bloom filters?
Is it amenable to stream processing e.g. in an FPGA?
Is the compression ratio acceptable in a modern data stream?
Is it actually a sub-class of compression already known, and therefore probably encumbered?
Compression works by exploiting patterns or tendencies in the input. One of the biggest and simplest patterns in video data is that pixel changes tend to be localised: E.g., a person waving their arm against a static background changes a clump of pixels in a small region only. By hashing positions, you immediately throw away this locality information -- under the proposed scheme, changing 400 pixels confined to a 20x20 rectangle requires ~exactly as much space as changing 400 pixels randomly scattered around the image, while a more efficient representation could record the top-left coords, width and height of the box, and save about 398 copies of pixel coordinates. And yes, this benefit remains even if we don't record the coords explicitly but rather discover them by querying a Bloom filter.
And that's to say nothing of more advanced video compression techniques like motion detection (make the rectangle here look like the rectangle there from the previous frame), which is an absolute staple of every practical codec.
It seems like this would make the compression lossy and discard, e.g., transitions from #ffffff to #fffffa, and the line above it where the mean of pixel data is taken would discard, e.g., transitions from #ff0000 to #00ff00 regardless of the threshold used.
Am I misunderstanding the role of that line of code? It doesn't seem like anything that is a 0 in the resulting mask would be encoded into the bloom filter.
If the algorithm is lossy (small diffs are squeezed to zero) it probably should be compared to other lossy algorithms rather than lossless.
chungy•1d ago
Is this about recompressing existing H.264[1] videos (like downloaded from YouTube) losslessly, or is it about making new videos from a source losslessly? The former reminds me of JPEG XL's ability to re-compress the DCT on old JPEGs, but even as you get better compression, you still don't have a lossless image.
[1] To be fair, H.264 can be lossless in the first place, but YouTube does not serve up such files.
magicalhippo•1d ago
> Traditional video codecs like H.264 and H.265 achieve impressive compression by discarding "imperceptible" visual information. But what if we could guarantee perfect reconstruction while still achieving meaningful compression? This project explores an unconventional approach: repurposing Bloom filters—typically used for membership testing—as a lossless video compression mechanism.
Further down comes the explanation for why this scheme might possibly work:
> Rather than compressing whole video frames, this system applies Bloom filter compression to frame differences. This capitalizes on temporal coherence—most pixels change little (or not at all) between consecutive frames, creating a sparse difference matrix ideal for this approach.
Of course, using delta-frame compression has been a common theme of many video codecs for ages now, and many like H.264 and H.265 use additional techniques like motion estimation[1] to reduce the information in the delta frame further before final entropy coding step[2][3].
As such, the best is probably to view this as an alternative to the entropy encoding in H.264 or similar.
[1]: https://en.wikipedia.org/wiki/Motion_estimation#Video_coding
[2]: https://en.wikipedia.org/wiki/Context-adaptive_variable-leng...
[3]: https://en.wikipedia.org/wiki/High_Efficiency_Video_Coding#E...
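As a crude illustration of that baseline (zlib standing in only for the final entropy-coding stage, which H.264's CAVLC/CABAC it certainly is not, and a synthetic frame pair): delta the frames and entropy-code the residual.

```python
import zlib
import numpy as np

rng = np.random.default_rng(1)
prev = rng.integers(0, 256, size=(360, 640, 3), dtype=np.uint8)

# Next frame: identical except for a small moving region (a sparse, localised change).
nxt = prev.copy()
nxt[100:120, 200:220] = rng.integers(0, 256, size=(20, 20, 3), dtype=np.uint8)

# Delta frame: mostly zeros, which a general-purpose coder squeezes down well.
residual = (nxt.astype(np.int16) - prev.astype(np.int16))
compressed = zlib.compress(residual.tobytes(), 6)
print(len(compressed), "bytes for the compressed residual",
      "vs", residual.nbytes, "bytes raw")
```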