1) Using a single compression context for the whole stream means both the client and the server have to keep that context alive for the lifetime of the connection. That can carry a nontrivial memory cost, especially at high compression levels, so don't set the compression window any larger than it needs to be (see the first sketch after this list).
2) Using a single context also means you can't decompress one frame without having read the entire stream that preceded it. That rules out a useful optimization when you're "fanning out" messages to many recipients: if each message is compressed independently, you can compress it once and send the same compressed bytes to every recipient (see the second sketch below).
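To make the window-size point concrete, here's a minimal sketch using Python's zlib (raw deflate, which is what permessage-deflate negotiates): shrinking the window from the default 32 KiB (wbits=-15) to 512 bytes (wbits=-9) shrinks the history buffer each side holds per connection, at some cost in ratio. The payload is made up for illustration.

    import zlib

    MESSAGE = b'{"type": "tick", "seq": 42}' * 20  # made-up payload

    # Raw deflate with the default 32 KiB window vs. a 512-byte window.
    # A smaller window means less history each side must keep alive per
    # connection, at some cost in compression ratio.
    big = zlib.compressobj(level=6, method=zlib.DEFLATED, wbits=-15)
    small = zlib.compressobj(level=6, method=zlib.DEFLATED, wbits=-9)

    print(len(big.compress(MESSAGE) + big.flush()))      # better ratio
    print(len(small.compress(MESSAGE) + small.flush()))  # less memory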
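And a sketch of the fan-out case: with a fresh context per message ("no context takeover" in RFC 7692 terms), the compressed bytes don't depend on any earlier frame, so one compression pass serves every subscriber. The subscribers list and the send call are hypothetical stand-ins for whatever your server framework provides.

    import zlib

    def compress_once(payload: bytes) -> bytes:
        # Fresh raw-deflate context per message, so the output doesn't
        # depend on earlier frames in any one connection's stream.
        c = zlib.compressobj(wbits=-15)
        body = c.compress(payload) + c.flush(zlib.Z_SYNC_FLUSH)
        return body[:-4]  # permessage-deflate drops the trailing 00 00 ff ff

    subscribers = []  # hypothetical: your server's open connections

    frame = compress_once(b'{"event": "price", "value": 101.5}')
    for ws in subscribers:
        ws.send_frame(frame, rsv1=True)  # hypothetical send; RSV1 marks it compressed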
We used a streaming compression format that was originally designed for IBM tape drives.
It was fast as hell, worked really well, was gentle on the CPU, and made memory usage easy to control.
In the early 2000s, on a modest two-processor AMD64 machine, we saturated Fast Ethernet well before we felt any CPU pressure.
We got hit by the SOAP mafia during Longhorn; we couldn't convince the web-services crowd to adopt it, and instead they made us enshittify our “2 bytes length, 2 bytes msgtype, structs-on-the-wire” speed demon with their XML crap.
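For the curious, that wire format is about as simple as framing gets. A minimal sketch of what such a header might look like; byte order and the message-type id are my assumptions, not the original's:

    import struct

    MSG_QUOTE = 7  # hypothetical message type id

    def frame(msgtype: int, payload: bytes) -> bytes:
        # "2 bytes length, 2 bytes msgtype, structs-on-the-wire":
        # a fixed 4-byte header followed by the raw struct bytes.
        # Little-endian is an assumption; the original could have
        # used either byte order.
        return struct.pack("<HH", len(payload), msgtype) + payload

    payload = struct.pack("<Id", 12345, 101.25)  # e.g. an id and a price
    wire = frame(MSG_QUOTE, payload)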
There is RFC 9220 [2], which runs WebSockets over QUIC (which is UDP-based). But that is still expected to expose a stream of bytes to the WebSocket layer, so the ordering guarantee remains.
[1]: https://datatracker.ietf.org/doc/html/rfc6455
[2]: https://datatracker.ietf.org/doc/rfc9220/