An Ode to Bzip

https://purplesyringa.moe/blog/an-ode-to-bzip/
40•signa11•2h ago

Comments

elophanto_agent•1h ago
bzip2 is the compression algorithm equivalent of that one coworker who does incredible work but nobody ever talks about. meanwhile gzip gets all the credit because it's "good enough"
kergonath•1h ago
Bzip2 is slow. That’s the main issue. Gzip is good enough and much faster. Also, the fact that you cannot get a valid bzip2 file by cat-ing 2 compressed files is not a deal breaker, but it is annoying.
saidnooneever•1h ago
the catting issue might be more a problem with the bzip program's implementation than with the algorithm (it could expect an array of compressed files). that would only be impossible if the program cannot reason about the length of the data from the file header, which again is technically not about the compression algo but rather the file format it's carried in.

that being said, speed is important for compression, so for systems like webservers etc it's an easy sell ofc. very strong point (and smarter implementation in programs) for gzip

nine_k•1h ago
Bzip2 is great for files that are compressed once, get decompressed many times, and the size is important. A good example is a software release.
pocksuppet•20m ago
So is xz, or zstd, and the files are smaller. bzip2 disappeared from software releases when xz was widely available. gzip often remains, as the most compatible option, the FAT32 of compression algorithms.
joecool1029•44m ago
> the catting issue might be more a problem with the bzip program's implementation than with the algorithm (it could expect an array of compressed files). that would only be impossible if the program cannot reason about the length of the data from the file header, which again is technically not about the compression algo but rather the file format it's carried in.

Long comment to just say: ‘I have no idea what I’m writing about’

These compression algorithms do not have anything to do with filesystem structure. Anyway the reason you can’t cat together parts of bzip2 but you can with zstd (and gzip) is because zstd does everything in frames and everything in those frames can be decompressed separately (so you can seek and decompress parts). Bzip2 doesn’t do that.

So like, another place bzip2 sucks ass is working with large archives, because you need to seek the entire archive before you can decompress it, and without parity data that makes losing the whole archive way more likely. Really, don’t use it unless you have a super specific use case and know the tradeoffs. For the average person it was great back when we would spend time compressing to save time sending over dialup.

nine_k•1h ago
Gzip is woefully old. Its only redeeming value is that it's already built into some old tools. Otherwise, use zstd, which is better and faster, both at compression and decompression. There's no reason to use gzip in anything new, except for backwards compatibility with something old.
kergonath•1h ago
> Otherwise, use zstd, which is better and faster

Yes, I do. Zstd is my preferred solution nowadays. But gzip is not going anywhere as a fallback because there is a surprisingly high number of computers without a working libzstd.

stefan_•44m ago
bzip and gzip are both horrible, terribly slow. Wherever I see "gz" or "bz" I immediately rip that nonsense out for zstd. There is such a thing as a right choice, and zstd is it every time.
laurencerowe•16m ago
lz4 can still be the right choice when decompression speed matters. It's almost twice as fast at decompression with similar compression ratios to zstd's fast setting.

https://github.com/facebook/zstd?tab=readme-ov-file#benchmar...

sedatk•22m ago
> the fact that you cannot get a valid bzip2 file by cat-ing 2 compressed files

TIL. Now that's why gzip has a file header! But, tar.gz compresses even better, that's probably why it hasn't caught on.

pocksuppet•19m ago
tar packs multiple files into one. If you concatenate two gzipped files and unzip them, you just get a concatenated file.
sedatk•17m ago
Ah okay, I thought gzip would support decompressing multiple files that way.
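
The concatenation behavior discussed above is easy to check with Python's standard library. This is a minimal sketch rather than a statement about every tool: it assumes Python 3.3 or later (where `bz2.decompress` is documented to accept multi-stream input), and the sample payloads are made up for illustration.

```python
import bz2
import gzip

# Hypothetical payloads standing in for two separate files.
first = b"hello, "
second = b"world\n"

# Compress each payload on its own, then join the compressed bytes.
# This mirrors `cat a.gz b.gz > both.gz` on the command line.
gz_joined = gzip.compress(first) + gzip.compress(second)
bz_joined = bz2.compress(first) + bz2.compress(second)

# Decompressing yields a single run of bytes with no file boundaries.
gz_out = gzip.decompress(gz_joined)
bz_out = bz2.decompress(bz_joined)

print(gz_out == first + second)
print(bz_out == first + second)
```

In both cases the output is one concatenated payload, so nothing records where one input ended and the next began. Keeping separate files separate is what tar is for.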
joecool1029•1h ago
Just use zstd unless you absolutely need to save a tiny bit more space. bzip2 and xz are extremely slow to compress.
silisili•1h ago
I'd argue it's more workload dependent, and everything is a tradeoff.

In my own testing of compressing internal generic json blobs, I found brotli a clear winner when comparing space and time.

If I want higher compatibility and fast speeds, I'd probably just reach for gzip.

zstd is good for many use cases, too, perhaps even most...but I think just telling everyone to always use it isn't necessarily the best advice.
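
For anyone who wants to run this kind of comparison on their own data, here is a rough sketch using only codecs from Python's standard library. It is an assumption-laden toy, not a proper benchmark: the sample records are invented, each codec runs once at its default level, and brotli and zstd are left out because they need third-party packages on most Python versions.

```python
import bz2
import json
import lzma
import time
import zlib


def compare_codecs(data: bytes) -> dict[str, tuple[int, float]]:
    """Return {codec name: (compressed size in bytes, seconds to compress)}."""
    codecs = {
        "zlib (DEFLATE, as in gzip)": (zlib.compress, zlib.decompress),
        "bz2": (bz2.compress, bz2.decompress),
        "lzma (xz)": (lzma.compress, lzma.decompress),
    }
    results = {}
    for name, (compress, decompress) in codecs.items():
        start = time.perf_counter()
        packed = compress(data)
        elapsed = time.perf_counter() - start
        # Sanity check: every codec must round-trip the input exactly.
        assert decompress(packed) == data
        results[name] = (len(packed), elapsed)
    return results


# Hypothetical sample: a list of similar JSON records.
records = [
    {"id": i, "user": f"user{i % 50}", "active": i % 2 == 0}
    for i in range(5000)
]
sample = json.dumps(records).encode("utf-8")

print(f"input: {len(sample)} bytes")
for name, (size, seconds) in compare_codecs(sample).items():
    print(f"{name:28} {size:8d} bytes  {seconds * 1000:8.2f} ms")
```

Swapping `sample` for a real payload is the whole point: the ranking can change with the data, which is the tradeoff being argued here.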

joecool1029•57m ago
> If I want higher compatibility and fast speeds, I'd probably just reach for gzip.

It’s slower and compresses less than zstd. gzip should only be reached for as a compatibility option, that’s the only place it wins, it’s everywhere.

EDIT: If you must use it, use the modern implementation, https://www.zlib.net/pigz/

hexxagone•51m ago
In the LZ high compression regime where LZ can compete in terms of ratio, BWT compressors are faster to compress and slower to decompress than LZ codecs. BWT compressors are also more amenable to parallelization (check bsc and kanzi for modern implementations besides bzip3).
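
For readers unfamiliar with the term, BWT is the Burrows-Wheeler transform, the block-sorting step at the core of bzip2. The sketch below is a deliberately naive Python version for illustration only: it sorts whole rotations, so it is far slower than what real compressors do, and the function names `bwt` and `inverse_bwt` are made up here.

```python
def bwt(data: bytes) -> tuple[bytes, int]:
    """Naive Burrows-Wheeler transform.

    Returns the last column of the sorted rotation table and the row
    index at which the original input appears.
    """
    n = len(data)
    if n == 0:
        return b"", 0
    # Sort rotation start offsets by the rotation they produce.
    starts = sorted(range(n), key=lambda i: data[i:] + data[:i])
    # The last byte of the rotation starting at i is data[i - 1].
    last = bytes(data[(i - 1) % n] for i in starts)
    return last, starts.index(0)


def inverse_bwt(last: bytes, primary: int) -> bytes:
    """Rebuild the original input from the last column and row index."""
    n = len(last)
    # A stable sort of positions by byte value links each row of the
    # first column to the matching occurrence in the last column.
    order = sorted(range(n), key=lambda i: last[i])
    out = bytearray()
    row = primary
    for _ in range(n):
        row = order[row]
        out.append(last[row])
    return bytes(out)


transformed, index = bwt(b"banana")
print(transformed, index)
print(inverse_bwt(transformed, index))
```

The transform itself saves no space. It tends to group equal bytes together, which is what makes the later entropy-coding stage effective.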
saghm•1h ago
Early on the article mentions that xz and zstd have gotten more popular than bzip, and my admittedly naive understanding is that they're considered to have better tradeoffs in terms of compression time and overall space saved by compression. The performance section heavily discusses encoding performance of gzip and bzip, but unless I'm missing something, the only references to xz or zstd in that section are briefly handwaving about the decoding times probably being similar.

My impression is that this article has a lot of technical insight into how bzip compares to gzip, but it fails to actually account for the real cause of the diminished popularity of bzip in favor of the non-gzip alternatives that it admits are the more popular choices in recent years.

hexxagone•54m ago
Notice that bzip3 has close to nothing to do with bzip2. It is a different BWT implementation with a different entropy codec, from a different author (as noted in the GitHub description "better and stronger spiritual successor to BZip2").
fl0ki•49m ago
This seems as good a thread as any to mention that the gzhttp package in klauspost/compress for Go now supports zstd on both server handlers and client transports. Strangely this was added in a patch version instead of a minor version despite both expanding the API surface and changing default behavior.

https://github.com/klauspost/compress/releases/tag/v1.18.4

klauspost•14m ago
About the versioning, glad you spotted it anyway. There isn't as much use of the gzhttp package compared to the other ones, so the bar is a bit higher for that one.

Also making good progress on getting a slimmer version of zstd into the stdlib and improving the stdlib deflate.

pella•32m ago
imho: the future is a specialized compressor optimized for your specific format. ( https://openzl.org/ , ... )
srean•18m ago
That is an interesting link.

Does Gmail use a special codec for storing emails?
