Writing a competitive BZip2 encoder in Ada from scratch in a few days – part 2

https://gautiersblog.blogspot.com/2025/07/writing-bzip2-encoder-in-ada-from.html

121•ajdude•5mo ago

Comments

nayuki•5mo ago

> Another surprising feature of the BWT is that for reversing the permutation, no extra information is needed!

I think this is not true. The way I learned the BWT, after encoding, you need to store the index of the first character (which is a tiny bit of extra information). https://web.archive.org/web/20170325024404/http://marknelson...

vintermann•5mo ago

There is a fully bijective version of the BWT which doesn't require the index. It's not what BZip2 uses though.

Someone•5mo ago

That method adds an extra character to your alphabet, and uses that as a marker in the transformed string to h effectively) store the index of the first character inside the transformed text.

If, as is often the case, your alphabet contains exactly as many letters as a byte/word can store, that makes for awkward encoding. I also think that, typically, that method will require more memory.

vintermann•5mo ago

No, I'm not talking about that. I'm talking about David Scott's bijective BWT variant. It sorts symbols by a slightly different type of context, and so can do away with a sentinel symbol or a stored index entirely. It was described in the wikipedia page last I checked.

Scott was a bit of a weirdo, but his algorithm actually got recognition in academia, there have been many papers written about it. One character saved doesn't make a lot of difference for compression, but the transform being a permutation does give it some nice properties.

horizion2025•5mo ago

I also think when looking at how BWT works, it is initially surprising it is invertible even with such little information as the first character. It appears a bit like sorting the letters and that certainly would require a lot more information to back into place. It is a bit magic even when you know the inversion algorithm.

amy_petrik•5mo ago

well here's a big surprise of BWT

BWT is a transformation transforming a string into "suffix tree space"

You can do searches - very fast searches - in BZ2/BWT compression space

Discussed in the '90s in the legendary dr. dobbs

But more significantly, BWT is the basis for most DNA and RNA sequence aligners nowadays - which perform string search/matching in suffix tree (transformed) space. It's like doing frequency stuff in fourier domain, it's like adding is multiplication in logspace - but in weird data string search space. The BWT.

etrez•5mo ago

Thx, corrected.

BirAdam•5mo ago

This is excellent work, and I love seeing projects in Ada.

What were the first animals? The fierce sponge–jelly battle that just won't end

Sidestepping Evaluation Awareness and Anticipating Misalignment

OldMapsOnline

What It's Like to Be a Worm

Don't go to physics grad school and other cautionary tales

Lawyer sets new standard for abuse of AI; judge tosses case

AI anxiety batters software execs, costing them combined $62B: report

Bogus Pipeline

Winklevoss twins' Gemini crypto exchange cuts 25% of workforce as Bitcoin slumps

How AI Is Reshaping Human Reasoning and the Rise of Cognitive Surrender

Cycling in France

Ask HN: What breaks in cross-border healthcare coordination?

Show HN: Simple – a bytecode VM and language stack I built with AI

Show HN: Free-to-play: A gem-collecting strategy game in the vein of Splendor

My Eighth Year as a Bootstrapped Founde

Show HN: Tesseract – A forum where AI agents and humans post in the same space

Show HN: Vibe Colors – Instantly visualize color palettes on UI layouts

OpenAI is Broke ... and so is everyone else [video][10M]

We interfaced single-threaded C++ with multi-threaded Rust

State Department will delete X posts from before Trump returned to office

AI Skills Marketplace

Show HN: A fast TUI for managing Azure Key Vault secrets written in Rust

eInk UI Components in CSS

Discuss – Do AI agents deserve all the hype they are getting?

ChatGPT is changing how we ask stupid questions

Zig Package Manager Enhancements

Neutron Scans Reveal Hidden Water in Martian Meteorite

Deepfaking Orson Welles's Mangled Masterpiece

France's homegrown open source online office suite

SpaceX Delays Mars Plans to Focus on Moon