frontpage.
newsnewestaskshowjobs

Made with ♥ by @iamnishanth

Open Source @Github

fp.

Open in hackernews

The Burrows-Wheeler Transform

https://sandbox.bio/concepts/bwt
136•g0xA52A2A•18h ago

Comments

jansan•16h ago
For an article describing a compression algorithm this was very digestible and entertaining.
jakedata•16h ago
This transformation in and of itself does not perform any compression. In fact it adds an additional string marker to the data. Performing compression on these strings is addressed here: https://www.cs.cmu.edu/~15451-f18/lectures/lec25-bwt.pdf#pag...
foobarian•16h ago
The unintuitive part of BWT to me was always the reverse operation, which was the only thing the post didn't give intuition for :-)
raboukhalil•15h ago
Good point, thanks! I'll add a subsection about the intuition for that.
raboukhalil•11h ago
I just added a section about the intuition behind the decoding: https://sandbox.bio/concepts/bwt#intuition-decoding , hope it helps!
foobarian•17m ago
Thank you! I think the circular property of that matrix is the key.
hinkley•13h ago
Most BWT descriptions: now draw the rest of the owl.

Once upon a time I found one that did the entire round trip, I neglected to bookmark it. So when I eventually forgot how it worked, or wanted to show others, I was stuck. Never did find it.

raboukhalil•11h ago
I added more details about how the decoding works here: https://sandbox.bio/concepts/bwt#intuition-decoding , I'd love to hear if that is more clear now
SimplyUnknown•5h ago
I'm still not sure I get it. I think it is

1. Put the BWT string in the right-most empty column

2. Sort the rows of the matrix such that the strings read along the columns of the matrix are in lexicographical order starting from the top-row????

3. Repeat step 1 and 2 until matrix is full

4. Extract the row of the matrix that has the end-delimiter in the final column

It's the "sort matrix" step that seems under-explained to me.

foobarian•33m ago
I think what did it for me is to recognize that the last column has characters that, for each row, come before the characters in the first column (since they are rotations). So we can view the last column as coming "before" the first one.

1. We sorted the first column to get the BWT column. Thereby we created the structure where the BWT column comes before the sorted column.

2. Therefore if we insert the BWT column before a sorted column, the order of row elements is preserved

3. If we now sort again, the order of characters across individual rows is again preserved

4. Going to step 2 again preserves row order

5. Once all columns are populated, therefore all rows are in the correct original order. And thanks to the end marker we can get the original string from any row.

fair_enough•16h ago
Thanks, man! This helps a lot, because as you say the algorithm is not intuitive. The description of the type of data it is very good for is the best value-adding part of this writeup. Greatly appreciated!
loloquwowndueo•16h ago
I was once mentioning this transform to someone and they were like “what? The Bruce Willis transform?” - I guess my pronunciation sucked that much.
Bukhmanizer•15h ago
Once in a lecture one of my friends asked “You keep mentioning Cooper Netties, who is Cooper Netties?” Everyone was very confused until someone figured out she was asking about Kubernetes.
raboukhalil•15h ago
Author here, nice to see the article posted here! I'm currently looking for ideas for other interactive articles, so let me know if think of other interesting/weird algorithms that are hard to wrap your head around.
lancefisher•15h ago
This is a good article and a fun algorithm. Google made a great series of compression videos a while ago - albeit sometimes silly. For the Burrows-Wheeler one they interviewed Mike Burrows for some of the history. https://youtu.be/4WRANhDiSHM
emmelaich•13h ago
I wonder if he came up with it when pondering permuted indexes. Which used to be a feature of the man pages system.
hinkley•13h ago
Noteworthy that the paper that was published describing this technique came at a time when IP lawyers had begun to destroy the world wrt to small business innovation. And that they released it unencumbered is a huge debt of gratitude that we haven’t really honored.
embit•13h ago
This is great. The best article I have read on BWT and experimented was in Dr Dobbs Journal around 1996-97.
pixelpoet•12h ago
I remember randomly reading about this in the Hugi demoscene diskmag as a young teen, and it completely blew my mind (notably the reverse transform, apparently not covered in OP's article): https://hugi.scene.org/online/coding/hugi%2013%20-%20cobwt.h...

The author later wrote many now-famous articles, including A Trip Through the Graphics Pipeline.

rollulus•6h ago
Oh wow, I clicked that link without reading the last section of your comment, and was like “Fabian Giesen!”, an absolute HN favourite: https://hn.algolia.com/?q=fgiesen
username223•12h ago
Does anyone else remember when Dr. Dobb's Journal wrote about stuff like this? https://web.archive.org/web/20040217202950/http://www.ddj.co...
Imnimo•12h ago
An interesting bit of trivia about the Burrows-Wheeler transform is that it was rejected when submitted to a conference for publication, and so citations just point to a DEC technical report, rather than a journal or conference article.
mfworks•11h ago
The most magical part of this transform is the search! First learned about this in a bioalgorithms course, and the really cool property is that for a string length l, you can search for the string in O(l) time. It has the same search time complexity of a suffix tree with O(n) space complexity (with a very low constant multiple). To this day it may be the coolest algorithm I've encountered.
dgacmu•10h ago
I encountered the search version of this, which is turned suffix arrays, in grad school and was so taken by them I incorporated them as the primary search mechanism for the pi searcher. 25 years later it's still the best way to do it. Incredible insight behind bwt and suffix arrays.
dcl•10h ago
A friend doing bioinformatics told me about this at uni, it was definitely one of those "i can't believe this is doable" sort of things.
kvark•9h ago
I've spent years playing with BWT and suffix sorting at school (some work can be found be names of archon and dark). It's a beautiful domain to learn!

Now I'm wondering: in the era of software 2.0, everything is figured out by AI. What are the chances AI would discover this algorithm at all?

mrkeen•7h ago
No need to speculate about this algorithm in particular. It's been a few years since every programmer has had access to LLMs (that's a huge sample size!). Some LLMs are even branded as 'Research'. Plus they never get tired like humans do. We should be swimming in algorithms like this.
zahlman•9h ago
BWT is how I first got to interact with Tim Peters on the Internet: https://stackoverflow.com/questions/21297887
moribvndvs•9h ago
For some reason I was really getting lost on step 2, “sort” it. They mean lexicographically or “sort it like each rotation is a word in the dictionary, where $ has the lowest character value”. Now that I see it, I’m a little embarrassed I got stuck on that step, but hopefully it helps someone else.
tetris11•3h ago
That trips up a lot of people, plus most examples include a '^' character at the beginning which is assigned the highest value
feanaro•1h ago
This is essentially a group theoretical result about permutation groups. Would be nice to see a treatment from this angle.

Igalia, Servo, and the Sovereign Tech Fund

https://www.igalia.com/2025/10/09/Igalia,-Servo,-and-the-Sovereign-Tech-Fund.html
109•robin_reala•2h ago•11 comments

Show HN: I invented a new generative model and got accepted to ICLR

https://discrete-distribution-networks.github.io/
197•diyer22•5h ago•23 comments

OpenGL is getting mesh shaders as well, via GL_EXT_mesh_shader

https://www.supergoodcode.com/mesh-shaders-in-the-current-year/
26•pjmlp•2h ago•3 comments

PSA: Always use a separate domain for user content

https://www.statichost.eu/blog/google-safe-browsing/
60•ericselin•57m ago•53 comments

A small number of samples can poison LLMs of any size

https://www.anthropic.com/research/small-samples-poison
1021•meetpateltech•22h ago•384 comments

Ohno Type School

https://ohnotype.co/blog/ohno-type-school-a
75•tobr•4d ago•29 comments

Htmx, Datastar, Greedy Developer

https://drshapeless.com/blog/posts/htmx,-datastar,-greedy-developer.html
33•KolmogorovComp•3h ago•7 comments

My approach to building large technical projects (2023)

https://mitchellh.com/writing/building-large-technical-projects
216•mad2021•10h ago•29 comments

Parallelizing Cellular Automata with WebGPU Compute Shaders

https://vectrx.substack.com/p/webgpu-cellular-automata
35•ibobev•5h ago•4 comments

Nobel Peace Prize 2025: María Corina Machado

https://www.nobelprize.org/prizes/peace/2025/summary/
385•pykello•5h ago•373 comments

Weave (YC W25) is hiring a founding AI engineer

https://www.ycombinator.com/companies/weave-3/jobs/SqFnIFE-founding-ai-engineer
1•adchurch•2h ago

I Switched from Htmx to Datastar

https://everydaysuperpowers.dev/articles/why-i-switched-from-htmx-to-datastar/
210•ksec•7h ago•137 comments

Bringing Desktop Linux GUIs to Android: The Next Step in Graphical App Support

https://www.linuxjournal.com/content/bringing-desktop-linux-guis-android-next-step-graphical-app-...
11•sipofwater•3h ago•6 comments

Python 3.14 is here. How fast is it?

https://blog.miguelgrinberg.com/post/python-3-14-is-here-how-fast-is-it
630•pjmlp•1d ago•453 comments

Examples Are the Best Documentation

https://rakhim.exotext.com/examples-are-the-best-documentation
257•Bogdanp•18h ago•88 comments

Static Bundle Object: Modernizing Static Linking

https://medium.com/@eyal.itkin/static-bundle-object-modernizing-static-linking-f1be36175064
18•ingve•2d ago•11 comments

My first contribution to Linux

https://vkoskiv.com/first-linux-patch/
614•vkoskiv•4d ago•72 comments

An MVCC-like columnar table on S3 with constant-time deletes

https://www.shayon.dev/post/2025/277/an-mvcc-like-columnar-table-on-s3-with-constant-time-deletes/
16•shayonj•3d ago•1 comments

A Story About Bypassing Air Canada's In-Flight Network Restrictions

https://ramsayleung.github.io/en/post/2025/a_story_about_bypassing_air_canadas_in-flight_network_...
73•samray•6h ago•58 comments

Multi-Core by Default

https://www.rfleury.com/p/multi-core-by-default
47•kruuuder•7h ago•19 comments

The Prairie Farmers Preserving the Most Threatened Ecosystem – Forever

https://reasonstobecheerful.world/prairie-farmers-preserve-most-threatened-ecosystem-forever/
6•PaulHoule•26m ago•0 comments

Figure 03, our 3rd generation humanoid robot

https://www.figure.ai/news/introducing-figure-03
370•lairv•1d ago•365 comments

Show HN: I've built a tiny hand-held keyboard

https://github.com/mafik/keyer
363•mafik•22h ago•99 comments

LLMs are mortally terrified of exceptions

https://twitter.com/karpathy/status/1976077806443569355
272•nought•21h ago•124 comments

A built-in 'off switch' to stop persistent pain

https://penntoday.upenn.edu/news/select-neurons-brainstem-may-hold-key-treating-chronic-pain
180•gmays•17h ago•80 comments

Interactive Double Pendulum Playground

https://theabbie.github.io/DoublePendulum/
54•melector•2d ago•19 comments

Subway Builder: A realistic subway simulation game

https://www.subwaybuilder.com/
271•0xbeefcab•20h ago•116 comments

Automated Lean Proofs for Every Type

https://www.galois.com/articles/automated-lean-proofs-for-every-type
21•surprisetalk•4d ago•1 comments

Origami Patterns Solve a Major Physics Riddle

https://www.quantamagazine.org/origami-patterns-solve-a-major-physics-riddle-20251006/
5•westurner•3d ago•0 comments

A beginner's guide to deploying LLMs with AMD on Windows using PyTorch

https://gpuopen.com/learn/pytorch-windows-amd-llm-guide/
92•beckford•4d ago•26 comments