
The Burrows-Wheeler Transform

https://sandbox.bio/concepts/bwt
157•g0xA52A2A•4mo ago

Comments

jansan•4mo ago
For an article describing a compression algorithm this was very digestible and entertaining.
jakedata•4mo ago
This transformation in and of itself does not perform any compression. In fact it adds an additional string marker to the data. Performing compression on these strings is addressed here: https://www.cs.cmu.edu/~15451-f18/lectures/lec25-bwt.pdf#pag...
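Right — the transform only rearranges characters. For reference, the forward transform itself is short; a minimal Python sketch of the naive version, using `$` as the end-of-string marker the article describes:

```python
def bwt(s: str) -> str:
    """Naive forward BWT: append a '$' end marker (which sorts before
    every letter), sort all rotations, and read off the last column."""
    s = s + "$"
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)
```

For example, `bwt("banana")` gives `"annb$aa"`: the repeated `a`s and `n`s cluster together, which is what makes the output friendly to run-length and move-to-front coders downstream.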
foobarian•4mo ago
The unintuitive part of BWT to me was always the reverse operation, which was the only thing the post didn't give intuition for :-)
raboukhalil•4mo ago
Good point, thanks! I'll add a subsection about the intuition for that.
raboukhalil•4mo ago
I just added a section about the intuition behind the decoding: https://sandbox.bio/concepts/bwt#intuition-decoding , hope it helps!
foobarian•3mo ago
Thank you! I think the circular property of that matrix is the key.
hinkley•4mo ago
Most BWT descriptions: now draw the rest of the owl.

Once upon a time I found one that did the entire round trip, but I neglected to bookmark it. So when I eventually forgot how it worked, or wanted to show others, I was stuck. Never did find it again.

raboukhalil•4mo ago
I added more details about how the decoding works here: https://sandbox.bio/concepts/bwt#intuition-decoding , I'd love to hear whether it's clearer now
SimplyUnknown•4mo ago
I'm still not sure I get it. I think it is

1. Put the BWT string in the right-most empty column

2. Sort the rows of the matrix such that the strings read along the columns of the matrix are in lexicographical order starting from the top-row????

3. Repeat step 1 and 2 until matrix is full

4. Extract the row of the matrix that has the end-delimiter in the final column

It's the "sort matrix" step that seems under-explained to me.
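For what it's worth, the rebuild-and-sort inverse described in those steps can be written directly. A naive Python sketch (note it prepends the BWT string on the left of every row and then re-sorts the rows, which is the usual presentation and keeps the rows in lexicographic order at each pass):

```python
def inverse_bwt_naive(last_col: str) -> str:
    """Rebuild the rotation matrix column by column: prepend the BWT
    string to every row, sort the rows, and repeat until the matrix
    is full. The original string is the row ending in the '$' marker."""
    rows = [""] * len(last_col)
    for _ in range(len(last_col)):
        rows = sorted(last_col[i] + rows[i] for i in range(len(last_col)))
    return next(row for row in rows if row.endswith("$"))[:-1]
```

For instance, `inverse_bwt_naive("annb$aa")` recovers `"banana"`. The sort step is just an ordinary lexicographic sort of the partial rows; no special matrix sort is involved.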

foobarian•3mo ago
I think what did it for me is to recognize that the last column has characters that, for each row, come before the characters in the first column (since they are rotations). So we can view the last column as coming "before" the first one.

1. We sorted the first column to get the BWT column. Thereby we created the structure where the BWT column comes before the sorted column.

2. Therefore if we insert the BWT column before a sorted column, the order of row elements is preserved

3. If we now sort again, the order of characters across individual rows is again preserved

4. Going to step 2 again preserves row order

5. Once all columns are populated, therefore all rows are in the correct original order. And thanks to the end marker we can get the original string from any row.

kragen•3mo ago
The algorithm given in this page for the reverse transform is not usably efficient; the algorithm given in the original paper is, and I illustrated it in my explorable explanation of it linked in my top-level comment here. It was a pretty surprising insight!
krackers•3mo ago
This always seemed magical to me too, but Gemini recently gave me an explanation that clicked.

The seemingly magical part is that it seems like you "lose" information during the transformation. After all, if I give you the sorted string, it's not recoverable. But the key is that the first and last columns of the BWT matrix end up being a lookup table from any character to the preceding character. And of course, given the BWT-encoded string, you can get back the sorted string, which is the first column. With this information about which character precedes which, it shouldn't be too magical that you can work backwards to reconstruct the original string.

Still, the really clever part is that the BWT encoded string embeds the decoding key directly into the character and positional information. And it just so happens that the way it does this captures character pair dependencies in a way that makes it a good preprocessor for "symbol at a time" encoders.

https://news.ycombinator.com/item?id=35738839 has a nice pithy way of phrasing it

>It sorts letters in a text by their [surrounding] context, so that letters with similar context appear close to each other. In a strange vaguely DFT-like way, it switches long-distance and short-distance patterns around. The result is, in a typical text, long runs of the same letter, which can be easily compressed.
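That first-column/last-column lookup table is exactly the LF mapping the practical inverse transform uses. A Python sketch of the idea (a stable argsort of the last column plays the role of the lookup table, so decoding is a single walk rather than a matrix rebuild):

```python
def inverse_bwt(last: str) -> str:
    """Efficient inverse via the LF mapping: a stable argsort of the
    last column maps each row to the row whose rotation starts one
    text position later, so we can walk the original text forwards."""
    n = len(last)
    first = sorted(last)                           # first column
    nxt = sorted(range(n), key=lambda i: last[i])  # stable: ties keep order
    x = last.index("$")    # the row whose rotation is the original string
    out = []
    for _ in range(n - 1):                         # emit all but the marker
        out.append(first[x])
        x = nxt[x]
    return "".join(out)
```

With `last = "annb$aa"` (the BWT of `"banana"`), this walks rows 4, 3, 6, 2, 5, 1 and emits `"banana"` in O(n log n) for the sort plus O(n) for the walk, versus O(n²) or worse for the rebuild-the-matrix version.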

foobarian•3mo ago
Yes, agreed. Presumably if you just compressed the sorted string it would compress even better, though not reversibly. So compressing the preceding column (preceding since rows are rotations) seems the next best thing
fair_enough•4mo ago
Thanks, man! This helps a lot because, as you say, the algorithm is not intuitive. The description of the kind of data it works best on is the most valuable part of this writeup. Greatly appreciated!
loloquwowndueo•4mo ago
I was once mentioning this transform to someone and they were like “what? The Bruce Willis transform?” - I guess my pronunciation sucked that much.
Bukhmanizer•4mo ago
Once in a lecture one of my friends asked “You keep mentioning Cooper Netties, who is Cooper Netties?” Everyone was very confused until someone figured out she was asking about Kubernetes.
raboukhalil•4mo ago
Author here, nice to see the article posted here! I'm currently looking for ideas for other interactive articles, so let me know if you think of other interesting/weird algorithms that are hard to wrap your head around.
lancefisher•4mo ago
This is a good article and a fun algorithm. Google made a great series of compression videos a while ago - albeit sometimes silly. For the Burrows-Wheeler one they interviewed Mike Burrows for some of the history. https://youtu.be/4WRANhDiSHM
emmelaich•4mo ago
I wonder if he came up with it when pondering permuted indexes. Which used to be a feature of the man pages system.
hinkley•4mo ago
Noteworthy that the paper describing this technique was published at a time when IP lawyers had begun to destroy the world with respect to small-business innovation. That they released it unencumbered left us a huge debt of gratitude that we haven't really honored.
embit•4mo ago
This is great. The best article I had read on BWT, and experimented with, was in Dr. Dobb's Journal around 1996-97.
pixelpoet•4mo ago
I remember randomly reading about this in the Hugi demoscene diskmag as a young teen, and it completely blew my mind (notably the reverse transform, apparently not covered in OP's article): https://hugi.scene.org/online/coding/hugi%2013%20-%20cobwt.h...

The author later wrote many now-famous articles, including A Trip Through the Graphics Pipeline.

rollulus•4mo ago
Oh wow, I clicked that link without reading the last section of your comment, and was like “Fabian Giesen!”, an absolute HN favourite: https://hn.algolia.com/?q=fgiesen
sequin•3mo ago
Same; any time I see BWT I think of Fabian Giesen. I think he was only 16 or 17 when he wrote that article.

By chance I once sat near him and the rest of Farbrausch at a demoparty, but I was too shy to say hi.

username223•4mo ago
Does anyone else remember when Dr. Dobb's Journal wrote about stuff like this? https://web.archive.org/web/20040217202950/http://www.ddj.co...
kragen•3mo ago
Nice find! The author's page for it was at https://web.archive.org/web/20210828042410/https://marknelso....
Imnimo•4mo ago
An interesting bit of trivia about the Burrows-Wheeler transform is that it was rejected when submitted to a conference for publication, and so citations just point to a DEC technical report, rather than a journal or conference article.
Qem•3mo ago
What were the reasons given, if any? Are they publicly known?
Imnimo•3mo ago
I only know the story from word-of-mouth. My understanding is that the submitted paper wasn't written/presented well, and reviewers had trouble understanding the significance of what was being proposed. But take that story with a big grain of salt.
mfworks•4mo ago
The most magical part of this transform is the search! I first learned about this in a bioalgorithms course, and the really cool property is that for a pattern of length l, you can search for it in O(l) time. It has the same search time complexity as a suffix tree, with O(n) space complexity (and a very low constant multiple). To this day it may be the coolest algorithm I've encountered.
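That O(l) search is the FM-index backward search: process the pattern right to left, maintaining the range of rotation-matrix rows that begin with the suffix matched so far. A hedged Python sketch (ranks are computed naively here for clarity; a real FM-index precomputes rank structures so each step is constant time):

```python
from bisect import bisect_left

def backward_search(bwt_str: str, pattern: str) -> int:
    """Count occurrences of pattern in the original text using only
    its BWT, one pattern character per step."""
    first = sorted(bwt_str)
    # C[c]: row index where character c first appears in the first column
    C = {c: bisect_left(first, c) for c in set(bwt_str)}

    def rank(c: str, i: int) -> int:
        # occurrences of c in bwt_str[:i] (naive; real indexes precompute)
        return bwt_str[:i].count(c)

    lo, hi = 0, len(bwt_str)          # current range of matching rows
    for c in reversed(pattern):
        if c not in C:
            return 0
        lo = C[c] + rank(c, lo)
        hi = C[c] + rank(c, hi)
        if lo >= hi:
            return 0
    return hi - lo
```

With the BWT of "banana" (`"annb$aa"`), searching for `"ana"` narrows the row range twice and reports 2 matches, without ever reconstructing the text.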
dgacmu•4mo ago
I encountered the search version of this, which is termed suffix arrays, in grad school and was so taken by them that I made them the primary search mechanism for the pi searcher. 25 years later it's still the best way to do it. Incredible insight behind BWT and suffix arrays.
dcl•4mo ago
A friend doing bioinformatics told me about this at uni; it was definitely one of those "I can't believe this is doable" sort of things.
kvark•4mo ago
I spent years playing with BWT and suffix sorting at school (some of that work can be found under the names archon and dark). It's a beautiful domain to learn!

Now I'm wondering: in the era of software 2.0, everything is figured out by AI. What are the chances AI would discover this algorithm at all?

mrkeen•4mo ago
No need to speculate about this algorithm in particular. It's been a few years since every programmer has had access to LLMs (that's a huge sample size!). Some LLMs are even branded as 'Research'. Plus they never get tired like humans do. We should be swimming in algorithms like this.
zahlman•4mo ago
BWT is how I first got to interact with Tim Peters on the Internet: https://stackoverflow.com/questions/21297887
moribvndvs•4mo ago
For some reason I was really getting lost on step 2, "sort" it. They mean lexicographically: sort as if each rotation were a word in the dictionary, where $ has the lowest character value. Now that I see it, I'm a little embarrassed I got stuck on that step, but hopefully this helps someone else.
tetris11•4mo ago
That trips up a lot of people. Plus, some examples instead put a '^' character at the beginning, which is assigned the highest value.
feanaro•4mo ago
This is essentially a group theoretical result about permutation groups. Would be nice to see a treatment from this angle.
kragen•3mo ago
This page looks very nice indeed!

This doesn't follow the original paper very closely, although I think the device of putting an end-of-string marker into the data instead of including the start index as a separate datum is an improvement over the original algorithm. My own "explorable explanation" of the algorithm, at http://canonical.org/~kragen/sw/bwt, is visually uglier, and is very similar to this page in many ways, but it may still be of some interest: it follows the original paper more closely, which is likely to be useful if you're trying to understand the paper. Also, I used an inverse-transform algorithm that was efficient enough that you could imagine actually using in practice, while still being easy to explain (from the original paper)—but, like the page linked, I didn't use an efficient suffix-array-construction algorithm to do the forward transform. I did link to the three that have been discovered.

I forget when I did this but apparently sometime around 02010: https://web.archive.org/web/20100616075622/http://canonical....

I think my JS code in that page is a lot more readable than the minified app.js served up with this page.

DmitryOlshansky•3mo ago
And I’ve just implemented BWT and Inverse BWT in D, earlier today! https://github.com/DmitryOlshansky/compd/blob/master/source/...