I’m the author of this post. Happy to answer any questions, and I’d love to get feedback.
The code for all of my posts can be found at https://github.com/samwho/visualisations and is MIT licensed, so you’re welcome to use it :)
The dogs on the playing cards were commissioned just for this post. They’re all made by the wonderful https://www.andycarolan.com/.
The colour palette is the Wong palette that I learned about from https://davidmathlogic.com/colorblind/.
Oh, and you can pet the dogs. :)
As someone with deuteranopia, thank you for using a colour-blind friendly palette :)
One thing that threw me for a bit is when it switched from the intro of picking 3 cards at random from a deck of 10 or 436,234 to picking just one card. It seems as if it almost needs a section heading before "Now let me throw you a curveball: what if I were to show you 1 card at a time, and you had to pick 1 at random?" indicating that now we're switching to a simplifying assumption that we're holding only 1 card, not 3, but we also don't know the size of the deck.
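For anyone else who tripped over that switch, here is a minimal Python sketch of the "hold 1 card, deck size unknown" case as I understand it. The name `pick_one` and the optional `rng` parameter are my own illustration, not code from the post: the n-th card seen replaces the held card with probability 1/n, which leaves every card equally likely to be the one held at the end.

```python
import random


def pick_one(stream, rng=None):
    """Return one item chosen uniformly at random from an iterable of unknown length.

    Hypothetical helper for illustration only; returns None for an empty stream.
    """
    if rng is None:
        rng = random.Random()
    chosen = None
    for seen, item in enumerate(stream, start=1):
        # Replace the held item with probability 1/seen.
        if rng.randrange(seen) == 0:
            chosen = item
    return chosen
```

The first card is always kept (probability 1/1), the second replaces it half the time, and so on, so the deck size never needs to be known up front.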
An advanced extension to this is that there are algorithms which calculate the number of records to skip rather than doing a trial per record. This has a good write-up of them: https://richardstartin.github.io/posts/reservoir-sampling
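To make the skip idea concrete, here is a rough Python sketch of one such method, commonly known as Algorithm L. This is my own sketch under stated assumptions, not code from the linked write-up: the function name `reservoir_sample_skip`, the `rng` parameter, and the edge-case guards are all hypothetical choices for illustration. Instead of drawing a random number per record, it draws how many records to skip before the next replacement.

```python
import itertools
import math
import random

_MISSING = object()  # sentinel, so None can be a legitimate stream item


def _open_unit(rng):
    """Draw u with 0 < u < 1 so that log(u) is always defined."""
    u = rng.random()
    while u == 0.0:
        u = rng.random()
    return u


def reservoir_sample_skip(stream, k, rng=None):
    """Sample k items uniformly from an iterable of unknown length (Algorithm L sketch)."""
    if k <= 0:
        raise ValueError("k must be positive")
    if rng is None:
        rng = random.Random()
    it = iter(stream)
    # Fill the reservoir with the first k items.
    reservoir = list(itertools.islice(it, k))
    if len(reservoir) < k:
        return reservoir  # stream was shorter than k
    w = math.exp(math.log(_open_unit(rng)) / k)
    while w > 0.0:
        # Number of records to skip before the next replacement.
        if w >= 1.0:
            skip = 0  # limiting case when w rounds up to 1.0
        else:
            skip = math.floor(math.log(_open_unit(rng)) / math.log1p(-w))
        # Discard `skip` records, then take the one after them.
        item = next(itertools.islice(it, skip, None), _MISSING)
        if item is _MISSING:
            break  # stream exhausted
        reservoir[rng.randrange(k)] = item
        w *= math.exp(math.log(_open_unit(rng)) / k)
    return reservoir
```

With a plain Python iterator the skipped records are still consumed one by one, so the saving here is only in random number generation. The bigger win comes when the source can seek past records cheaply.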
He did this by hiking a fixed route and, at fixed intervals, scaring the birds so they would fly up and he could count them.
The total count was submitted to some office which used it to estimate the population.
One year he had to travel abroad when the counting had to be done, so he recruited a friend and explained in detail how to do it.
However, when the day of the counting arrived, his friend forgot. It was a huge hassle anyway, so he just submitted a number he figured was about right, and that was that.
Then one day the following year, the local newspaper had a front-page headline stating "record increase in ptarmigan population".
The reason it was big news was that the population estimate was used to set the hunting quotas, something his friend had not considered...
wood_spirit•2h ago
owyn•2h ago
I like the visualizations in this article, really good explanation.
dekhn•1h ago
wood_spirit•55m ago
I knew it before my interview from a Turbo Pascal program I had seen that sampled DAT tape backups of patient records from a hospital system. These samples were used for studies. That was a textbook example of its utility.
dekhn•32m ago