So, I made RandomCrawl. It's a super minimal website that does nothing more than run a Node script every 30 minutes. The script picks a random path down the file structure of the Common Crawl dataset, does some minor filtering for secure .com websites for good measure, and takes a random sample of 50 websites from the chunk.
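For anyone curious, the filter-and-sample step could look roughly like the sketch below. This is not the actual RandomCrawl code: the function names (isSecureDotCom, sample, pickSites) are made up for illustration, and it assumes the URLs from the chosen chunk are already loaded into an array, so the Common Crawl download part is left out entirely.

```javascript
// Hypothetical helpers for illustration; not the real RandomCrawl script.

// Keep only URLs served over HTTPS whose hostname ends in ".com".
function isSecureDotCom(rawUrl) {
  try {
    const url = new URL(rawUrl);
    return url.protocol === "https:" && url.hostname.endsWith(".com");
  } catch {
    return false; // skip anything that does not parse as a URL
  }
}

// Pick up to `count` items uniformly at random, without replacement,
// using a partial Fisher-Yates shuffle on a copy of the input.
function sample(items, count, random = Math.random) {
  const pool = items.slice();
  const n = Math.min(count, pool.length);
  for (let i = 0; i < n; i++) {
    const j = i + Math.floor(random() * (pool.length - i));
    [pool[i], pool[j]] = [pool[j], pool[i]];
  }
  return pool.slice(0, n);
}

// Filter first, then sample (50 by default, matching the post).
function pickSites(urls, count = 50) {
  return sample(urls.filter(isSecureDotCom), count);
}
```

Calling pickSites(urlsFromChunk) would return at most 50 distinct entries. If the chunk has fewer than 50 matching URLs, you just get all of them in random order.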
There has been a ton of noise, but it has been surprisingly fun. I feel like an internet archaeologist. For every 5 random SaaS websites, you get some random tourism site for a town you've never heard of, or an ancient Blogspot from the early 2000s.
Here are a couple of great finds so far: https://ahapoetry.com/ https://alexunu.blogspot.com/2007/ https://www.brtpeinture.com/
I'm not sure I'll do much more with the website since it was an experiment, but you can bet I'll be digging around this dataset some more. It reminded me there is still a lot of expression out there on the internet, and it's amazing that some of these sites are even still live. It's way more fun to explore than to mindlessly scroll one of our five favorite websites.
Disclaimer: I'm not filtering out NSFW content, so keep that in mind.