I think I could take it even further by doing some kind of betweenness centrality analysis to rank the most interlinked stories within the Hacker News cohort, but this was the first cut, just to demonstrate that these human-intelligence annotations (the related comments) could usefully cluster the stories in a way that resulted in an interesting presentation. And I think we achieved it!
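For anyone curious what that ranking might look like, here is a minimal sketch of betweenness centrality (Brandes' algorithm) on an unweighted, undirected graph. It assumes stories are nodes and a "related" link between two stories is an edge; the `betweenness` function and the graph layout are illustrative, not what the site actually runs.

```python
from collections import deque

def betweenness(graph):
    """graph: dict mapping node -> set of neighbour nodes (undirected).
    Every neighbour must also appear as a key. Hypothetical helper."""
    score = {v: 0.0 for v in graph}
    for s in graph:
        # BFS from s, counting shortest paths (sigma) and predecessors.
        order = []
        preds = {v: [] for v in graph}
        sigma = {v: 0 for v in graph}
        dist = {v: -1 for v in graph}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating dependencies.
        delta = {v: 0.0 for v in graph}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Undirected graph: every pair was counted from both endpoints.
    return {v: b / 2 for v, b in score.items()}

# Toy example: story "b" sits between "a" and "c".
toy = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
ranking = betweenness(toy)
```

Sorting stories by that score would put the ones that bridge otherwise separate discussions at the top.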
keepamovin•1h ago
My main goal was seeing HN respond to stories that develop over time.
Technically, I just scanned the 46 million HN items and aggregated clusters via these related comments. That results in a small corpus of 36,500 clusters. I also did a quick bag-of-words sanity check to ensure titles were somewhat consistent.
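Roughly, the idea can be sketched like this. It is a simplified illustration, not the actual pipeline: it assumes a "related" comment is one that links to other HN items by URL, merges linked stories with union-find, and uses Jaccard overlap of title words as the bag-of-words check. The names `cluster_stories` and `title_overlap` and the input shape are made up for the example.

```python
import re
from collections import defaultdict

# Assumption: related comments reference other threads by item URL.
ITEM_LINK = re.compile(r"news\.ycombinator\.com/item\?id=(\d+)")

def find(parent, x):
    """Union-find lookup with path halving."""
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def union(parent, a, b):
    ra, rb = find(parent, a), find(parent, b)
    if ra != rb:
        parent[max(ra, rb)] = min(ra, rb)

def cluster_stories(comments):
    """comments: iterable of (story_id, comment_text) pairs.
    Returns clusters (sorted id lists) with at least two stories."""
    parent = {}
    for story_id, text in comments:
        for linked in ITEM_LINK.findall(text):
            union(parent, story_id, int(linked))
    clusters = defaultdict(set)
    for node in list(parent):
        clusters[find(parent, node)].add(node)
    return [sorted(c) for c in clusters.values() if len(c) > 1]

def title_overlap(title_a, title_b):
    """Jaccard similarity of the word sets of two titles."""
    a = set(re.findall(r"[a-z0-9]+", title_a.lower()))
    b = set(re.findall(r"[a-z0-9]+", title_b.lower()))
    return len(a & b) / len(a | b) if a | b else 0.0

sample = [
    (1, "Related: https://news.ycombinator.com/item?id=2 and "
        "https://news.ycombinator.com/item?id=3"),
    (4, "Nice write-up, no links here."),
    (5, "Previously: https://news.ycombinator.com/item?id=6"),
]
found = cluster_stories(sample)
```

A cluster whose titles score near zero against each other would be a candidate for dropping or manual review.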
There are multiple ways to sort it, so you can explore. Hope you enjoy finding some stuff here!