It works fairly simply: mine all-time HN data for "related:"-style comments on stories that link to other stories, then compute a title-vs-title bag-of-words cosine similarity for each linked pair and rank the pairs least similar first, with an overall sort on some other metrics across all pairings.
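A minimal Python sketch of that pipeline, for illustration only. Everything concrete in it is an assumption, not the author's code: the rule for spotting a "related:" comment (text starts with "related" and contains `news.ycombinator.com/item?id=` links), the lowercase alphanumeric tokenizer, and the function names `related_edges`, `cosine` and `rank_least_similar` are all hypothetical. The "other metrics" used for the overall sort aren't specified, so they're left out.

```python
import math
import re
from collections import Counter

# Hypothetical rule: links to HN item pages inside a comment's text.
ITEM_LINK = re.compile(r"news\.ycombinator\.com/item\?id=(\d+)")
TOKEN = re.compile(r"[a-z0-9]+")


def related_edges(story_id, comment_text):
    """Return (story_id, linked_id) edges from a 'related:'-style comment (assumed format)."""
    if not comment_text.lstrip().lower().startswith("related"):
        return []
    return [(story_id, int(m)) for m in ITEM_LINK.findall(comment_text) if int(m) != story_id]


def bag_of_words(title):
    """Lowercased word counts for a title."""
    return Counter(TOKEN.findall(title.lower()))


def cosine(a, b):
    """Cosine similarity of two word-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[word] for word, count in a.items())  # Counter returns 0 for missing words
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_least_similar(edges, titles):
    """Score each edge by title similarity and sort ascending, least similar first."""
    scored = [(cosine(bag_of_words(titles[s]), bag_of_words(titles[t])), s, t) for s, t in edges]
    scored.sort(key=lambda item: item[0])
    return scored


if __name__ == "__main__":
    titles = {
        1: "Show HN: A Rust web framework",
        2: "A new Rust web framework",
        3: "How octopuses taste with their arms",
    }
    comment = "Related: https://news.ycombinator.com/item?id=2 and https://news.ycombinator.com/item?id=3"
    edges = related_edges(1, comment)
    # The (1, 3) pair comes first: its titles share no words, so its cosine is 0.0.
    for score, src, dst in rank_least_similar(edges, titles):
        print(f"{score:.3f}  {src} -> {dst}")
```

Ranking ascending is the "inverted" choice: the obvious near-duplicate pair (1, 2) sinks to the bottom, and the surprising link floats to the top.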
The point is to surface the "high signal" related-comment edges. I started out with a few iterations of this idea that surfaced the obvious "high overlap" title pairs (high cosine similarity), but that wasn't popular. I considered why, and the most plausible reason is that those pairs are the most expected, least interesting, lowest-entropy ones. So I inverted the ranking. Hence the name. Enjoy.
keepamovin•1h ago