So I rebuilt the ranking around a different signal: explicit “related:”-style cross-links inside HN comments, plus the broader HN backlink graph (when discussion A links to discussion B). That graph is the core signal. From there, I score each pairing by evidence strength, then favor links with lower lexical overlap, so the results surface stories that are semantically connected but don’t share title wording.
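A minimal sketch of that scoring idea in Python, for anyone curious. This is not the actual implementation. Every concrete choice here is an assumption made for illustration: the input shape (comment text paired with the id of the story it was posted under), the regex for HN item links, a 2:1 weight for “related:” links over bare backlinks, and Jaccard overlap of title tokens as the lexical-overlap measure.

```python
import re
from collections import defaultdict

# Matches links to HN items, e.g. https://news.ycombinator.com/item?id=12345
ITEM_LINK = re.compile(r"news\.ycombinator\.com/item\?id=(\d+)")

# Hypothetical weights: an explicit "related:" comment counts as stronger
# evidence than a bare backlink. Both values are illustrative assumptions.
RELATED_WEIGHT = 2.0
BACKLINK_WEIGHT = 1.0


def tokens(title: str) -> set[str]:
    """Lowercased alphanumeric word tokens of a title."""
    return set(re.findall(r"[a-z0-9]+", title.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Lexical overlap in [0, 1]; 0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def score_pairs(comments, titles):
    """comments: iterable of (story_id, comment_text) pairs.
    titles: {story_id: title}.
    Returns [((id_a, id_b), score)] sorted by descending score."""
    evidence = defaultdict(float)
    for story_id, text in comments:
        if story_id not in titles:
            continue
        is_related = text.lstrip().lower().startswith("related:")
        weight = RELATED_WEIGHT if is_related else BACKLINK_WEIGHT
        # Count each linked story once per comment; skip self-links and unknown ids.
        for target in {int(m) for m in ITEM_LINK.findall(text)}:
            if target == story_id or target not in titles:
                continue
            pair = tuple(sorted((story_id, target)))
            evidence[pair] += weight

    scored = []
    for (a, b), strength in evidence.items():
        overlap = jaccard(tokens(titles[a]), tokens(titles[b]))
        # Evidence strength, discounted by how similar the two titles look.
        scored.append(((a, b), strength * (1.0 - overlap)))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored
```

With titles `{1: "Rust async runtimes compared", 2: "Why I moved my scheduler off Tokio", 3: "Rust async runtimes compared (2023)"}` and two comments on story 1, one starting with “Related:” that links item 2 and a plain one that links item 3, the 1–2 pairing ranks first (score 2.0, no shared title words) and the near-duplicate 1–3 repost drops to about 0.2. Multiplying by `1 - overlap` is just one simple way to express “favor lower lexical overlap”; a ranked tiebreak or a threshold would also work.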
What I learned: big threads are often strong on attention but weaker on novelty. Backlinks plus “related:” intent turn out to be a better proxy for meaningful cross-thread connections. If you want standard topic clustering, Algolia is perfect; this is for high-entropy pairings discovered from HN’s own linking behavior.
Check it out: