Since then I’ve reworked the core of the system to explicitly model source attribution and article-to-article references, not just textual similarity.
What’s changed:
Clusters still start from full-text embeddings, but are now refined using explicit citations and source mentions inside articles.
External sources (sites we don’t actively crawl) are added as nodes when referenced, so lineage isn’t limited to the monitored set. This captures many large outlets that don’t provide usable RSS feeds.
“First to publish” no longer relies purely on RSS timestamps — publish times are validated against citation order and reference structure.
The flow view now reflects an inferred derivation graph, not just a timeline of similar headlines.
Added search and browsing across historical story archives.
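Here is a minimal Python sketch of one way these pieces could fit together. It assumes each article carries an embedding, a feed timestamp, and the URLs it cites. Everything in it is hypothetical and not the system's actual code: the names (`Article`, `build_clusters`, `infer_origin`), the 0.8 similarity threshold, and the ranking rule. Clusters start from cosine similarity and are then merged along citation edges. Uncrawled cited URLs become placeholder nodes. An article that cites another cluster member is treated as derived from it, so it can't be the origin even if its RSS timestamp is the earliest.

```python
# Hypothetical sketch: citation-aware clustering and origin inference.
import math
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class Article:
    url: str
    published: Optional[datetime]  # feed timestamp; None for uncrawled external sources
    embedding: list[float] = field(default_factory=list)
    cites: list[str] = field(default_factory=list)  # URLs linked or credited in the text


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_clusters(articles: list[Article], threshold: float = 0.8) -> list[list[Article]]:
    """Cluster by embedding similarity, then merge along citation edges."""
    nodes = {a.url: a for a in articles}
    # Cited URLs we don't crawl become placeholder nodes, so lineage can
    # run through outlets outside the monitored set.
    for a in articles:
        for url in a.cites:
            nodes.setdefault(url, Article(url=url, published=None))

    parent = {url: url for url in nodes}  # union-find forest

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(x: str, y: str) -> None:
        parent[find(x)] = find(y)

    # Step 1: full-text similarity (brute-force O(n^2); an ANN index would scale better).
    for i, a in enumerate(articles):
        for b in articles[i + 1:]:
            if cosine(a.embedding, b.embedding) >= threshold:
                union(a.url, b.url)
    # Step 2: an explicit citation links two articles even if their wording differs.
    for a in articles:
        for url in a.cites:
            union(a.url, url)

    clusters: dict[str, list[Article]] = {}
    for url, node in nodes.items():
        clusters.setdefault(find(url), []).append(node)
    return list(clusters.values())


def infer_origin(cluster: list[Article]) -> tuple[Article, list[tuple[str, str]]]:
    """Return (origin, suspect_pairs) for one cluster.

    An article that cites another member is treated as derived from it, so it
    cannot be the origin whatever its feed timestamp says. A (citing, cited)
    pair where the citing article has the earlier timestamp is returned as
    suspect: the citing article claims to predate its own source.
    """
    members = {a.url: a for a in cluster}
    inbound = {url: 0 for url in members}
    suspect = []
    for a in cluster:
        for url in a.cites:
            if url not in members:
                continue
            inbound[url] += 1
            cited = members[url]
            if a.published and cited.published and a.published < cited.published:
                suspect.append((a.url, url))

    # Roots derive from no other member; a citation cycle leaves none, so fall back.
    roots = [a for a in cluster if not any(u in members for u in a.cites)] or cluster

    def rank(a: Article):
        # Most-cited first, then earliest known timestamp; unknown times sort last.
        return (-inbound[a.url], a.published is None, a.published or datetime.min)

    return min(roots, key=rank), suspect
```

Here is an example. Suppose outlet B's feed says 08:00 and outlet A's says 09:00, but B links to A and A links to an uncrawled wire story. In that case the sketch flags (B, A) as a timestamp conflict and returns the wire story as the origin. Merging on every citation is the simplest possible refinement. A production version would probably also need to split clusters and weight citations by how confident the extraction is.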
Still English-only for now and not pulling from social platforms yet — trying to get attribution right before expanding.
Happy to answer questions and hear suggestions (they helped a lot already).
antiochIst•2h ago