Show HN: Real-time system that tracks how news spreads across 200k websites

https://yandori.io/news-flow/
83•antiochIst•4d ago
I built a system that monitors ~200,000 news RSS feeds in near real-time and clusters related articles to show how stories spread across the web.

It uses Snowflake’s Arctic model for embeddings and HNSW for fast similarity search. Each “story cluster” shows who published first, how fast it propagated, and how the narrative evolved as more outlets picked it up.
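For readers who want a feel for the clustering step, here is a minimal sketch, not the author's implementation. It swaps HNSW for a brute-force cosine scan and uses tiny made-up vectors in place of Arctic embeddings. The 0.8 threshold, the tuple layout, and the outlet names are all assumptions for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def cluster_articles(articles, threshold=0.8):
    """Greedy clustering: walk articles in publish order and attach each one
    to the most similar existing cluster seed, or start a new cluster.

    articles: iterable of (published_ts, outlet, embedding) tuples.
    The threshold is an assumed value, not one taken from the post.
    """
    clusters = []
    for ts, outlet, emb in sorted(articles, key=lambda a: a[0]):
        best, best_sim = None, threshold
        for c in clusters:  # brute-force scan; an ANN index would replace this
            sim = cosine(emb, c["seed"])
            if sim >= best_sim:
                best, best_sim = c, sim
        if best is None:
            # The first article of a cluster is treated as "who published first".
            clusters.append({"seed": emb, "first": outlet, "members": [(ts, outlet)]})
        else:
            best["members"].append((ts, outlet))
    return clusters

# Toy data: two stories, each picked up by two hypothetical outlets.
articles = [
    (100, "outlet-a", [1.0, 0.0, 0.1]),
    (160, "outlet-b", [0.9, 0.1, 0.1]),
    (130, "outlet-c", [0.0, 1.0, 0.0]),
    (220, "outlet-d", [0.1, 0.9, 0.0]),
]
clusters = cluster_articles(articles)
for c in clusters:
    print(c["first"], [outlet for _, outlet in c["members"]])
```

Because members are appended in timestamp order, each cluster's member list doubles as a rough propagation timeline.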

Would love feedback on the architecture, scaling approach, and any ways to make the clusters more accurate or useful.

Live demo: https://yandori.io/news-flow/

Comments

masterphai•4d ago
Interesting project - it’s rare to see news-flow tracking done in real time at this scale. One thing you may want to stress-test is how stable the clustering remains when stories evolve semantically over a few hours. Embeddings tend to drift as outlets rewrite or localize a piece, and HNSW can sometimes over-merge when the centroid shifts.

A trick that helped in a similar system I built was doing a second-pass “temporal coherence” check: if two articles are close in embedding space but far apart in publish time or share no common entities, keep them in adjacent clusters rather than forcing a merge. It reduced false positives significantly.
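A rough sketch of what such a second-pass check could look like. The field names, the 12-hour window, and the 0.8 similarity floor are assumptions for illustration, not details from either system.

```python
from datetime import datetime, timedelta

def should_merge(a, b, similarity, min_similarity=0.8, max_gap=timedelta(hours=12)):
    """Second-pass "temporal coherence" check on two candidate articles.

    Even when the embeddings are close, refuse the merge if the articles are
    far apart in publish time or share no named entities.
    a, b: dicts with a "published" datetime and an "entities" set
    (hypothetical field names).
    """
    if similarity < min_similarity:
        return False
    gap = abs(a["published"] - b["published"])
    shared_entities = a["entities"] & b["entities"]
    if gap > max_gap or not shared_entities:
        return False  # keep them in adjacent clusters instead of merging
    return True

# Made-up examples: a same-day rewrite versus a lookalike story days later.
original = {"published": datetime(2025, 12, 1, 8, 0), "entities": {"ECB", "Lagarde"}}
rewrite = {"published": datetime(2025, 12, 1, 11, 0), "entities": {"ECB", "eurozone"}}
lookalike = {"published": datetime(2025, 12, 4, 9, 0), "entities": {"Fed", "Powell"}}

print(should_merge(original, rewrite, 0.91))    # True
print(should_merge(original, lookalike, 0.88))  # False
```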

Also curious how you handle deduping syndicated content - AP/Reuters can dominate the embedding space unless you weight publisher identity or canonical URLs.
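One possible way to collapse syndicated copies, sketched under assumptions: each article record carries an optional "canonical" field (for example, taken from a rel=canonical link), and only utm_* query parameters count as tracking noise. Both are hypothetical choices, not something the post describes.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def canonicalize(url):
    """Normalize a URL so trivially different copies compare equal."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    # Drop utm_* tracking parameters and sort the rest for a stable key.
    query = sorted(
        (k, v) for k, v in parse_qsl(parts.query) if not k.lower().startswith("utm_")
    )
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme.lower(), host, path, urlencode(query), ""))

def dedupe(articles):
    """Keep only the earliest article for each canonical URL."""
    earliest = {}
    for art in sorted(articles, key=lambda a: a["published"]):
        key = canonicalize(art.get("canonical") or art["url"])
        earliest.setdefault(key, art)
    return list(earliest.values())

# Made-up records: a wire story, a syndicated copy, and an unrelated article.
articles = [
    {"url": "https://wire.example/story/42", "published": 100},
    {"url": "https://local.example/news/abc?utm_source=rss",
     "canonical": "https://www.wire.example/story/42/", "published": 140},
    {"url": "https://other.example/report", "published": 120},
]
unique = dedupe(articles)
print([a["url"] for a in unique])
```

The syndicated copy still tells you something about propagation, so a real system would probably record it as a "republished by" edge rather than discard it outright.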

Overall, really nice work. The propagation timeline is especially useful.

supriyo-biswas•18m ago
Thanks for your comment, unfortunately it seems that your comments are primarily LLM-generated (for people looking for evidence, the first comments of this user should provide enough evidence, although they’re getting better by fine tuning the prompt). As HN is primarily a place for humans, please do not do this here. Thanks.
Oras•4d ago
I really like the idea. I would love a feature to add keywords and see related news.
KomoD•2h ago
I think the idea is interesting but it includes a lot of spam and non-news (e.g. archive.fo, .vn, .today, etc.)
psychoslave•2h ago
Can it be tuned to get a sense of how news reaches Wikimedia projects?
hmokiguess•2h ago
Cool idea! What I liked the most was the breakdown into categories like “breaking” and “trending” plus the number of sources.

The view showing the flow with a play animation was a nice concept, but I couldn’t see much value in it. I wonder if you could show more aggregate stats that reveal connections between these different flows; maybe they follow a pattern, like ad-based campaigns or publishers who own these domains, which would explain things. Expanding on this idea, you could even set up different scores and metrics based on major groups and sponsored content versus organic spread.

jMyles•1h ago
Just tried it, and clicking on the stories doesn't seem to do anything. Console shows "TypeError: can't access property "time", flowData[Math.min(...)] is undefined"

Ubuntu 24.04, Firefox 145.0.1 (64-bit)

guillem_lefait•1h ago
same
juujian•1h ago
Very cool. Our lab will want to do something like this eventually. Do you have a repo?
Havoc•1h ago
That's really cool!

Curious how you sourced the feeds? It seems to have a bias towards India/Sri Lanka/Iran/Indonesia/Turkey etc., i.e. not the traditional Western-centric reporting. I'm always interested in trying to get a more balanced news diet, so anything you could share around that would be interesting. Most out-of-the-box news tools seem to automatically lean west.

FYI layout sometimes breaks like so:

https://i.imgur.com/FXeqB9R.png

supermatt•52m ago
“Traditional western reporting” is traditionally a western thing. That’s only 15% of the global population - so if anything it seems biased towards that.
ewuhic•1h ago
Without evaluating it thoroughly and judging just from the description - I really hope this ends up open-sourced - it would help many well-intentioned parties considerably.
rvz•1h ago
This looks a lot like a combination of spam and slop posed as "breaking news".

> Opinion: Operation Holiday serves a critical need in our communities

> Dhru Fusion WooCommerce Integration Plugin

> Powering the Future of Wellness Through Premium Food Supplement Ingredients

None of that is remotely important, so the feed looks really unreliable.

dmix•1h ago
How do you handle time zone issues with the dates?
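For context on the time zone question: RSS pubDate values normally carry an RFC 822 offset, so one common approach is to normalize everything to UTC before comparing. The sketch below assumes that approach; how the actual system handles dates is not stated in the thread. Treating offset-less dates as UTC is also an assumption.

```python
from datetime import timezone
from email.utils import parsedate_to_datetime

def to_utc(pub_date):
    """Parse an RFC 822 style date (as used in RSS pubDate) into aware UTC."""
    dt = parsedate_to_datetime(pub_date)
    if dt.tzinfo is None:
        # Assumption: feeds that give no usable offset are treated as UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc)

# The same instant reported by two feeds in different zones.
india = to_utc("Tue, 02 Dec 2025 09:30:00 +0530")
london = to_utc("Tue, 02 Dec 2025 04:00:00 GMT")
print(india.isoformat())  # 2025-12-02T04:00:00+00:00
print(india == london)    # True
```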

I’ve been curious how much news starts from social media. So many news stories today are “someone said x on twitter”.

YmiYugy•59m ago
The idea is pretty cool, but it doesn't work super well. 1. I imagine most major news outlets don't have RSS feeds these days. 2. A lot of stuff originates from news agencies, so stories don't spread from website to website, but radiate out from the agency. 3. Most of the included sources are pretty small. To draw meaningful conclusions we would need information like popularity, political leaning, nation of origin, etc. 4. The similarity check doesn't appear to do translation. So when news spreads from one country to another we lose the thread.
badestrand•44m ago
The devil really is always in the details.
pbiggar•59m ago
See also Newscord, which does very similar work to analyze bias across news media:

- https://newscord.org/latest

- https://www.instagram.com/newscord_org

codethief•56m ago
Cool idea! On mobile (Chromium on Android) I was confused at first because nothing happened when I tapped any of the stories – until I realized I can zoom out and the info about how the story propagated is at the end of the page.
hk1337•43m ago
This seems like it could have an additional use case of labeling each news source left, right, center, neutral/factual and tracking how or if each one releases an article.
patrick4urcloud•27m ago
great !
gioele•25m ago
Kudos on releasing Yandori!

We have been (low-key) working on something similar (more from an academic point of view) for the past few years:

This is the introductory article (open access): "Comparison of news commonality and churn in international news outlets with TARO" https://dl.acm.org/doi/abs/10.1145/3603163.3609062

(Allow me a moment of pride for the student leading this project: the paper won the Ted Nelson Award at ACM Hypertext 2023.)

jacquesm•17m ago
Is there a way you could use this system to track propaganda?
analogears•7m ago
Tried this on iPhone - the category tabs (Sports, World News, Business) get cut off on the right and there's no horizontal scroll indicator, so I didn't realise there were more options at first. The story cards also aren't using the full screen width, leaving wasted space on both sides.

Cool concept though - the source count and "+N" spread metrics give a quick sense of which stories have legs.
