Qordinate – AI that talks for you

https://www.qordinate.ai/
1•vismit2000•2m ago•0 comments

A History of CSS

https://modern-css.com/history-of-css/
2•naeemnur•2m ago•0 comments

Upside Robotics is reducing fertilizer use and waste in corn crops

https://techcrunch.com/2026/02/11/upside-robotics-is-reducing-fertilizer-use-and-waste-in-corn-cr...
1•PaulHoule•2m ago•0 comments

How Google Is Killing Independent Sites Like Ours

https://housefresh.com/david-vs-digital-goliaths/
1•TigerUniversity•2m ago•0 comments

Whisper Anywhere – ChatGPT-level dictation in every Mac app

https://whisperanywhere.app
1•vismit2000•3m ago•0 comments

Life on Peptides Feels Amazing

https://nymag.com/intelligencer/article/peptides-from-instagram-china-wellness-cure.html
1•hippich•4m ago•0 comments

Study finds AI chose nuclear signalling in 95% of simulated crises

https://www.kcl.ac.uk/news/artificial-intelligence-under-nuclear-pressure-first-large-scale-kings...
1•geox•4m ago•0 comments

Show HN: Retrievo – In-memory hybrid search for .NET AI agents

https://github.com/TianqiZhang/Retrievo
1•ztq121121•7m ago•0 comments

Show HN: An IntelliJ plugin to test MyBatis dynamic SQL

1•allegorist•8m ago•0 comments

Show HN: Go-TUI – a framework for building declarative terminal UIs in Go

https://www.go-tui.dev/
1•grindlemire•10m ago•0 comments

Polar Factor Beyond Newton-Schulz – Fast Matrix Inverse Square Root

https://jiha-kim.github.io/posts/polar-factor-beyond-newton-schulz-fast-matrix-inverse-square-root/
1•ibobev•10m ago•0 comments

Show HN: MRR Take-Home Calculator for Bootstrapped Founders

https://requiredmrr.com/us/mobile-app/2.5k
1•totaa•11m ago•0 comments

Popular prayer program becomes propaganda pusher after reported Israeli hack

https://www.theregister.com/2026/03/02/iran_prayer_app_propaganda_hack_israel/
1•namblooc•11m ago•0 comments

Accessing inactive union members through char

https://www.sandordargo.com/blog/2026/03/04/char-representation-and-UB
1•ibobev•11m ago•0 comments

Biggest French Tracker YGG shuts down by hacker that leaked database

https://yggleak.top/fr/home/ygg-dossier
1•saylisteins•12m ago•0 comments

Bootleg Windows Office scheme crashes triggers 22-month lockup for Florida woman

https://www.theregister.com/2026/03/03/windows_office_software_scalper/
2•Bender•13m ago•0 comments

E-Invoice – Simple Mobile Invoicing for Freelancers (iOS)

https://apps.apple.com/us/app/e-invoice-mobile-invoicing/id1588261617
1•awsnovabot•13m ago•0 comments

Remote Firmware Injection in Popular Solar Inverters

https://jakkaru.de/articles/apsystems-remote-firmware-injection
2•mrlnstk•15m ago•0 comments

Beginner's guide to the Amiga E language

https://blog.mikaellundin.name/2016/02/18/beginners-guide-to-amiga-e.html
2•harel•17m ago•0 comments

HN – Browse Hacker News from the Terminal (CLI and TUI)

https://github.com/Aayush9029/hn
1•aayush9029•17m ago•0 comments

China's Initiative to Regulate Anthropomorphic AI

https://www.kwm.com/cn/en/insights/latest-thinking/chinas-Initiative-to-regulate-anthropomorphic-...
2•shinryuu•17m ago•0 comments

US Supreme Court's Republicans seized most dangerous power in constitutional law

https://www.vox.com/politics/481401/supreme-court-mirabelli-bonta-sauron-wins
2•robtherobber•18m ago•0 comments

Show HN: MeshCore SAR – Voice, Maps, and Messaging Without Cell Coverage

https://github.com/dz0ny/meshcore-sar
1•dz0ny•19m ago•0 comments

Show HN: Skill for structured deep research with Claude Code and Obsidian

https://github.com/alvdef/obsidian-deep-research
1•alvdef•19m ago•1 comment

First Known Mass iOS Attack

https://iverify.io/press-releases/first-known-mass-ios-attack
2•louismerlin•19m ago•0 comments

EU proposes "Made in EU" rules for strategic sectors to limit China reliance

https://www.reuters.com/world/china/eu-lay-out-local-content-rules-strengthen-manufacturing-cut-c...
1•alephnerd•20m ago•0 comments

Amiga C Tutorial

http://www.pjhutchison.org/tutorial/amiga_c.html
2•harel•20m ago•0 comments

Running Llama Inference on Intel Itanium

https://medium.com/@tglozar/running-llama-inference-on-intel-itanium-part-2-045d663fe5d4
3•RobotToaster•20m ago•0 comments

Mdenc – Diff-friendly Markdown encryption for Git

https://github.com/yogh-io/mdenc
1•yoghurt114•20m ago•1 comment

How to protect your privacy at a protest

https://proton.me/blog/how-to-protect-privacy-at-protests
2•Vinnl•21m ago•0 comments

Cross-Lingual News Dedup at $100/Month – Embeddings, Pgvector, and UnionFind

https://yingjiezhao.com/en/articles/Cross-Lingual-News-Dedup-at-100-Dollar-a-Month/
2•ethan_zhao•1h ago

Comments

ethan_zhao•1h ago
Author here. I built this for 3mins.news, an AI news aggregator covering 180+ sources in 17 languages. The trickiest part was figuring out that articles in different languages about the same event share zero tokens — MinHash/LSH gives you Jaccard similarity of 0.

Happy to answer questions about the pgvector setup, Cloudflare Workers constraints, or the clustering algorithm tuning.
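A minimal sketch of the zero-overlap point above, for anyone who wants to see it concretely. MinHash/LSH estimates the Jaccard similarity of token (or shingle) sets, so the sketch computes Jaccard directly. The headlines and the whitespace tokenizer are assumptions made up for illustration; they are not taken from the article or from 3mins.news.

```python
def tokens(text: str) -> set[str]:
    """Lowercased whitespace tokens; a stand-in for the shingles MinHash would hash."""
    return set(text.lower().split())


def jaccard(a: set[str], b: set[str]) -> float:
    """|A ∩ B| / |A ∪ B|, defined here as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical headlines about the same event (illustrative only).
english = tokens("EU approves AI regulation")
german = tokens("Europäische Union verabschiedet KI-Verordnung")
english_2 = tokens("EU approves sweeping AI regulation")

cross_lingual = jaccard(english, german)     # no shared tokens
same_language = jaccard(english, english_2)  # 4 shared tokens out of 5

print(cross_lingual, same_language)
```

With no shared tokens the cross-language pair scores 0, while the same-language pair scores 0.8, which is why a token-overlap method cannot see the cross-lingual duplicate at all.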

yugoru•1h ago
It's harder than it first appears. Even with good embeddings, semantic similarity across languages often breaks when articles include local context or idioms. Curious whether you found a threshold strategy that works reliably across languages, or if it still needs manual tuning.
ethan_zhao•58m ago
Good question. The short answer: a single global threshold (cosine similarity ≥ 0.7) works surprisingly well for news, but it's not because embeddings handle idioms perfectly — it's because news articles are structurally constrained.

News articles about the same event tend to share named entities (people, places, organizations), numbers, and factual structure even across languages. "EU approves AI regulation" is a factual statement that embeds similarly regardless of language. This is very different from, say, opinion pieces or cultural commentary where idioms and local framing would diverge more.

That said, similarity alone isn't enough. The real reliability comes from non-semantic constraints layered on top:

- Time gap ≤ 18 hours between article and story — prevents "same topic, different month" false merges

- Story age ≤ 36 hours — old stories stop absorbing new articles

- Two-pass design — matching against refined story embeddings (average of recent articles) is more stable than raw article-to-article comparison
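The three constraints above can be sketched roughly as follows. This is not the author's code: the `Story` class, the `assign` function, the 5-article centroid window, and the reading of "time gap" as distance from the story's most recent article are all assumptions made for illustration. Only the 0.7, 18-hour, and 36-hour values come from the comment, and only the pass that matches an article against story embeddings is shown.

```python
import math
from dataclasses import dataclass, field
from datetime import datetime, timedelta

SIM_THRESHOLD = 0.7                   # global cosine threshold (from the comment)
MAX_TIME_GAP = timedelta(hours=18)    # article vs. story's latest article (assumed reading)
MAX_STORY_AGE = timedelta(hours=36)   # older stories stop absorbing articles


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0


@dataclass
class Story:  # hypothetical container, not from the article
    created_at: datetime
    last_seen: datetime
    embeddings: list = field(default_factory=list)

    def centroid(self, recent=5):
        """Average of the most recent article embeddings (window size is assumed)."""
        vecs = self.embeddings[-recent:]
        return [sum(col) / len(vecs) for col in zip(*vecs)]


def assign(embedding, published_at, stories):
    """Attach the article to the best eligible story, or start a new story."""
    best, best_sim = None, SIM_THRESHOLD
    for story in stories:
        if published_at - story.created_at > MAX_STORY_AGE:
            continue  # story is too old to absorb new articles
        if abs(published_at - story.last_seen) > MAX_TIME_GAP:
            continue  # same topic, but too far apart in time
        sim = cosine(embedding, story.centroid())
        if sim >= best_sim:
            best, best_sim = story, sim
    if best is None:
        best = Story(created_at=published_at, last_seen=published_at)
        stories.append(best)
    best.embeddings.append(embedding)
    best.last_seen = max(best.last_seen, published_at)
    return best


# Toy 2-D vectors stand in for real embeddings.
t0 = datetime(2026, 3, 1)
stories = []
a = assign([1.0, 0.0], t0, stories)                        # starts story 1
b = assign([0.9, 0.1], t0 + timedelta(hours=2), stories)   # similar and recent: joins story 1
d = assign([0.0, 1.0], t0 + timedelta(hours=3), stories)   # dissimilar: starts story 2
c = assign([1.0, 0.0], t0 + timedelta(hours=40), stories)  # similar but stories too old: story 3
```

Checking the time filters before computing similarity keeps stale stories out of the comparison entirely, so the similarity threshold never has to be raised to compensate for them.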

Where it does break: regional stories with heavy local context. A Japanese domestic politics article and an English wire service summary of the same event sometimes land just below threshold because the framing is so different. I accept some missed merges there rather than lowering the threshold and getting false positives.

No per-language thresholds so far — the embedding model (Qwen3) seems to normalize well across the languages I cover. But I wouldn't be surprised if that changes when adding languages with less training data representation.