Show HN: Infini-News – 1.36B news articles from Common Crawl, queryable in ms

https://cs2.uni-graz.at/blog/infini-news/
4•ruggsea•1h ago
Infini-News is ten years of CC-NEWS (the news subset of Common Crawl), cleaned, enriched and turned into a full-text index, so you can count any keyword or phrase across 1.36B articles in sub-second time (OK, right now it may take a few seconds, depending on circumstances) without downloading anything. It's free and open on Hugging Face.

I did it because I was sick of manually scraping news websites and the like for research, and because tackling a project of this scale felt personally interesting.

On top of data cleaning, we ran language, country (via TLDs and some other heuristics) and topic tagging over all the articles, and I indexed all of them using a recent n-gram indexing technique that I consider akin to magic.

I'd encourage you to read the blog post and play with the interactive viz I made for it. And of course, I'm happy to answer questions.

Blog: https://cs2.uni-graz.at/blog/infini-news/
Dataset: https://huggingface.co/datasets/ruggsea/infini-news-corpus
Index: https://huggingface.co/datasets/ruggsea/infini-news-index
Preprint: https://arxiv.org/abs/2605.18337
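The country tags are described only as coming "via TLDs and some other heuristics". The TLD half can be sketched as a lookup from a URL's country-code TLD to a country. The mapping and the function name below are hypothetical, and the "other heuristics" are left out because the post doesn't describe them.

```python
from urllib.parse import urlparse

# Hypothetical, deliberately tiny mapping from country-code TLDs to ISO 3166-1
# alpha-2 codes. A real tagger would need to cover every ccTLD.
CCTLD_TO_COUNTRY = {
    "at": "AT",
    "de": "DE",
    "fr": "FR",
    "it": "IT",
    "es": "ES",
    "uk": "GB",  # the ISO code for the United Kingdom is GB, not UK
    "jp": "JP",
    "br": "BR",
}


def country_from_url(url):
    """Guess a country from the URL's TLD; return None if it is not a known ccTLD."""
    # urlparse lower-cases .hostname; strip a trailing root dot if present
    host = (urlparse(url).hostname or "").rstrip(".")
    tld = host.rsplit(".", 1)[-1]
    return CCTLD_TO_COUNTRY.get(tld)
```

Generic TLDs such as .com return None, which is presumably where additional heuristics would have to take over.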

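The post doesn't spell out how the n-gram index works. One standard way to get fast exact phrase counts is a suffix array over the tokenized text: every occurrence of a phrase is the start of some suffix, and all such suffixes sit in one contiguous run of the sorted array, so two binary searches give the count. The toy, in-memory sketch below illustrates that idea under this assumption. It is not the index actually used for Infini-News, and the function names are hypothetical.

```python
def build_suffix_array(tokens):
    """Return suffix start positions sorted by the token sequence that follows them.

    Naive construction for illustration only: it materializes every suffix, so it
    is far too slow and memory-hungry for a web-scale corpus.
    """
    return sorted(range(len(tokens)), key=lambda i: tokens[i:])


def _bound(tokens, sa, phrase, upper):
    """Binary search over the suffix array.

    upper=False: first position whose n-token prefix is >= phrase (lower bound).
    upper=True:  first position whose n-token prefix is >  phrase (upper bound).
    """
    n = len(phrase)
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        prefix = tokens[sa[mid]:sa[mid] + n]  # compare only the first n tokens
        if prefix < phrase or (upper and prefix == phrase):
            lo = mid + 1
        else:
            hi = mid
    return lo


def count_phrase(tokens, sa, phrase):
    """Count occurrences of `phrase` (a sequence of tokens) in `tokens`."""
    phrase = list(phrase)  # list slices only compare cleanly with lists
    return _bound(tokens, sa, phrase, upper=True) - _bound(tokens, sa, phrase, upper=False)


if __name__ == "__main__":
    corpus = "the cat sat on the mat the cat ran".split()
    sa = build_suffix_array(corpus)
    print(count_phrase(corpus, sa, ["the", "cat"]))  # 2
```

Each query costs O(m log N) token comparisons for a phrase of m tokens over N tokens, independent of how many times the phrase occurs. That property is what makes counting at corpus scale plausible once the array is built (on disk, in practice).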
Comments

wonnie•1h ago
Very cool! Happy to see some cool stuff made in Graz too. Keep up the good work!