
Inverted Indexes: A Step-by-Step Implementation Guide (2023)

https://www.chashnikov.dev/post/inverted-indexes-a-step-by-step-implementation-guide
85•klaussilveira•6mo ago

Comments

the_precipitate•6mo ago
To really appreciate inverted indexes, it’s worthwhile to study ISR (Inverted Stream Readers), a concept introduced by the great Mike Burrows. It’s also worth exploring encoding techniques like PForDelta. These elegant ideas demonstrate how true systems design masters can distill complex concepts into simple, powerful abstractions.

Edit: I stand corrected: it's called index stream readers (thanks atombender for pointing this out). For those who know Mike Burrows only for the Burrows-Wheeler transform (bzip), you might also want to know that he was one of the main developers of AltaVista, the first real search engine for the internet. He also designed the early versions of the Bing search engine. Eventually he worked for Google, where he designed their lock service, Chubby.
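PForDelta is only named above. Below is a rough, simplified sketch of the idea behind it. It is not the published scheme: real implementations work on fixed-size blocks and lay out exceptions more carefully. The sketch turns a sorted posting list into gaps, picks the smallest bit width `b` that covers most gaps (90% here, an assumed threshold), packs those gaps into `b`-bit slots, and stores the outliers separately as "exceptions". All function names are hypothetical.

```python
from typing import List, Tuple

def to_gaps(postings: List[int]) -> List[int]:
    """Sorted doc ids -> gaps between neighbours (first id kept as-is)."""
    gaps, prev = [], 0
    for doc_id in postings:
        gaps.append(doc_id - prev)
        prev = doc_id
    return gaps

def choose_bit_width(gaps: List[int], coverage_pct: int = 90) -> int:
    """Smallest b such that at least coverage_pct% of the gaps fit in b bits."""
    for b in range(1, 33):
        fits = sum(1 for g in gaps if g < (1 << b))
        if fits * 100 >= coverage_pct * len(gaps):
            return b
    return 32

def pfor_encode(postings: List[int], coverage_pct: int = 90
                ) -> Tuple[int, int, int, List[Tuple[int, int]]]:
    """Pack gaps into b-bit slots; gaps too large for b bits become exceptions."""
    gaps = to_gaps(postings)
    b = choose_bit_width(gaps, coverage_pct)
    packed = 0  # a Python int used as a simple bit buffer
    exceptions: List[Tuple[int, int]] = []
    for i, g in enumerate(gaps):
        if g >= (1 << b):
            exceptions.append((i, g))  # stored separately, uncompressed
            g = 0                      # placeholder in the packed slot
        packed |= g << (i * b)
    return b, len(gaps), packed, exceptions

def pfor_decode(b: int, n: int, packed: int,
                exceptions: List[Tuple[int, int]]) -> List[int]:
    mask = (1 << b) - 1
    gaps = [(packed >> (i * b)) & mask for i in range(n)]
    for i, g in exceptions:  # patch the outliers back in
        gaps[i] = g
    postings, total = [], 0
    for g in gaps:  # prefix sum turns gaps back into doc ids
        total += g
        postings.append(total)
    return postings
```

Take the list `[3, 5, 6, 10, 11, 12, 14, 15, 17, 1000]`. Nine of its ten gaps fit in 3 bits. The single large gap (983) becomes an exception, so it does not force every slot to be 10 bits wide.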

atombender•6mo ago
I think you're thinking of index stream readers?
mrkeen•6mo ago
I have heard of neither. But the mention of Burrows leads me to the Burrows-Wheeler transform, which is used for compression (bzip).

I'm not 100% sure, but I don't think you can directly query a BWT in the same way you'd query an inverted index (without the later discovery of wavelet trees and FM-indexes / succinct data structures, and all that jazz). And that's mostly for genomics? Not sure if it applies to plain old document searches. Would love to be corrected though.

lazamar•6mo ago
At Meta they are using FM indexes to power text search through the entire commit history of their monorepo.
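This subthread asks how a BWT can be queried at all. Here is a toy sketch of FM-index "backward search", which counts pattern occurrences using only the BWT string, a table `C` of smaller-character counts, and per-prefix occurrence counts `occ`. It is not Meta's system. Real FM-indexes sample or compress `occ` (for example with wavelet trees) instead of storing a full table per character. The `$` sentinel is assumed to be absent from the text and to sort before its characters.

```python
from typing import Dict, List, Tuple

def bwt(text: str) -> str:
    """Burrows-Wheeler transform via sorted rotations (quadratic; demo only)."""
    s = text + "$"  # sentinel: assumed absent from text and smaller than its chars
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)

def build_fm(last: str) -> Tuple[Dict[str, int], Dict[str, List[int]]]:
    """C[c] = number of characters smaller than c; occ[c][i] = count of c in last[:i]."""
    alphabet = sorted(set(last))
    C: Dict[str, int] = {}
    total = 0
    for ch in alphabet:
        C[ch] = total
        total += last.count(ch)
    occ: Dict[str, List[int]] = {ch: [0] for ch in alphabet}
    for ch_at_i in last:
        for ch in alphabet:
            occ[ch].append(occ[ch][-1] + (ch == ch_at_i))
    return C, occ

def count_occurrences(pattern: str, C: Dict[str, int],
                      occ: Dict[str, List[int]], n: int) -> int:
    """Backward search: shrink the row range [lo, hi) one pattern char at a time, right to left."""
    lo, hi = 0, n  # start with every row of the sorted-rotation matrix
    for ch in reversed(pattern):
        if ch not in C:
            return 0
        lo = C[ch] + occ[ch][lo]
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo
```

For `bwt("abracadabra")`, `count_occurrences("abra", ...)` narrows the range down to two rows, one per occurrence, without ever decompressing the text.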
marginalia_nu•6mo ago
Another very nice algorithm in the space is this one[1] for intersecting postings lists in sublinear time, generally with very good cache characteristics to boot. It works with tree-based indexes as well as skip lists (though a more modern design might also use simple Bloom filters to go with the skip pointers).

[1] https://nlp.stanford.edu/IR-book/html/htmledition/faster-pos...
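The link above is truncated, so here is a minimal sketch of skip-pointer intersection in the style usually shown in IR textbooks. It is not necessarily the exact variant the link describes. It assumes sorted, duplicate-free posting lists, with skip pointers placed every √n entries (a common heuristic). A skip is only taken when it does not overshoot the other list's current value, so no match can be jumped over.

```python
import math
from typing import List

def skip_step(n: int) -> int:
    """Common textbook heuristic: about sqrt(n) evenly spaced skip pointers."""
    return max(1, math.isqrt(n))

def _advance(p: List[int], k: int, target: int, step: int) -> int:
    """Move past p[k] < target, taking skip pointers while they don't overshoot target."""
    if k % step == 0 and k + step < len(p) and p[k + step] <= target:
        while k % step == 0 and k + step < len(p) and p[k + step] <= target:
            k += step
        return k
    return k + 1

def intersect_with_skips(p1: List[int], p2: List[int]) -> List[int]:
    """Intersect two sorted, duplicate-free posting lists."""
    s1, s2 = skip_step(len(p1)), skip_step(len(p2))
    i = j = 0
    out: List[int] = []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            out.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i = _advance(p1, i, p2[j], s1)
        else:
            j = _advance(p2, j, p1[i], s2)
    return out
```

Skips only exist at indexes that are multiples of the step. That matches the "evenly spaced skip pointers" picture, and a tree or skip list would provide the same jumps structurally.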

SeanSullivan86•6mo ago
I've sometimes been confused by the term "inverted index". The example in this post feels like what I would just call an "index", i.e., documents indexed by the words they contain. It feels about the same as the index in the back of a physical book.

Is the distinction that an index on a multi-valued attribute is called an inverted index?

mrkeen•6mo ago
No, it's the same thing. Any book has a built-in mechanism to go to a page number and see what words are there. An inverted index lets you do the inverse (words -> page numbers).
SeanSullivan86•6mo ago
People (non-tech) don't tend to refer to "go to page 106" as using an index. The pages at the back of the book providing the word->page numbers lookup are commonly known as the book's "index"
grg0•6mo ago
"commonly" is an understatement; that's literally what a book index is by definition.

The only thing "inverted" here is the context. The author even admits themselves that the word->doc mapping is an index:

"If user wants to search by words - then words should be keys in our "database" (index)"

It's a pointless debate of semantics. An inverted map is still a map.

teiferer•6mo ago
It's pointless in the sense that the word "inverse" in the term is pointless, a mild way of saying that it's confusing or even unnecessary to the point of being incorrect.

The discussion about it is not pointless, since it clears up confusion. It might not have been confusing for you, but it clearly is for many others, so if you think that's pointless, allowing yourself to appreciate other perspectives could go a long way.

An inverted map is still a map, but if you are typically thinking of the map A->B and then suddenly somebody talks about an inverted map, then it's understandable that people start to assume that this is now about B->A and get confused if it somehow actually isn't really.

valiant55•6mo ago
If the documents were themselves stored in a database, they would have an id and the contents. The clustering key (an index) would be on the id. It's inverted because the contents are deconstructed into tokens, each with a list of the ids that contain that token. Now the contents (tokens) serve as the indexed value.
atombender•6mo ago
Inverted indexes are what databases call indexes. The term is used in the IR field to distinguish them from forward indexes, which are less common, so you're right that we could just say "index".

But when we talk about inverted indexes, they are almost always term -> posting list, and most index data structures lay these out so that posting lists are sorted and compressed together. Traditional database indexes like B-trees are optimized for rapid insertion and deletion, while inverted indexes tend to be optimized for batch processing, because you typically deconstruct text into words for a large batch and then lazily integrate this batch into the main index.

Part of this is about scale: a row in a database typically has a single indexed column, or maybe 2-3 columns in a composite index, but a document's text may tokenize into thousands, hundreds of thousands, or millions of words. At this scale, the fine-grained nature of words means B-trees aren't as good a fit.

Another part of it is that inverted indexes aren't for point queries, which is what B-trees are optimized for; you typically search for many words at a time in order to rank your search results by some function like cosine similarity. You rarely want a single posting; you want the union or intersection of many postings, sorted by score.
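The batch workflow described above (tokenize a large batch, then fold it into the main index) can be sketched in memory as follows. This assumes integer doc ids, a simple regex word tokenizer, and that a batch's doc ids are not already in the main index. Real engines also compress the merged lists and do the merge on disk, which is omitted here. Function names are hypothetical.

```python
import heapq
import re
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Index = Dict[str, List[int]]  # term -> sorted posting list

def tokenize(text: str) -> List[str]:
    return re.findall(r"\w+", text.lower())

def build_segment(batch: List[Tuple[int, str]]) -> Index:
    """Tokenize a whole batch of (doc_id, text) pairs into a small in-memory index."""
    seg: Dict[str, Set[int]] = defaultdict(set)
    for doc_id, text in batch:
        for term in tokenize(text):
            seg[term].add(doc_id)
    return {term: sorted(ids) for term, ids in seg.items()}

def merge_segment(main: Index, seg: Index) -> Index:
    """Fold a batch segment into the main index, keeping every posting list sorted.

    Assumes the batch's doc ids are not already present in main.
    """
    merged = dict(main)  # main itself is left untouched
    for term, postings in seg.items():
        if term in merged:
            merged[term] = list(heapq.merge(merged[term], postings))
        else:
            merged[term] = list(postings)
    return merged
```

Because every posting list stays sorted, the merge is a linear pass per term. The same sortedness is what lets queries union or intersect many lists cheaply.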

modulovalue•6mo ago
NIT: That's not quite correct if your first statement is meant to imply an equality rather than a subset relation.

The idea of an index is more general, as an index can be built for many different domains. For example, B-trees can index monoidal data and inverted indexes are just an instance of such a monoid that a B-tree can efficiently index.

Furthermore, metric spaces (e.g., Levenshtein distance) can also be efficiently indexed using other trees: metric trees. So calling inverted indexes just "indexes" would be really confusing, since string data is not the only kind of data that a database might want to support efficient indexes for.
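The comment mentions metric trees without naming one. A BK-tree (Burkhard-Keller tree), chosen here purely for illustration, is one of the simplest metric trees for Levenshtein distance. Each child is keyed by its distance to its parent, and the triangle inequality decides which subtrees cannot contain a match.

```python
from typing import Dict, List, Optional

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance, one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

class BKTree:
    """Burkhard-Keller tree: children are keyed by their distance to the parent."""

    def __init__(self) -> None:
        self.root: Optional[str] = None
        self.children: Dict[str, Dict[int, str]] = {}

    def add(self, word: str) -> None:
        if self.root is None:
            self.root = word
            self.children[word] = {}
            return
        node = self.root
        while True:
            d = levenshtein(word, node)
            if d == 0:
                return  # already stored
            child = self.children[node].get(d)
            if child is None:
                self.children[node][d] = word
                self.children[word] = {}
                return
            node = child

    def search(self, query: str, max_dist: int) -> List[str]:
        """All stored words within max_dist edits of query, sorted."""
        if self.root is None:
            return []
        found: List[str] = []
        stack = [self.root]
        while stack:
            node = stack.pop()
            d = levenshtein(query, node)
            if d <= max_dist:
                found.append(node)
            # Triangle inequality: only edges in [d - k, d + k] can lead to matches.
            for edge, child in self.children[node].items():
                if d - max_dist <= edge <= d + max_dist:
                    stack.append(child)
        return sorted(found)
```

After adding `book`, `books`, `cake`, `boo`, `cape` and `cart`, a search for `"bool"` within one edit returns `["boo", "book"]`.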

atombender•6mo ago
My point is that all indexes are "inverted" in the sense that they map some searchable value to occurrences of said value. That is true even if the method of comparison is not strict equality.
giovannibonetti•6mo ago
Most indexes people hear about are like that. However, there are indexes that work the other way around, like Postgres' Block Range Indexes (BRIN). They are mostly useful as skip indexes - for a given block, they have a summary that tells whether some given data may be there.

The trade-off this kind of index makes is that it is more optimized for (batch) writes than the more popular B-Tree indexes, but less optimized for reads. If the write throughput of a given table is very high, you might want to remove all B-Tree indexes that are not strongly correlated to the insert order and use BRIN indexes instead. Combine it with table partitioning, and you can add B-Tree indexes in the cold partitions, or even migrate them to columnar storage if available (with the Citus extension).

By the way, a few years ago a Bloom BRIN variant was added, not to be confused with Postgres' Bloom indexes which are something else.
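Below is a minimal in-memory sketch of the min/max summary idea behind BRIN. It illustrates the concept only and is neither Postgres's on-disk format nor its API; names such as `rows_per_range` are hypothetical. Each range of consecutive rows keeps just its minimum and maximum. A range query skips every range whose `[min, max]` cannot overlap the requested interval, and rechecks rows in the rest, because a summary only says "maybe".

```python
from typing import List, Tuple

Summary = Tuple[int, int, int]  # (first row of the range, min value, max value)

def build_summaries(values: List[int], rows_per_range: int) -> List[Summary]:
    """Keep only a min/max summary per range of consecutive rows."""
    summaries: List[Summary] = []
    for start in range(0, len(values), rows_per_range):
        block = values[start:start + rows_per_range]
        summaries.append((start, min(block), max(block)))
    return summaries

def candidate_ranges(summaries: List[Summary], lo: int, hi: int) -> List[int]:
    """Ranges whose [min, max] overlaps [lo, hi] may hold matches; others are skipped."""
    return [start for start, mn, mx in summaries if mx >= lo and mn <= hi]

def range_scan(values: List[int], summaries: List[Summary],
               rows_per_range: int, lo: int, hi: int) -> List[int]:
    hits: List[int] = []
    for start in candidate_ranges(summaries, lo, hi):
        for row in range(start, min(start + rows_per_range, len(values))):
            if lo <= values[row] <= hi:  # recheck: a summary only says "maybe"
                hits.append(row)
    return hits
```

This also shows why correlation with insert order matters. For `list(range(100))`, a query for 25..31 touches 2 of 10 ranges. For `[i % 10 for i in range(100)]`, every range spans 0..9, so nothing can be skipped.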

atombender•6mo ago
I wouldn't say BRIN indexes are "the other way around"; the index structure is still one where data values are looked up to find the area where occurrences exist.

"Coarse" indexes like BRIN and ClickHouse's data-skipping indexes are still indexes in a broad sense of serving to narrow down a search.

___tom___•6mo ago
This drove me up the wall, until I researched it.

A document can be viewed as an object with a set of pointers to the words it contains.

The inverse of that was a word object with a list of pointers to the documents it is found in. This was referred to as an inverted DOCUMENT index. This is what people would normally just call an index.

At some point, people dropped the "DOCUMENT" part, and started just calling it an "inverted index". This makes no sense, grammatically, as it's the document that is inverted, not the index, but it is what it is.

So, an inverted index is just an index.
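Read this way, "inverted" describes how two maps relate, not a special kind of index. Here is a minimal sketch, assuming simple whitespace tokenization and hypothetical function names. The document index maps each document to its words, and inverting it yields the familiar word -> documents index.

```python
from collections import defaultdict
from typing import Dict, List, Set

def document_index(docs: Dict[int, str]) -> Dict[int, Set[str]]:
    """Each document points to the set of words it contains."""
    return {doc_id: set(text.lower().split()) for doc_id, text in docs.items()}

def invert(doc_index: Dict[int, Set[str]]) -> Dict[str, List[int]]:
    """Flip the arrows: each word points to the documents it is found in."""
    inverted: Dict[str, List[int]] = defaultdict(list)
    for doc_id in sorted(doc_index):  # sorted ids give sorted posting lists
        for word in doc_index[doc_id]:
            inverted[word].append(doc_id)
    return dict(inverted)
```

With the documents `{1: "The cat sat", 2: "the dog sat", 3: "a cat"}`, the inverted map sends `"cat"` to `[1, 3]`.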

nzeid•6mo ago
Love this take.
teiferer•6mo ago
Wow, thanks for the explanation! That was driving me nuts too, as I was waiting for the point where they would invert the thing they built and what that would look like, though that point never came. But now I don't need to put in the time that you did!

In summary, they are not "inverted index" in the sense of "the inversion of what you'd normally think of as an index" but instead in the sense of "a map which provides the inversion of the map from documents to words in them, in other words, an index".

dvh•6mo ago
I recently used an inverted index (with ranked document retrieval), and it all took only 66 lines of JavaScript: https://github.com/dvhx/ngspicejs/blob/master/js/search.js and I'm kinda proud of that code; it's compact, without dirty tricks and without being overly clever. Well, except for using 1/term_frequency instead of logarithms: it's easier to debug (sums of fractions instead of random numbers produced by logarithms), and I just left it there. It works fine.
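The linked JavaScript is not reproduced here. The following is a hypothetical Python sketch of the weighting trade-off the comment describes. It assumes that "term_frequency" means the number of documents containing a term (its document frequency). Documents in the union of the query terms' posting lists are scored with either `1/df` or the conventional `log(n_docs/df)` IDF weight.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

def rank(query: str, inverted: Dict[str, List[int]], n_docs: int,
         use_log: bool = False) -> List[Tuple[int, float]]:
    """Score the union of the query terms' posting lists; highest score first.

    Each matching term adds a weight that shrinks as the term gets more common:
    1/df by default, or the usual log(n_docs/df) IDF when use_log is True.
    """
    scores: Dict[int, float] = defaultdict(float)
    for term in set(query.lower().split()):
        postings = inverted.get(term, [])
        if not postings:
            continue
        df = len(postings)  # number of documents containing the term
        weight = math.log(n_docs / df) if use_log else 1.0 / df
        for doc_id in postings:
            scores[doc_id] += weight
    # ties are broken by doc id so the order is deterministic
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

With `1/df`, a rare term contributes a readable fraction like 1 and a common one 1/3. With logs, a term present in every document contributes exactly 0. Both weightings rank the rare-term match first here, which is consistent with the comment's "it works fine".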
AdieuToLogic•6mo ago
A related concept is "Key Word in Context (KWIC)"[0] indices. These were very common, and useful, in Unix documentation. Another name for them is "permuted index."

[0] https://en.wikipedia.org/wiki/Key_Word_in_Context
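To show what "permuted" means here, the sketch below produces one entry per significant word, with the line rotated so that word comes first, and then sorts all entries alphabetically. The stop-word list and the `" / "` marker for where the original line began are illustrative assumptions, not a standard format.

```python
from typing import List, Tuple

STOP_WORDS = {"a", "an", "and", "in", "of", "the", "to"}  # illustrative list only

def kwic(lines: List[str]) -> List[Tuple[str, str]]:
    """Permuted index: one (keyword, rotated line) entry per non-stop word, sorted.

    Each line is rotated so the keyword comes first; " / " marks where the
    original line began (a display convention assumed here).
    """
    entries: List[Tuple[str, str]] = []
    for line in lines:
        words = line.split()
        for i, word in enumerate(words):
            if word.lower() in STOP_WORDS:
                continue
            tail = ["/"] + words[:i] if i else []
            entries.append((word.lower(), " ".join(words[i:] + tail)))
    return sorted(entries)
```

For the lines "The cat sat" and "A cat ran", both `cat` entries sort together ("cat ran / A", "cat sat / The"). This is what made such indexes handy for scanning documentation by keyword.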

jumploops•6mo ago
Tangential, but for anyone looking to utilize Elasticsearch in production, I highly recommend “On-Site Search Design Patterns for E-Commerce”[0]

I was familiar with both inverted indexes and Elasticsearch at a conceptual level, but this overview helped me improve our search (rankings, facets, etc.) much faster than I would have otherwise.

[0] https://project-a.github.io/on-site-search-design-patterns-f...

bob1029•6mo ago
Combining n-grams and bitmap indexes can give you most of the magic in the space.

I've been working on an architecture for code search that relies on a repo-level trigram index and a per-repo FM-index (actual code). The trigram index is used to find the bitmaps of repos that contain each term. These are then ANDed together to produce the final list of repos to search via FM-index.
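Here is a small in-memory sketch of the two-stage design described above. Python ints serve as bitmaps; a real system would more likely use compressed bitmaps, which is an assumption and not the commenter's stated choice. A plain substring check stands in for the per-repo FM-index stage. The trigram stage can only produce false positives, never false negatives, which is why the exact second stage is still needed. All names are hypothetical.

```python
from typing import Dict, List, Set

def trigrams(text: str) -> Set[str]:
    return {text[i:i + 3] for i in range(len(text) - 2)}

def build_bitmaps(repos: List[str]) -> Dict[str, int]:
    """Trigram -> bitmap (a Python int) with bit r set when repo r contains it."""
    bitmaps: Dict[str, int] = {}
    for r, text in enumerate(repos):
        for gram in trigrams(text):
            bitmaps[gram] = bitmaps.get(gram, 0) | (1 << r)
    return bitmaps

def candidate_repos(query: str, bitmaps: Dict[str, int], n_repos: int) -> List[int]:
    """AND the bitmaps of every query trigram; survivors *may* contain the query."""
    acc = (1 << n_repos) - 1  # start with every repo
    for gram in trigrams(query):  # queries under 3 chars filter nothing
        acc &= bitmaps.get(gram, 0)
        if acc == 0:
            break
    return [r for r in range(n_repos) if (acc >> r) & 1]

def search(query: str, repos: List[str], bitmaps: Dict[str, int]) -> List[int]:
    # Stand-in for the per-repo FM-index stage: an exact substring check.
    return [r for r in candidate_repos(query, bitmaps, len(repos)) if query in repos[r]]
```

For example, a repo containing "abc xbcd" has both trigrams of "abcd", so it passes the bitmap stage. It is only rejected by the exact check.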

lazamar•6mo ago
Why not search the FM-indexes directly? It is faster than the n-gram search and you can use the exact full text of the needle.
bob1029•6mo ago
If you have millions of them, searching all every time could become a problem.