frontpage.

Start all of your commands with a comma

https://rhodesmill.org/brandon/2009/commands-with-comma/
208•theblazehen•2d ago•62 comments

OpenCiv3: Open-source, cross-platform reimagining of Civilization III

https://openciv3.org/
685•klaussilveira•15h ago•204 comments

The Waymo World Model

https://waymo.com/blog/2026/02/the-waymo-world-model-a-new-frontier-for-autonomous-driving-simula...
959•xnx•20h ago•553 comments

Unseen Footage of Atari Battlezone Arcade Cabinet Production

https://arcadeblogger.com/2026/02/02/unseen-footage-of-atari-battlezone-cabinet-production/
65•videotopia•4d ago•3 comments

How we made geo joins 400× faster with H3 indexes

https://floedb.ai/blog/how-we-made-geo-joins-400-faster-with-h3-indexes
126•matheusalmeida•2d ago•35 comments

Jeffrey Snover: "Welcome to the Room"

https://www.jsnover.com/blog/2026/02/01/welcome-to-the-room/
28•kaonwarb•3d ago•23 comments

Vocal Guide – belt sing without killing yourself

https://jesperordrup.github.io/vocal-guide/
44•jesperordrup•5h ago•23 comments

Show HN: Look Ma, No Linux: Shell, App Installer, Vi, Cc on ESP32-S3 / BreezyBox

https://github.com/valdanylchuk/breezydemo
236•isitcontent•15h ago•26 comments

Monty: A minimal, secure Python interpreter written in Rust for use by AI

https://github.com/pydantic/monty
230•dmpetrov•15h ago•122 comments

Show HN: I spent 4 years building a UI design tool with only the features I use

https://vecti.com
334•vecti•17h ago•146 comments

Where did all the starships go?

https://www.datawrapper.de/blog/science-fiction-decline
26•speckx•3d ago•14 comments

Hackers (1995) Animated Experience

https://hackers-1995.vercel.app/
499•todsacerdoti•23h ago•244 comments

Sheldon Brown's Bicycle Technical Info

https://www.sheldonbrown.com/
384•ostacke•21h ago•97 comments

ga68, the GNU Algol 68 Compiler – FOSDEM 2026 [video]

https://fosdem.org/2026/schedule/event/PEXRTN-ga68-intro/
7•matt_d•3d ago•2 comments

Microsoft open-sources LiteBox, a security-focused library OS

https://github.com/microsoft/litebox
360•aktau•21h ago•183 comments

Show HN: If you lose your memory, how to regain access to your computer?

https://eljojo.github.io/rememory/
295•eljojo•18h ago•186 comments

An Update on Heroku

https://www.heroku.com/blog/an-update-on-heroku/
420•lstoll•21h ago•280 comments

PC Floppy Copy Protection: Vault Prolok

https://martypc.blogspot.com/2024/09/pc-floppy-copy-protection-vault-prolok.html
66•kmm•5d ago•10 comments

Dark Alley Mathematics

https://blog.szczepan.org/blog/three-points/
95•quibono•4d ago•22 comments

Was Benoit Mandelbrot a hedgehog or a fox?

https://arxiv.org/abs/2602.01122
21•bikenaga•3d ago•11 comments

How to effectively write quality code with AI

https://heidenstedt.org/posts/2026/how-to-effectively-write-quality-code-with-ai/
262•i5heu•18h ago•210 comments

Delimited Continuations vs. Lwt for Threads

https://mirageos.org/blog/delimcc-vs-lwt
33•romes•4d ago•3 comments

Female Asian Elephant Calf Born at the Smithsonian National Zoo

https://www.si.edu/newsdesk/releases/female-asian-elephant-calf-born-smithsonians-national-zoo-an...
38•gmays•10h ago•13 comments

Introducing the Developer Knowledge API and MCP Server

https://developers.googleblog.com/introducing-the-developer-knowledge-api-and-mcp-server/
61•gfortaine•12h ago•26 comments

I now assume that all ads on Apple news are scams

https://kirkville.com/i-now-assume-that-all-ads-on-apple-news-are-scams/
1074•cdrnsf•1d ago•460 comments

Understanding Neural Network, Visually

https://visualrambling.space/neural-network/
294•surprisetalk•3d ago•44 comments

I spent 5 years in DevOps – Solutions engineering gave me what I was missing

https://infisical.com/blog/devops-to-solutions-engineering
152•vmatsiiako•20h ago•72 comments

The AI boom is causing shortages everywhere else

https://www.washingtonpost.com/technology/2026/02/07/ai-spending-economy-shortages/
13•1vuio0pswjnm7•1h ago•0 comments

Why I Joined OpenAI

https://www.brendangregg.com/blog/2026-02-07/why-i-joined-openai.html
158•SerCe•11h ago•144 comments

Learning from context is harder than we thought

https://hy.tencent.com/research/100025?langVersion=en
187•limoce•3d ago•103 comments

A simple search engine from scratch

https://bernsteinbear.com/blog/simple-search/
296•bertman•8mo ago

Comments

franczesko•8mo ago
On the topic of search engines, I really liked classes by David Evans. The task was also building a simple search engine from scratch. It's really for beginners, as the emphasis is on coding in general, but I've found it to be very approachable.

https://www.cs.virginia.edu/~evans/courses/

marginalia_nu•8mo ago
The SeIRP-book, free online as a PDF, is also a fantastic resource on traditional search engines and information retrieval in general.

[1] https://ciir.cs.umass.edu/irbook/

franczesko•8mo ago
Due to dead links, this is the more appropriate URL:

https://www.cs.virginia.edu/~evans/courses/cs101/

StefanBatory•8mo ago
Server not found. Did HN give it the hug of death?
franczesko•8mo ago
Please see the links to videos and notes - they still work. Udacity must have removed the course.
fuzztester•8mo ago
The actual course link on Udacity gives a 404.
ktallett•8mo ago
I always wonder if the days of search engines for specific topics could return. With LLMs providing less than accurate results in some areas, and Google, Bing, etc. being taken over by adverts or well-organised SEO, there seems to be a place for accurate, specialised search.
datadrivenangel•8mo ago
The curation of an index of resources is what's needed for niche search
dcist•8mo ago
WestLaw and Lexis Nexis provide this for legal search, but quite frankly, these services are subpar. It's amazing that these two companies rake in hundreds of millions, yet both are slower than Google, Bing, Yandex, or any LLM service (ChatGPT, Claude, Gemini, etc.) while scouring a universe of text that is orders of magnitude smaller. The user experience is also terrible: you have to log in and specify a client every time you use the service, and both services log you out after a short (in my opinion) period of inactivity, creating friction and needless annoyance for the user. There's an opportunity there.
ktallett•8mo ago
I haven't personally used the mentioned services as they aren't in my field; however, what is the accuracy of their results? Are they double-checked? I don't find LLMs particularly accurate in my field (that's being kind); if anything, I find they make up sources that simply don't exist.

I mean, poor UX has no excuse, but slow speed can be justified if it makes the quality of the service better.

ordersofmag•8mo ago
Here’s a place to start if you want to go down the rabbit hole of how search at places like this is approached. https://haystackconf.com/us2022/talk-12/

https://www.youtube.com/watch?v=9vCMFIJRiKk

ahi•8mo ago
LN and Westlaw's real service is their ubiquity. Every law student has access to it and every firm expects proficiency. While they generally suck, the last time I used it (looong time ago), their boolean search was quite nice. That kind of text search has mostly been replaced by non-deterministic black boxes which aren't great for legal research.
piker•8mo ago
You forgot to mention their claim of copyright over the bulk of, e.g. obscure state case law.
ehecatl42•8mo ago
So, you have to pay to access the law that you are subject to?
piker•8mo ago
If you want it digitized, yes, odd as that seems. You can go find individual prints of it, or perhaps digital copies of opinions elsewhere, but those are also technically copyrighted in a lot of cases.
jfil•8mo ago
In some jurisdictions, like Ontario, there are secret agreements that only allow 3 organizations to have digital access to Case Law (https://www.cameronhuff.com/blog/ontario-case-law-private/). This says a lot about our society, and how much we still have to improve.
throwup238•8mo ago
They've also got the Microsoft effect going on. Usually at least one of their products, like their personal-information aggregator used for locating people (e.g. when serving lawsuits), is mandatory for a firm, so it's just easier for them to bundle everything else in.
raydenvm•8mo ago
Which is not scalable, right?
cosmicgadget•8mo ago
It's scalable if you are okay with not searching exhaustively.
cosmicgadget•8mo ago
My hope is that content self-indexes, so that instead of curation it just has to be aggregated.
econ•8mo ago
Depends how tiny the niche is. A few dozen domains is easily done by hand and worth having.
cyanydeez•8mo ago
I know the answer is never distributed services, but if one could build a sufficiently complex SDK to make something like a Bluesky for niche search indexes, you could chain a bunch of vetted resources together.
fanwood•8mo ago
I already directly search on Wikipedia for most topics (with a search shortcut in the URL bar).
ktallett•8mo ago
Wikipedia is useful up to a point, for sure. It could perhaps be expanded beyond its current use case, but for emerging research and niche topics it can sometimes be less useful.
wolfgang42•8mo ago
Yeah, the (relative) rise of Kagi and Marginalia shows that from a technical perspective, this is within the grasp of a dedicated hobbyist.[1] If Google continues on its current trajectory, and overwhelming numbers of AI crawlers don’t cause an insurmountable rise in CAPTCHA pages, I hope to see an upsurge of niche search engines that focus on some specialty small enough that one or a few people can curate the content and produce a much better experience than the current crop of general Web search engines.

Self-plug: I run such a search engine (for programmers) in my living room, at <https://search.feep.dev/>. I don’t spend a ton of time maintaining it, so I’m interested to see what someone really dedicated could do.

[1] I wrote a 2004-vs-2014 comparison, and things have only gotten better since then: https://search.feep.dev/blog/post/2022-07-23-write-your-own

iLoveOncall•8mo ago
Please, Kagi doesn't even have 50,000 active members. It's definitely not "rising" to become a serious contender for any sort of market share; it's a micro-project. You just feel it's bigger than that because, for some reason, all of its 50,000 users post relentlessly about it on HN.
wolfgang42•8mo ago
Hence the (relative), yes. Did “dedicated hobbyist” not tip you off that I wasn’t thinking about how to maximize market share?
cyanydeez•8mo ago
Just gotta build a search engine that properly contextualizes scams, bait & switch sites, SEO, and the rest, and you're back in business.

To do that, you probably still need humans to properly curate the dataset: essentially, hire 100 librarians and set up a workflow for them to continually prune results.

Right now, everything is all batch processes. None of these LLMs use active feedback, since there are no real models using live updates.

sp0rk•8mo ago
The SVG equation is very difficult to read if you're using a dark OS theme, because the blog uses the OS preference for the dark/light theme (and doesn't seem to give an option to change it manually, either).
tekknolagi•8mo ago
Fixed, I think? Let me know
DylanSp•8mo ago
Works now (I noticed the same issue).
dheera•8mo ago
As an aside (not criticizing OP): I hate the term "cosine similarity" and wish people would just call it a "normalized dot product", because anyone who took sophomore-level university calculus would get it; instead we all invented another word.
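For what it's worth, the "normalized dot product" framing fits in a few lines (a minimal sketch; the function name is mine):

```python
import math

def normalized_dot_product(a, b):
    """Cosine similarity is just the dot product of two vectors
    divided by the product of their magnitudes."""
    dot = sum(x * y for x, y in zip(a, b))
    mag_a = math.sqrt(sum(x * x for x in a))
    mag_b = math.sqrt(sum(x * x for x in b))
    return dot / (mag_a * mag_b)

# Parallel vectors score ~1.0 regardless of magnitude;
# orthogonal vectors score 0.0.
print(normalized_dot_product([1, 2, 3], [2, 4, 6]))
print(normalized_dot_product([1, 0], [0, 1]))
```

Dividing by the magnitudes is what makes the score depend only on the angle between the vectors, which is why it equals the cosine of that angle.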
cosmicgadget•8mo ago
This was a really nice read. Now I have no excuse not to upgrade my blog search. I do feel that I'll have a ton of long-tail words like 'prank'.
snowstormsun•8mo ago
Nice idea, but this approach does not handle out-of-vocabulary words well, which is one major motivation for using a vector-based search. It might not perform significantly better than lexical matching like tf-idf or BM25, while being slower because of its linear complexity. But cool regardless.
netdevphoenix•8mo ago
It is supposed to be a simple search engine. Keyword: simple.

As long as it does what it is meant to, as a simple search engine, it seems fine

snowstormsun•8mo ago
Using tf-idf or BM25 would actually be simpler than a vector search.

I understand this is just for fun, just wanted to point that out.
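To illustrate the point, a toy tf-idf ranker really is short (a sketch with whitespace tokenization, no stemming, and made-up documents):

```python
import math
from collections import Counter

def tfidf_scores(query, docs):
    """Score each document against the query with plain tf-idf:
    term frequency times log inverse document frequency."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    df = Counter()                      # how many docs each term appears in
    for toks in tokenized:
        df.update(set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = sum(
            tf[term] * math.log(n / df[term])
            for term in query.lower().split()
            if term in tf
        )
        scores.append(score)
    return scores

docs = ["compiling a lisp", "a simple search engine", "search engines and lisp"]
print(tfidf_scores("lisp", docs))  # the two lisp docs outrank the middle one
```

BM25 adds document-length normalization and term-frequency saturation on top of this, but the core idea is the same.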

LunaSea•8mo ago
TF/IDF does not support out-of-vocabulary keywords as far as I know.
haasisnoah•8mo ago
How would you handle those in word2vec?

And isn’t a big advantage that synonyms are handled correctly? This implementation still has that advantage.

cosmicgadget•8mo ago
Or since OP has both the cosine similarity matching and naive matching, a heuristic combination of the two since they address each other's weaknesses.
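One way such a combination could look (the min-max normalization and the alpha weight are my assumptions, not anything from the post):

```python
def blend_scores(lexical, vector, alpha=0.5):
    """Min-max normalize each per-query score list to [0, 1],
    then mix them with a tunable weight alpha."""
    def normalize(xs):
        lo, hi = min(xs), max(xs)
        if hi == lo:
            return [0.0] * len(xs)
        return [(x - lo) / (hi - lo) for x in xs]

    lex, vec = normalize(lexical), normalize(vector)
    return [alpha * l + (1 - alpha) * v for l, v in zip(lex, vec)]

# Doc 0 wins lexically, doc 1 wins on embeddings; the blend ranks both high.
print(blend_scores([3.0, 0.0, 1.0], [0.2, 0.9, 0.4]))
```

Normalizing per query matters because raw lexical scores and cosine similarities live on different scales.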
janalsncm•8mo ago
Vector-based approaches either don’t handle OOV terms at all or perform poorly, depending on the implementation. If you limit yourself to alphanumeric trigrams, for example, you can technically cover all terms, but badly, depending on the training data.
swyx•8mo ago
this embeds words with word2vec, which is like 10 years old. at least use BERT or sentencetransformers :)
gthompson512•8mo ago
I have been thinking a bit lately about how much sense that makes compared to just using word vectors, since traditional queries are super short and often keyword-based (like searching for "ground beef" when wanting "ground beef recipes I can cook easily tonight") and so lack most of the context that BERT or similar gives you. I know there are methods like using separate embeddings for queries and such, but maybe a basic word-based search could be more useful, especially with something like fastText for out-of-vocabulary terms.
kaycebasques•8mo ago
> The idea behind the search engine is to embed each of my posts into this domain by adding up the embeddings for the words in the post.

Ah, OK! I never really grokked how to use word-level embeddings. Makes more sense now.
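The quoted idea can be sketched with toy vectors (these three-dimensional "word vectors" are invented stand-ins; real word2vec embeddings are learned and have hundreds of dimensions):

```python
import math

# Invented stand-ins for word2vec embeddings.
WORD_VECS = {
    "search": [0.9, 0.1, 0.0],
    "engine": [0.8, 0.2, 0.1],
    "lisp":   [0.0, 0.9, 0.3],
}

def embed(text):
    """Embed a text by summing the vectors of its known words;
    unknown words are skipped."""
    total = [0.0, 0.0, 0.0]
    for word in text.lower().split():
        for i, v in enumerate(WORD_VECS.get(word, [])):
            total[i] += v
    return total

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    mags = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / mags if mags else 0.0

doc = embed("search engine")
print(cosine(doc, embed("search")))  # high: related query
print(cosine(doc, embed("lisp")))    # much lower: unrelated query
```

Summing word vectors loses word order, but it is cheap and gives every post a fixed-size vector that queries can be compared against.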

skarz•8mo ago
Is 'grokked' a common verb now? I had never even heard the word until Musk's AI.
StefanBatory•8mo ago
It was a word before, as far as I remember. Saw it a few times here.
skarz•8mo ago
What does it even mean?
russfink•8mo ago
To understand and comprehend something in fullness. To reach the depths of the concept, idea, or entity so deep that you are practically one with it. (This is per my recollection of the Heinlein story, where grokking one in fullness was the highest form of respect.)
kaycebasques•8mo ago
A common verb "now"??

> Grok (/ˈɡrɒk/) is a neologism coined by the American writer Robert A. Heinlein for his 1961 science fiction novel Stranger in a Strange Land. While the Oxford English Dictionary summarizes the meaning of grok as "to understand intuitively or by empathy, to establish rapport with" and "to empathize or communicate sympathetically (with); also, to experience enjoyment",[1] Heinlein's concept is far more nuanced, with critic Istvan Csicsery-Ronay Jr. observing that "the book's major theme can be seen as an extended definition of the term."[2] The concept of grok garnered significant critical scrutiny in the years after the book's initial publication. The term and aspects of the underlying concept have become part of communities such as computer science.

https://en.wikipedia.org/wiki/Grok

skarz•8mo ago
Yes, "now". According to Google Trends[0] there was little to no search interest in the term until December 2023.

Usage of 'grokked' on HN: 1,147[1]

Usage of 'hacked' on HN: 37,272[2]

[0] https://trends.google.com/trends/explore?date=all&geo=US&q=g...

[1] https://hn.algolia.com/?dateRange=all&page=0&prefix=false&qu...

[2] https://hn.algolia.com/?dateRange=all&page=0&prefix=false&qu...

johnisgood•8mo ago
I do not think "hacked" is a good comparison; doesn't "to grok [smth]" mean "to understand [smth]"?
plumeria•8mo ago
Yes, it means to understand. The first book from Manning that uses this verb is from 2016 [1].

[1] https://www.manning.com/books/grokking-algorithms

andrehacker•8mo ago
Compare this with the Google Trends results for grokKED (to match the Hacker News usage):

Google trend for grokKED: https://trends.google.com/trends/explore?date=all&geo=US&q=g...

kevinsync•8mo ago
I never knew the etymology [0], but I've known the word for as long as I've been into computing (the '90s). Apparently it's from a 1960s Heinlein novel!

[0] - https://en.wikipedia.org/wiki/Grok

BalinKing•8mo ago
I first learned of it from the Jargon File, long before Grok the product existed: https://www.catb.org/jargon/html/G/grok.html
janalsncm•8mo ago
Started hearing about it in ~2022, when some ML researchers accidentally left a model training over a weekend. For a while the model wasn’t doing much (so they were going to turn it off), and then over the weekend it got surprisingly good.

https://en.m.wikipedia.org/wiki/Grokking_(machine_learning)

vojtechrichter•8mo ago
I really like seeing people play around with technology that many take for granted without understanding its core, underlying principles.
leumassuehtam•8mo ago
The author has a nice series on compiling a Lisp [0], but unfortunately his search engine fails to find it when queried with "lisp" or "Lisp".

[0] https://bernsteinbear.com/blog/compiling-a-lisp-0/

tekknolagi•8mo ago
I wonder if that's just not in the top 10k words :/
freilanzer•8mo ago
And it's been unfinished since 2020.