frontpage.
newsnewestaskshowjobs

Made with ♥ by @iamnishanth

Open Source @Github

fp.

We Mourn Our Craft

https://nolanlawson.com/2026/02/07/we-mourn-our-craft/
119•ColinWright•1h ago•87 comments

Speed up responses with fast mode

https://code.claude.com/docs/en/fast-mode
22•surprisetalk•1h ago•24 comments

Hoot: Scheme on WebAssembly

https://www.spritely.institute/hoot/
121•AlexeyBrin•7h ago•24 comments

Stories from 25 Years of Software Development

https://susam.net/twenty-five-years-of-computing.html
62•vinhnx•5h ago•7 comments

OpenCiv3: Open-source, cross-platform reimagining of Civilization III

https://openciv3.org/
828•klaussilveira•21h ago•249 comments

U.S. Jobs Disappear at Fastest January Pace Since Great Recession

https://www.forbes.com/sites/mikestunson/2026/02/05/us-jobs-disappear-at-fastest-january-pace-sin...
119•alephnerd•2h ago•78 comments

Al Lowe on model trains, funny deaths and working with Disney

https://spillhistorie.no/2026/02/06/interview-with-sierra-veteran-al-lowe/
55•thelok•3h ago•7 comments

Brookhaven Lab's RHIC Concludes 25-Year Run with Final Collisions

https://www.hpcwire.com/off-the-wire/brookhaven-labs-rhic-concludes-25-year-run-with-final-collis...
4•gnufx•39m ago•0 comments

The AI boom is causing shortages everywhere else

https://www.washingtonpost.com/technology/2026/02/07/ai-spending-economy-shortages/
108•1vuio0pswjnm7•8h ago•138 comments

The Waymo World Model

https://waymo.com/blog/2026/02/the-waymo-world-model-a-new-frontier-for-autonomous-driving-simula...
1059•xnx•1d ago•611 comments

Reinforcement Learning from Human Feedback

https://rlhfbook.com/
76•onurkanbkrc•6h ago•5 comments

Start all of your commands with a comma (2009)

https://rhodesmill.org/brandon/2009/commands-with-comma/
484•theblazehen•2d ago•175 comments

I Write Games in C (yes, C)

https://jonathanwhiting.com/writing/blog/games_in_c/
8•valyala•2h ago•1 comments

SectorC: A C Compiler in 512 bytes

https://xorvoid.com/sectorc.html
9•valyala•2h ago•0 comments

Vocal Guide – belt sing without killing yourself

https://jesperordrup.github.io/vocal-guide/
209•jesperordrup•12h ago•70 comments

France's homegrown open source online office suite

https://github.com/suitenumerique
558•nar001•6h ago•256 comments

Coding agents have replaced every framework I used

https://blog.alaindichiappari.dev/p/software-engineering-is-back
222•alainrk•6h ago•343 comments

A Fresh Look at IBM 3270 Information Display System

https://www.rs-online.com/designspark/a-fresh-look-at-ibm-3270-information-display-system
36•rbanffy•4d ago•7 comments

Selection Rather Than Prediction

https://voratiq.com/blog/selection-rather-than-prediction/
8•languid-photic•3d ago•1 comments

History and Timeline of the Proco Rat Pedal (2021)

https://web.archive.org/web/20211030011207/https://thejhsshow.com/articles/history-and-timeline-o...
19•brudgers•5d ago•4 comments

72M Points of Interest

https://tech.marksblogg.com/overture-places-pois.html
29•marklit•5d ago•2 comments

Unseen Footage of Atari Battlezone Arcade Cabinet Production

https://arcadeblogger.com/2026/02/02/unseen-footage-of-atari-battlezone-cabinet-production/
114•videotopia•4d ago•31 comments

Where did all the starships go?

https://www.datawrapper.de/blog/science-fiction-decline
76•speckx•4d ago•75 comments

Show HN: I saw this cool navigation reveal, so I made a simple HTML+CSS version

https://github.com/Momciloo/fun-with-clip-path
6•momciloo•2h ago•0 comments

Show HN: Look Ma, No Linux: Shell, App Installer, Vi, Cc on ESP32-S3 / BreezyBox

https://github.com/valdanylchuk/breezydemo
273•isitcontent•22h ago•38 comments

Learning from context is harder than we thought

https://hy.tencent.com/research/100025?langVersion=en
201•limoce•4d ago•111 comments

Show HN: Kappal – CLI to Run Docker Compose YML on Kubernetes for Local Dev

https://github.com/sandys/kappal
22•sandGorgon•2d ago•11 comments

Monty: A minimal, secure Python interpreter written in Rust for use by AI

https://github.com/pydantic/monty
286•dmpetrov•22h ago•153 comments

Making geo joins faster with H3 indexes

https://floedb.ai/blog/how-we-made-geo-joins-400-faster-with-h3-indexes
155•matheusalmeida•2d ago•48 comments

Software factories and the agentic moment

https://factory.strongdm.ai/
71•mellosouls•4h ago•75 comments
Open in hackernews

Counterfactual evaluation for recommendation systems

https://eugeneyan.com/writing/counterfactual-evaluation/
91•kurinikku•3w ago

Comments

ZouBisou•3w ago
I wish there was a HN for data content. Articles like this are always my favorite ways to learn something new from a domain expert. I work in fraud detection & would like to write something similar!
spott•2w ago
https://datatau.net used to be kinda that… but it has been dead for a while, and it looks like the spam bots have taken over.
yearolinuxdsktp•2w ago
I admit this is way over my head, I am still trying to grok it. This seems to require an existing model to start from—-I am not sure how one would arrive at a model from scratch (I guess start from the same weights on all items?)

I think the point about A/B testing in production to confirm if a new model is working is really important, but quite important is to also do A/B/Control testing, where Control is random (seeded to the context or user) or no recommendations, which helps not only with A vs B, but helps validate that A or B isn’t performing worse than Control. What percentage of traffic (1% or 5%) goes to Control depends on traffic levels, but also requires convincing to run control.

I think one important technique is to pre-aggregate your data on a user-centered or item-centered basis. This can make it much more palatable to collect this data on a massive scale without having to store a log for every event.

Contextual bandit is one technique that attempts to deal with confounding factors and bias from actual recommendations. However, I think there’s a major challenge to scale it to large counts of items.

I think the quality of collected non-click data is also important—-did the user actually scroll down to see the recommendations or were they served but not looked at? Likewise, I think it’s important to add depth to the “views” or “clicks” metric—-if something was clicked, how long did the user spend viewing/interacting with the item? Did they click and immediately go back or did they click and look at it for a while? Did they add the item to cart? Or if we are talking about articles, did they spend time reading it? Item interest can be estimated more closely than just views, clicks and purchases. Of course, we know that purchases (or more generally conversion rates) have a direct business value, but, for example, an add to cart is somewhat of a proxy of purchase probability and can enhance the quality of the data used to train (and thus a higher proxy business value).

It’s probably impractical to train on control interactions only (and also difficult to keep the same user in control group between visits).

The SNIPS normalization technique reminds me of the Mutual Information factor correction when training co-occurrence (or association) models, where Mutual Information rewards items less likely to randomly co-occur.

jlamberts•2w ago
Re: existing model, for recsys, as long as the product already exists you have some baseline available, even if it's not very good. Anything from "alphabetical order" to "random order" to "most popular" (a reasonable starting point for a lot of cases) is a baseline model.

I agree that a randomized control is extremely valuable, but more as a way to collect unbiased data than a way to validate that you're outperforming random: it's pretty difficult to do worse than random in most recommendation problems. A more palatable way to introduce some randomness is by showing a random item in a specific position with some probability, rather than showing totally random items for a given user/session. This has the advantage of not ruining the experience for an unlucky user when they get a page of things totally unrelated to their interests.

dfajgljsldkjag•2w ago
It is annoying how often our offline metrics look perfect while the actual A/B test shows zero lift. The article explains that gap really well by framing recommendations as an interventional problem rather than just an observational one. I guess we really need to start looking at counterfactual evaluation if we want our offline tests to actually mean something.
westurner•2w ago
From https://news.ycombinator.com/item?id=46663105 (flagged?) :

> There are a number of different types of counterfactuals; Describe the different types of counterfactuals in statistics: Classical counterfactuals, Pearl's counterfactuals, Quantum counterfactuals, Constructor theory counterfactuals

Why did the author believe that that counterfactual model was appropriate for this?