Tesla turbine-inspired structure generates electricity using compressed air

https://techxplore.com/news/2026-01-tesla-turbine-generates-electricity-compressed.html
1•PaulHoule•1m ago•0 comments

State Department deleting 17 years of tweets (2009-2025); preservation needed

https://www.npr.org/2026/02/07/nx-s1-5704785/state-department-trump-posts-x
1•sleazylice•1m ago•1 comments

Learning to code, or building side projects with AI help, this one's for you

https://codeslick.dev/learn
1•vitorlourenco•1m ago•0 comments

Effulgence RPG Engine [video]

https://www.youtube.com/watch?v=xFQOUe9S7dU
1•msuniverse2026•3m ago•0 comments

Five disciplines discovered the same math independently – none of them knew

https://freethemath.org
1•energyscholar•3m ago•1 comments

We Scanned an AI Assistant for Security Issues: 12,465 Vulnerabilities

https://codeslick.dev/blog/openclaw-security-audit
1•vitorlourenco•4m ago•0 comments

Amazon no longer defends cloud customers against video patent infringement claims

https://ipfray.com/amazon-no-longer-defends-cloud-customers-against-video-patent-infringement-cla...
1•ffworld•5m ago•0 comments

Show HN: Medinilla – an OCPP compliant .NET back end (partially done)

https://github.com/eliodecolli/Medinilla
2•rhcm•8m ago•0 comments

How Does AI Distribute the Pie? Large Language Models and the Ultimatum Game

https://papers.ssrn.com/sol3/papers.cfm?abstract_id=6157066
1•dkga•8m ago•1 comments

Resistance Infrastructure

https://www.profgalloway.com/resistance-infrastructure/
2•samizdis•13m ago•0 comments

Fire-juggling unicyclist caught performing on crossing

https://news.sky.com/story/fire-juggling-unicyclist-caught-performing-on-crossing-13504459
1•austinallegro•13m ago•0 comments

Restoring a lost 1981 Unix roguelike (protoHack) and preserving Hack 1.0.3

https://github.com/Critlist/protoHack
2•Critlist•15m ago•0 comments

GPS and Time Dilation – Special and General Relativity

https://philosophersview.com/gps-and-time-dilation/
1•mistyvales•18m ago•0 comments

Show HN: Witnessd – Prove human authorship via hardware-bound jitter seals

https://github.com/writerslogic/witnessd
1•davidcondrey•18m ago•1 comments

Show HN: I built a clawdbot that texts like your crush

https://14.israelfirew.co
2•IsruAlpha•20m ago•2 comments

Scientists reverse Alzheimer's in mice and restore memory (2025)

https://www.sciencedaily.com/releases/2025/12/251224032354.htm
1•walterbell•23m ago•0 comments

Compiling Prolog to Forth [pdf]

https://vfxforth.com/flag/jfar/vol4/no4/article4.pdf
1•todsacerdoti•25m ago•0 comments

Show HN: Cymatica – an experimental, meditative audiovisual app

https://apps.apple.com/us/app/cymatica-sounds-visualizer/id6748863721
1•_august•26m ago•0 comments

GitBlack: Tracing America's Foundation

https://gitblack.vercel.app/
3•martialg•26m ago•0 comments

Horizon-LM: A RAM-Centric Architecture for LLM Training

https://arxiv.org/abs/2602.04816
1•chrsw•26m ago•0 comments

We just ordered shawarma and fries from Cursor [video]

https://www.youtube.com/shorts/WALQOiugbWc
1•jeffreyjin•27m ago•1 comments

Correctio

https://rhetoric.byu.edu/Figures/C/correctio.htm
1•grantpitt•27m ago•0 comments

Trying to make an Automated Ecologist: A first pass through the Biotime dataset

https://chillphysicsenjoyer.substack.com/p/trying-to-make-an-automated-ecologist
1•crescit_eundo•31m ago•0 comments

Watch Ukraine's Minigun-Firing, Drone-Hunting Turboprop in Action

https://www.twz.com/air/watch-ukraines-minigun-firing-drone-hunting-turboprop-in-action
1•breve•32m ago•0 comments

Free Trial: AI Interviewer

https://ai-interviewer.nuvoice.ai/
1•sijain2•32m ago•0 comments

FDA intends to take action against non-FDA-approved GLP-1 drugs

https://www.fda.gov/news-events/press-announcements/fda-intends-take-action-against-non-fda-appro...
23•randycupertino•34m ago•15 comments

Supernote e-ink devices for writing like paper

https://supernote.eu/choose-your-product/
3•janandonly•36m ago•0 comments

We are QA Engineers now

https://serce.me/posts/2026-02-05-we-are-qa-engineers-now
1•SerCe•37m ago•0 comments

Show HN: Measuring how AI agent teams improve issue resolution on SWE-Verified

https://arxiv.org/abs/2602.01465
2•NBenkovich•37m ago•0 comments

Adversarial Reasoning: Multiagent World Models for Closing the Simulation Gap

https://www.latent.space/p/adversarial-reasoning
1•swyx•37m ago•0 comments

Counterfactual evaluation for recommendation systems

https://eugeneyan.com/writing/counterfactual-evaluation/
91•kurinikku•3w ago

Comments

ZouBisou•3w ago
I wish there was a HN for data content. Articles like this are always my favorite ways to learn something new from a domain expert. I work in fraud detection & would like to write something similar!
spott•3w ago
https://datatau.net used to be kinda that… but it has been dead for a while, and it looks like the spam bots have taken over.
yearolinuxdsktp•2w ago
I admit this is way over my head; I am still trying to grok it. This seems to require an existing model to start from. I am not sure how one would arrive at a model from scratch (I guess start with the same weights on all items?).

I think the point about A/B testing in production to confirm that a new model is working is really important, but it is just as important to do A/B/Control testing, where Control is either random recommendations (seeded to the context or user) or no recommendations. This helps not only with comparing A vs B, but also validates that neither A nor B is performing worse than Control. What percentage of traffic goes to Control (1% or 5%) depends on traffic levels, and getting a control group approved at all takes some convincing.
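
As a rough illustration of the "seeded to the user" idea, here is a minimal sketch of deterministic arm assignment. The function name `assign_arm`, the experiment label, and the 5% control share are assumptions made for the example, not anything from the article.

```python
import hashlib


def assign_arm(user_id: str, experiment: str = "recs-test", control_pct: float = 0.05) -> str:
    """Deterministically map a user to 'control', 'A', or 'B'.

    Hashing (experiment, user_id) gives a stable pseudo-random number,
    so the same user lands in the same arm on every visit.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer and scale it to [0, 1).
    u = int.from_bytes(digest[:8], "big") / 2**64
    if u < control_pct:
        return "control"
    # Split the remaining traffic evenly between A and B.
    midpoint = control_pct + (1.0 - control_pct) / 2
    return "A" if u < midpoint else "B"


if __name__ == "__main__":
    for uid in ("alice", "bob", "carol"):
        print(uid, assign_arm(uid))
```

Because the assignment depends only on the hash, it needs no stored state, but it only keeps a user in the same arm across visits if the same identifier is available each time.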

I think one important technique is to pre-aggregate your data on a user-centered or item-centered basis. This can make it much more palatable to collect this data on a massive scale without having to store a log for every event.
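
A minimal sketch of what item-centered pre-aggregation could look like, assuming events arrive as `(item_id, event_type)` pairs; the `ItemStats` class and the event names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ItemStats:
    """Running counters kept per item instead of a raw event log."""
    impressions: int = 0
    clicks: int = 0

    @property
    def ctr(self) -> float:
        # Click-through rate; 0.0 when the item was never shown.
        return self.clicks / self.impressions if self.impressions else 0.0


def aggregate(events):
    """Fold a stream of (item_id, event_type) pairs into per-item counters."""
    stats = defaultdict(ItemStats)
    for item_id, event_type in events:
        if event_type == "impression":
            stats[item_id].impressions += 1
        elif event_type == "click":
            stats[item_id].clicks += 1
    return stats


if __name__ == "__main__":
    log = [("a", "impression"), ("a", "click"), ("a", "impression"), ("b", "impression")]
    for item, s in aggregate(log).items():
        print(item, s.impressions, s.clicks, round(s.ctr, 2))
```

One trade-off: plain counters discard the per-event logging probabilities that inverse-propensity estimators rely on, so a setup aimed at counterfactual evaluation would have to aggregate weighted sums instead or keep a sample of raw events.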

Contextual bandits are one technique that attempts to deal with confounding factors and bias from actual recommendations. However, I think scaling them to a large number of items is a major challenge.

I think the quality of collected non-click data is also important: did the user actually scroll down to see the recommendations, or were they served but never looked at? Likewise, I think it's important to add depth to the "views" or "clicks" metric. If something was clicked, how long did the user spend viewing or interacting with the item? Did they click and immediately go back, or did they click and look at it for a while? Did they add the item to the cart? Or, if we are talking about articles, did they spend time reading it? Item interest can be estimated more closely than with views, clicks and purchases alone. Of course, purchases (or, more generally, conversions) have direct business value, but an add to cart, for example, is a reasonable proxy for purchase probability and can improve the quality of the training data (and thus give a better proxy for business value).
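
To make the "depth" idea concrete, here is a small sketch that turns interaction signals into a graded reward. The `Interaction` fields, the 10-second dwell threshold, and every numeric weight are made-up placeholders for illustration; real values would need to be calibrated against actual conversion data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Interaction:
    seen: bool                  # did the recommendation enter the viewport?
    clicked: bool = False
    dwell_seconds: float = 0.0  # time spent on the item after the click
    added_to_cart: bool = False
    purchased: bool = False


def graded_reward(x: Interaction, min_dwell: float = 10.0) -> Optional[float]:
    """Map an interaction to a reward in [0, 1]; None means "exclude from training"."""
    if not x.seen:
        return None  # served but never viewed: not a real negative
    if x.purchased:
        return 1.0
    if x.added_to_cart:
        return 0.6   # placeholder proxy for purchase probability
    if x.clicked:
        # A click followed by an immediate bounce counts for very little.
        return 0.3 if x.dwell_seconds >= min_dwell else 0.05
    return 0.0       # seen but ignored


if __name__ == "__main__":
    print(graded_reward(Interaction(seen=True, clicked=True, dwell_seconds=2.0)))
    print(graded_reward(Interaction(seen=False)))
```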

It’s probably impractical to train on control interactions only (and also difficult to keep the same user in control group between visits).

The SNIPS normalization technique reminds me of the Mutual Information factor correction when training co-occurrence (or association) models, where Mutual Information rewards items less likely to randomly co-occur.
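
For anyone else trying to grok the normalization, a minimal sketch of plain IPS next to SNIPS as they are usually defined. The function names and the toy numbers are assumptions for the example; the probabilities are the chance each policy assigns to the item that was actually shown.

```python
def importance_weights(logging_probs, target_probs):
    """Ratio of target-policy to logging-policy probability for each logged item."""
    return [t / l for t, l in zip(target_probs, logging_probs)]


def ips(rewards, logging_probs, target_probs):
    """Inverse propensity scoring: weighted rewards averaged over the number of logs."""
    w = importance_weights(logging_probs, target_probs)
    return sum(wi * r for wi, r in zip(w, rewards)) / len(rewards)


def snips(rewards, logging_probs, target_probs):
    """Self-normalized IPS: divide by the sum of the weights instead of the log count."""
    w = importance_weights(logging_probs, target_probs)
    return sum(wi * r for wi, r in zip(w, rewards)) / sum(w)


if __name__ == "__main__":
    rewards = [1, 0, 1, 0]                    # clicks on the logged items
    logging_probs = [0.5, 0.5, 0.25, 0.25]    # how likely production was to show each item
    target_probs = [0.25, 0.25, 0.5, 0.5]     # how likely the new policy is to show it
    print(ips(rewards, logging_probs, target_probs))    # 0.625
    print(snips(rewards, logging_probs, target_probs))  # 0.5
```

In this toy log the weights sum to 5 rather than 4, so IPS overshoots; dividing by the weight sum pulls the estimate back into the range of the observed rewards.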

jlamberts•2w ago
Re: existing model, for recsys, as long as the product already exists you have some baseline available, even if it's not very good. Anything from "alphabetical order" to "random order" to "most popular" (a reasonable starting point for a lot of cases) is a baseline model.

I agree that a randomized control is extremely valuable, but more as a way to collect unbiased data than a way to validate that you're outperforming random: it's pretty difficult to do worse than random in most recommendation problems. A more palatable way to introduce some randomness is by showing a random item in a specific position with some probability, rather than showing totally random items for a given user/session. This has the advantage of not ruining the experience for an unlucky user when they get a page of things totally unrelated to their interests.
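
A minimal sketch of the single-slot randomization described above, assuming a deterministic ranker and uniform sampling from the unshown candidates. The function name, the default slot, and `epsilon=0.1` are illustrative choices, not a reference implementation.

```python
import random


def inject_exploration(ranked, candidates, slot=2, epsilon=0.1, rng=None):
    """With probability epsilon, replace the item at `slot` with a random unshown candidate.

    Returns the page plus a log record holding the propensity of whatever
    ended up in the slot, so the data can later be reweighted.
    Assumes `slot` is a valid index into `ranked`.
    """
    rng = rng or random.Random()
    page = list(ranked)
    pool = [c for c in candidates if c not in page]
    if pool and rng.random() < epsilon:
        page[slot] = rng.choice(pool)
        # Each pool item is picked uniformly, and only when exploring.
        return page, {"explored": True, "slot": slot, "propensity": epsilon / len(pool)}
    # The ranked item keeps its slot whenever we do not explore.
    propensity = 1.0 - epsilon if pool else 1.0
    return page, {"explored": False, "slot": slot, "propensity": propensity}


if __name__ == "__main__":
    page, record = inject_exploration(
        ranked=["a", "b", "c", "d"],
        candidates=["a", "b", "c", "d", "x", "y"],
        rng=random.Random(7),
    )
    print(page, record)
```

Only one position per page is ever touched, so the rest of the ranking stays relevant, while the logged propensity is what makes the randomized impressions usable for unbiased estimates.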

dfajgljsldkjag•2w ago
It is annoying how often our offline metrics look perfect while the actual A/B test shows zero lift. The article explains that gap really well by framing recommendations as an interventional problem rather than just an observational one. I guess we really need to start looking at counterfactual evaluation if we want our offline tests to actually mean something.
westurner•2w ago
From https://news.ycombinator.com/item?id=46663105 (flagged?):

> There are a number of different types of counterfactuals; Describe the different types of counterfactuals in statistics: Classical counterfactuals, Pearl's counterfactuals, Quantum counterfactuals, Constructor theory counterfactuals

Why did the author believe that that counterfactual model was appropriate for this?