
OpenCiv3: Open-source, cross-platform reimagining of Civilization III

https://openciv3.org/
604•klaussilveira•11h ago•180 comments

The Waymo World Model

https://waymo.com/blog/2026/02/the-waymo-world-model-a-new-frontier-for-autonomous-driving-simula...
912•xnx•17h ago•545 comments

What Is Ruliology?

https://writings.stephenwolfram.com/2026/01/what-is-ruliology/
28•helloplanets•4d ago•21 comments

How we made geo joins 400× faster with H3 indexes

https://floedb.ai/blog/how-we-made-geo-joins-400-faster-with-h3-indexes
100•matheusalmeida•1d ago•24 comments

Unseen Footage of Atari Battlezone Arcade Cabinet Production

https://arcadeblogger.com/2026/02/02/unseen-footage-of-atari-battlezone-cabinet-production/
29•videotopia•4d ago•1 comment

Show HN: Look Ma, No Linux: Shell, App Installer, Vi, Cc on ESP32-S3 / BreezyBox

https://github.com/valdanylchuk/breezydemo
207•isitcontent•12h ago•24 comments

Monty: A minimal, secure Python interpreter written in Rust for use by AI

https://github.com/pydantic/monty
206•dmpetrov•12h ago•98 comments

Show HN: I spent 4 years building a UI design tool with only the features I use

https://vecti.com
315•vecti•14h ago•138 comments

Microsoft open-sources LiteBox, a security-focused library OS

https://github.com/microsoft/litebox
354•aktau•18h ago•180 comments

Sheldon Brown's Bicycle Technical Info

https://www.sheldonbrown.com/
360•ostacke•18h ago•94 comments

Hackers (1995) Animated Experience

https://hackers-1995.vercel.app/
465•todsacerdoti•19h ago•232 comments

Jeffrey Snover: "Welcome to the Room"

https://www.jsnover.com/blog/2026/02/01/welcome-to-the-room/
4•kaonwarb•3d ago•1 comment

Delimited Continuations vs. Lwt for Threads

https://mirageos.org/blog/delimcc-vs-lwt
24•romes•4d ago•3 comments

Show HN: If you lose your memory, how to regain access to your computer?

https://eljojo.github.io/rememory/
262•eljojo•14h ago•156 comments

An Update on Heroku

https://www.heroku.com/blog/an-update-on-heroku/
398•lstoll•18h ago•271 comments

Dark Alley Mathematics

https://blog.szczepan.org/blog/three-points/
80•quibono•4d ago•20 comments

PC Floppy Copy Protection: Vault Prolok

https://martypc.blogspot.com/2024/09/pc-floppy-copy-protection-vault-prolok.html
54•kmm•4d ago•3 comments

Was Benoit Mandelbrot a hedgehog or a fox?

https://arxiv.org/abs/2602.01122
8•bikenaga•3d ago•2 comments

How to effectively write quality code with AI

https://heidenstedt.org/posts/2026/how-to-effectively-write-quality-code-with-ai/
238•i5heu•14h ago•181 comments

Introducing the Developer Knowledge API and MCP Server

https://developers.googleblog.com/introducing-the-developer-knowledge-api-and-mcp-server/
49•gfortaine•9h ago•15 comments

I spent 5 years in DevOps – Solutions engineering gave me what I was missing

https://infisical.com/blog/devops-to-solutions-engineering
138•vmatsiiako•17h ago•60 comments

Understanding Neural Network, Visually

https://visualrambling.space/neural-network/
273•surprisetalk•3d ago•37 comments

Why I Joined OpenAI

https://www.brendangregg.com/blog/2026-02-07/why-i-joined-openai.html
126•SerCe•8h ago•107 comments

Female Asian Elephant Calf Born at the Smithsonian National Zoo

https://www.si.edu/newsdesk/releases/female-asian-elephant-calf-born-smithsonians-national-zoo-an...
28•gmays•7h ago•9 comments

Show HN: R3forth, a ColorForth-inspired language with a tiny VM

https://github.com/phreda4/r3
68•phreda4•11h ago•13 comments

Vocal Guide – belt sing without killing yourself

https://jesperordrup.github.io/vocal-guide/
7•jesperordrup•2h ago•1 comment

I now assume that all ads on Apple news are scams

https://kirkville.com/i-now-assume-that-all-ads-on-apple-news-are-scams/
1051•cdrnsf•21h ago•432 comments

FORTH? Really!?

https://rescrv.net/w/2026/02/06/associative
61•rescrv•19h ago•22 comments

Learning from context is harder than we thought

https://hy.tencent.com/research/100025?langVersion=en
171•limoce•3d ago•93 comments

Zlob.h: 100% POSIX- and glibc-compatible globbing lib that is faster and better

https://github.com/dmtrKovalenko/zlob
15•neogoose•4h ago•9 comments

Achieving 10,000x training data reduction with high-fidelity labels

https://research.google/blog/achieving-10000x-training-data-reduction-with-high-fidelity-labels/
154•badmonster•6mo ago

Comments

ericyd•6mo ago
> in production traffic only very few (<1%) ads are actually clickbait

That's a fascinating claim, and it does not align with my anecdotal experience using the web for many years.

ajb•6mo ago
I had that reaction as well, but consider: clickbait, by definition, takes more work (emotional or logical) to reject than an ad that is merely not relevant to you. Thus, your (and my) recall of ads is probably biased towards clickbait, and we overestimate its prevalence.
vajrabum•6mo ago
Not quite the same thing, but some non-negligible percentage of ads I see on Facebook are outright scams purporting to sell musical instruments at a 'markdown'. First it was guitars supposedly from the Sam Ash bankruptcy sale, linking to an obviously fake site, and more lately 'free' giveaways of high-end Gibson acoustic guitars. When I reported them, I got the feedback that they didn't violate community standards, but my insta account got perma-banned when I posted the original of a song on YouTube from 1928 on a thread which started with a cover from 30 years ago. That was considered spam.
galaxyLogic•6mo ago
Smart scammers should know that people know that if something is too good to be true ("free Gibson" etc.), it is probably fake. But people keep clicking, for what it's worth.
adgjlsfhk1•6mo ago
It's the opposite: scammers want the people who are gullible enough to go for "free".
throwaway1004•6mo ago
This is a narrative I've heard many times, with very little evidence to back it up. An alternative and more accurate view is that, as the world came online, people became exposed to very low-effort scams, representative of criminal elements from around the world, which befuddled most due to their child-like naivety. None of those confused individuals would fall for it themselves, but they require an explanation. Someone came up with a theory that it's actually a stroke of 4D genius, and it stuck.

edit: ok, I bothered to look this up: Microsoft had a guy do a study on Nigerian scams, the guys who wrote Freakonomics did a sequel referencing that study and drew absurd, unfounded conclusions, which have been repeated over and over. Business as usual for the fig-leaf salesmen.

andrewmcwatters•6mo ago
Ad company says ads are good, water is wet, news at 11.
vFunct•6mo ago
That usually means you tend to visit trash sites. Higher-quality sites have higher-quality ads. In fact, for the highest-quality media, people actually PAY for ads. See things like the Vogue September issue or technical shopping magazines, which derive value from being 90% ads. People used to buy local newspapers for the ads as well.
andrewflnr•6mo ago
Specifically the September issue? Is that one special?
vFunct•6mo ago
Yes: fall/winter clothing goes on sale starting around September, and fall/winter apparel is generally more expensive than spring/summer clothing, so more advertising dollars go into it.
ericyd•6mo ago
Lol I always love a good "slap in the face" reply
aaron695•6mo ago
> it does not align with my anecdotal experience

Given that I'll often see the same fraudulent ad repeated, I think the anecdotal experience is that there are not many of them.

I can even talk to friends about the most boring fraudulent ads and they know them, e.g. the Elon-doubling-your-bitcoin scams.

For normal ads, unless they go viral, there are millions out there that are never repeated or never even seen.

Because fraudulent ads have short lifetimes before being pulled out of 'production traffic', you can collect many of them for the training data.

I assume 'clickbait' is the safety word for 'fraud'.

woolion•6mo ago
In the last 6 months, I've had to buy a few things that 'normal people' tend to buy (a coffee machine, fuel, ...), for which we didn't already have trusted sellers, and so checked Google.

For fuel, Google results were 90% scams; for coffee machines, closer to 75%. The scams are fairly elaborate: they clone some legitimate-looking sites, then offer prices that are very competitive -- between 50% and 75% of market prices -- which puts them at the top of SEO. It's only by looking in detail at the contact information that some things look off (one common tell is that they may encourage bank transfers, since there's no buyer protection there, but that's not always the case).

A price at 75% of market rate is not a crazy "too good to be true" thing; it's in the realm of what a legitimate business can do, and with the items priced in the 1000s, any hooked victim is a good catch. A particular example was a website copying that of a massive discount appliance store chain in the Netherlands. They had a close domain name, and even though the website looked different, any Google search linked it to the legitimate business.

You really have to apply a high level of scrutiny, or understand that Google is basically a scam registry.

jacquesm•6mo ago
Scammers can outbid real stores on the same products for the advertising space simply because they have much better margins. And Google really doesn't care whether it is a scammer or a legit business paying them; they do zero due diligence on the targets of the advertising.
NooneAtAll3•6mo ago
Didn't the parent comment cite a sentence about clickbait?

Why did you change the subject to scams?

woolion•6mo ago
Parent says it's an outlandish claim that they can reliably tell whether ads are clickbait.

I believe that detecting whether an ad is clickbait is a similar problem -- not exactly the same, but it suffers from the same issues:

- it's not well defined at all.

- any heuristic is constantly gamed by bad actors

- it requires a deeper, contextual analysis of the content that is served

- content analysis requires a notion of what is reputable or reasonable

If I take an LLM's definition of "clickbait", I get "sensationalized, misleading, or exaggerated headlines"; so scams would be a subset of it (a scam is misleading content that you need to click through). They do not provide their own definition, though.

So you have Google products (both the Products search and the general search) that recommend scams with an incredible rate, where the stakes are much higher. Is it reasonable that they're able to solve the general problem? How can anyone verify such a claim, or trust it?

wazoox•6mo ago
In my (fortunately uncommon) experience, all ads served by Google are clickbait, or even blatant scams (like fake interviews with celebrities explaining how they earn lots of money without working, magical health enhancers, etc.).
trhway•6mo ago
Reminds me of how one of the winners of Andrew Ng's 2021 Data-Centric AI competition analyzed embedding separation to choose training data: https://rensdimmendaal.com/posts/data-centric-ai
abhgh•6mo ago
Active Learning is a very tricky area to get right ... over the years I have had mixed luck with it for text classification, to the point that my colleague and I decided to perform a thorough empirical study [1] that normalized the various experiment settings individual papers had reported. We observed that, post-normalization, randomly picking instances to label is better!

[1] https://aclanthology.org/2024.emnlp-main.1240/

marcyb5st•6mo ago
We successfully built an AL pipeline using models that quantify both aleatoric and epistemic uncertainty, and we use those quantities to drive our labeling efforts.

Specifically, post-training you measure those on a holdout set and then slice the results by feature. While these models tend to be more complex and potentially less interpretable, we feel the pros outweigh the cons.

Additionally, giving your end users access to a confidence score is really useful for getting them to trust the predictions, and in case there is a nonzero cost for acting on false positives/negatives, you can try to come up with a strategy that minimizes the expected cost.
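A minimal sketch of the idea described above, under my own assumptions (the commenter's actual pipeline isn't shown): a small ensemble's disagreement stands in for epistemic uncertainty, the mean prediction's Bernoulli variance stands in for aleatoric uncertainty, and the most epistemically uncertain pool items go to human labelers first. The dataset and model choices are purely illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Toy data standing in for a real labeled corpus plus an unlabeled pool.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_pool, y_train, y_pool = train_test_split(
    X, y, test_size=0.5, random_state=0
)

# Small ensemble; disagreement between members approximates epistemic uncertainty.
ensemble = [
    RandomForestClassifier(n_estimators=50, random_state=s).fit(X_train, y_train)
    for s in range(5)
]

probs = np.stack([m.predict_proba(X_pool)[:, 1] for m in ensemble])  # (5, n_pool)
mean_p = probs.mean(axis=0)

epistemic = probs.var(axis=0)      # spread across models: reducible with more data
aleatoric = mean_p * (1 - mean_p)  # Bernoulli variance: irreducible class overlap

# Send the most epistemically uncertain pool items to human labelers first.
to_label = np.argsort(-epistemic)[:20]
```

The split between the two variance terms is what lets labeling effort target examples the model could actually learn from, rather than inherently ambiguous ones.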

scribu•6mo ago
I’m confused by the clustering step:

> To find the most informative examples, we separately cluster examples labeled clickbait and examples labeled benign, which yields some overlapping clusters

How can you get overlapping clusters if the two sets of labelled examples are disjoint?

cm228•6mo ago
they cluster the examples with their model and then check the predictions against the labels.
patresh•6mo ago
If the diagram is representative of what is happening, it would seem that each cluster is represented as a hypersphere, possibly using the cluster centroid and max distance from the centroid to any cluster member as radius. Those hyperspheres can then overlap. Not sure if that is what is actually happening though.
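The hypersphere reading above can be sketched concretely. This is an assumption-laden toy, not the post's method: each label set is clustered separately with k-means, each cluster is summarized as (centroid, radius) with radius the max member distance, and two hyperspheres from opposite label sets overlap when their centroid distance is less than the sum of their radii.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Toy embeddings standing in for "clickbait" and "benign" ad vectors.
emb_clickbait = rng.normal(0.0, 1.0, size=(200, 8))
emb_benign = rng.normal(0.5, 1.0, size=(200, 8))

def hyperspheres(X, k):
    """Cluster X and return one (centroid, radius) pair per cluster,
    radius = max distance from the centroid to any cluster member."""
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    spheres = []
    for c in range(k):
        members = X[km.labels_ == c]
        center = km.cluster_centers_[c]
        radius = np.linalg.norm(members - center, axis=1).max()
        spheres.append((center, radius))
    return spheres

spheres_c = hyperspheres(emb_clickbait, 4)
spheres_b = hyperspheres(emb_benign, 4)

# Cross-label sphere pairs that intersect mark confusable regions;
# examples there are the informative ones worth human labeling.
overlaps = [
    (i, j)
    for i, (c1, r1) in enumerate(spheres_c)
    for j, (c2, r2) in enumerate(spheres_b)
    if np.linalg.norm(c1 - c2) < r1 + r2
]
```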
fumeux_fume•6mo ago
The information you're seeking appears to be left out of the post. My best guess is that a separate embedding model, specifically tuned for document similarity, is used to generate the vectors, and then a clustering algorithm is chosen to create the clusters. They may also use PCA to reduce the embedding dimensions before clustering.
overfeed•6mo ago
> How can you get overlapping clusters if the two sets of labelled examples are disjoint?

What's disjoint are the training labels and the classifier's output - not the values in high-dimensional space. For classification tasks, there can be neighboring items in the same cluster that are nevertheless separated by the hyperplane - and therefore placed in different classes despite their proximity.
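The point above is easy to demonstrate on synthetic data (my own illustration, not from the post): with two overlapping Gaussian blobs and a linear classifier, plenty of nearest-neighbor pairs land on opposite sides of the decision boundary.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(1)
# Two overlapping blobs: labels are disjoint, feature space is intermixed.
X = np.vstack([rng.normal(-0.5, 1.0, size=(300, 2)),
               rng.normal(0.5, 1.0, size=(300, 2))])
y = np.array([0] * 300 + [1] * 300)

clf = LogisticRegression().fit(X, y)
pred = clf.predict(X)

nn = NearestNeighbors(n_neighbors=2).fit(X)
_, idx = nn.kneighbors(X)        # idx[:, 0] is each point itself
neighbor = idx[:, 1]             # its nearest other point

# Points whose nearest neighbor falls on the other side of the hyperplane.
split_pairs = np.where(pred != pred[neighbor])[0]
```

Every index in `split_pairs` is an example that is close to something the classifier assigns to the other class - exactly the proximity-despite-separation situation described above.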

patresh•6mo ago
What is the clustering performed on? Is another embedding model used to produce the embeddings, or do they come from the LLM?

Typically LLMs don't produce usable embeddings for clustering or retrieval; embedding models trained with contrastive learning are used instead. But there seems to be no mention of any model other than LLMs.

I'm also curious about what type of clustering is used here.

ghm2180•6mo ago
Is it just me, or is the showing of hyperspheres deliberately meant to obfuscate some kind of trade secret about how to select the examples to send to a human?

The obfuscation being the use of a support vector machine, which is the go-to for selecting support vectors (and ignoring the outliers), with distance defined between embedding vectors.

I could be wrong; they could be using something different for clustering, or something fancier like a variant of DBSCAN.
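For reference, the SVM guess above looks like this in practice (a generic sketch, not anything confirmed by the post): fit an SVM on embedding vectors and take its support vectors - the examples on or inside the margin - as the boundary-hugging candidates for human review.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Toy "embedding" vectors for two ad classes with some overlap.
X, y = make_blobs(n_samples=400, centers=2, cluster_std=3.0, random_state=0)

svm = SVC(kernel="linear", C=1.0).fit(X, y)

# Support vectors define the margin; interior, easy points are ignored.
candidates = svm.support_        # indices into X worth sending to a labeler
```

Only a fraction of the dataset ends up in `candidates`, which is the whole appeal: human effort goes to the hard cases near the boundary.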

unixhero•6mo ago
Why were high fidelity labels not used from the start?
algorithmsRcool•6mo ago
The methodology helps focus the attention of human labelers on the controversial/borderline cases, making them more effective in how they spend their time.